Apache Arrow 0.11.0 Release
09 Oct 2018
By Wes McKinney (wesm)
The Apache Arrow team is pleased to announce the 0.11.0 release. It is the product of 2 months of development and includes 287 resolved issues.
We discuss some highlights from the release and other project news in this post.
Arrow Flight RPC and Messaging Framework
We are developing a new Arrow-native RPC framework, Arrow Flight, based on gRPC for high performance Arrow-based messaging. Through low-level extensions to gRPC’s internal memory management, we are able to avoid expensive parsing when receiving datasets over the wire, unlocking unprecedented levels of performance in moving datasets from one machine to another. We will be writing more about Flight on the Arrow blog in the future.
Prototype implementations are available in Java and C++, and we will be focused in the coming months on hardening the Flight RPC framework for enterprise-grade production use cases.
Parquet and Arrow C++ communities joining forces
After discussion over the last year, the Apache Arrow and Apache Parquet C++ communities decided to merge the Parquet C++ codebase into the Arrow C++ codebase and work together in a “monorepo” structure. This should result in better developer productivity in core Parquet work as well as in Arrow integration.
Before this codebase merge, we had a circular dependency between the Arrow and Parquet codebases: the Parquet C++ library builds on the Arrow C++ library, and is in turn used in the Arrow Python library.
Gandiva LLVM Expression Compiler donation
Dremio Corporation has donated the Gandiva LLVM expression compiler to Apache Arrow. We will be working on cross-platform builds, packaging, and language bindings (e.g. in Python) for Gandiva in the upcoming 0.12 release and beyond. We will write more about Gandiva in the future.
Parquet C GLib Bindings Donation
PMC member Kouhei Sutou has donated GLib bindings for the Parquet C++ libraries, which are designed to work together with the existing Arrow GLib bindings.
C++ CSV Reader Project
We have begun developing a general-purpose multithreaded CSV file parser in C++. The purpose of this library is to parse and convert comma-separated text files into Arrow columnar record batches as efficiently as possible. The prototype version features Python bindings, and any language that can use the C++ libraries (including C, R, and Ruby) can build an interface to it.
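To give a rough idea of what converting row-oriented CSV text into columnar data involves, here is a minimal pure-Python sketch that uses only the standard library. It is not the Arrow C++ reader or its Python bindings: the function names `infer_value` and `csv_to_columns` are hypothetical, the type inference is done per value rather than per column, and nothing here is multithreaded or optimized.

```python
import csv
import io


def infer_value(text):
    """Convert a CSV field to int, then float, falling back to the raw string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def csv_to_columns(source):
    """Read CSV text and return a dict mapping column name -> list of values.

    Hypothetical helper for illustration only; a real columnar reader would
    produce typed, contiguous buffers rather than Python lists.
    """
    reader = csv.reader(io.StringIO(source))
    header = next(reader)  # first row holds the column names
    columns = {name: [] for name in header}
    for row in reader:
        # Append each field to its column instead of keeping the row together.
        for name, field in zip(header, row):
            columns[name].append(infer_value(field))
    return columns


sample = "id,price,city\n1,9.5,Oslo\n2,12.0,Lima\n"
columns = csv_to_columns(sample)
print(columns)
```

Each column ends up as one homogeneous sequence, which is the layout that the Arrow record batches store in typed memory buffers.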
New MATLAB bindings
The MathWorks has contributed an initial MEX file binding to the Arrow C++ libraries. Initially, it is possible to read Arrow-based Feather files in MATLAB. We are looking forward to seeing more developments for MATLAB users.
R Library in Development
The community has begun implementing R language bindings and interoperability with the Arrow C++ libraries. This will include support for zero-copy shared memory IPC and other tools needed to improve R integration with Apache Spark and more.
Support for CUDA-based GPUs in Python
In the coming months, we will continue to make progress on many fronts, with Gandiva packaging, expanded language support (especially in R), and improved data access (e.g. CSV, Parquet files) in focus.