Apache Arrow 0.14.0 Release

Published 02 Jul 2019
By The Apache Arrow PMC (pmc)

The Apache Arrow team is pleased to announce the 0.14.0 release. This covers 3 months of development work and includes 602 resolved issues from 75 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The complete changelog is also available.

This post gives some brief highlights from the project since the 0.13.0 release in April.

New committers

Since the 0.13.0 release, the following have been added:

Thank you for all your contributions!

Upcoming 1.0.0 Format Stability Release

We are planning for our next major release to move from 0.14.0 to 1.0.0. The major version number will indicate stability of the Arrow columnar format and binary protocol. While the format has already been stable since December 2017, we believe it is a good idea to make this stability official and to indicate that it is safe to persist serialized Arrow data in applications. This means that applications will be able to safely upgrade to new Arrow versions without having to worry about backwards incompatibilities. We will write in a future blog post about the stability guarantees we intend to provide to help application developers plan accordingly.

Packaging

We added support for the following platforms:

We dropped support for Ubuntu 14.04.

Development Infrastructure and Tooling

As the project has grown larger and more diverse, we are increasingly outgrowing what we can test in public continuous integration services like Travis CI and AppVeyor. In addition, we share these resources with the entire Apache Software Foundation, and given the high volume of pull requests into Apache Arrow, maintainers frequently wait many hours for the green light to merge patches.

The complexity of our testing is driven by the number of different components and programming languages as well as increasingly long compilation and test execution times as individual libraries grow larger. The 50-minute time limit of public CI services is simply too short to comprehensively test the project. Additionally, the CI host machines are constrained in their features and memory limits, preventing us from testing features that are only relevant on large amounts of data (10 GB or more) or functionality that requires a CUDA-enabled GPU.

Organizations that contribute to Apache Arrow are working on physical build infrastructure and tools to improve build times and build scalability. One such new tool is Ursabot, a GitHub-enabled bot that can be used to trigger builds either on physical build machines or in the cloud. It can also be used to trigger benchmark timing comparisons. If you are contributing to the project, you may see Ursabot being employed to trigger tests in pull requests.

To help with migrating away from Travis CI, we are also working to make as many of our builds as possible reproducible with Docker and not reliant on Travis CI-specific configuration details. This will also help contributors reproduce build failures locally without having to wait for Travis CI.

Columnar Format Notes

Arrow Flight notes

Flight now supports many of the features of a complete RPC framework.

Windows is now a supported platform for Flight in C++ and Python (ARROW-3294), and Python wheels are shipped for all platforms (ARROW-3150, ARROW-5656). C++, Python, and Java have been brought to parity, now that actions can return streaming results in Java (ARROW-5254).

C++ notes

This release includes 188 resolved issues related to the C++ implementation, so we summarize only some of the work here.

General platform improvements

Line-delimited JSON reader

A multithreaded line-delimited JSON reader (powered internally by RapidJSON) is now available for use (also in Python and R via bindings). This will likely be expanded to support more kinds of JSON storage in the future.
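
For readers unfamiliar with the input format, the following is a minimal sketch of what reading line-delimited JSON into columns involves: one JSON object per line, with each key becoming a column and missing keys becoming nulls. It is a single-threaded, standard-library illustration only and is not the Arrow reader. The function name `read_line_delimited_json` and the sample data are hypothetical.

```python
import io
import json


def read_line_delimited_json(stream):
    """Parse one JSON object per line into a dict of column name -> list of values.

    Hypothetical helper for illustration; not part of any Arrow library.
    """
    columns = {}
    num_rows = 0
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Columns first seen in this row are backfilled with None for earlier rows.
        for name in record:
            if name not in columns:
                columns[name] = [None] * num_rows
        # Append this row's value to every known column, None where the key is absent.
        for name, values in columns.items():
            values.append(record.get(name))
        num_rows += 1
    return columns


# Sample input (made up for this example): three records, one per line.
sample = io.StringIO(
    '{"a": 1, "b": "x"}\n'
    '{"a": 2}\n'
    '{"a": 3, "b": "z", "c": true}\n'
)
table = read_line_delimited_json(sample)
for name, values in table.items():
    print(name, values)
```

With pyarrow installed, the corresponding entry point in the Python bindings should be `pyarrow.json.read_json`, which returns an Arrow table rather than plain Python lists.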

New computational kernels

A number of new computational kernels have been developed.

C# Notes

The native C# implementation has continued to mature since 0.13. This release includes a number of performance, memory use, and usability improvements.

Go notes

Go’s support for the Arrow columnar format continues to expand. Go now supports reading and writing the Arrow columnar binary protocol, and it has also been added to the cross-language integration tests. There are now four languages (C++, Go, Java, and JavaScript) included in our integration tests to verify cross-language interoperability.

Java notes

JavaScript Notes

A new incremental array builder API is available.

MATLAB Notes

Version 0.14.0 features improved Feather file support in the MEX bindings.

Python notes

Parquet improvements

Ruby and C GLib notes

The GLib and Ruby bindings have been tracking features in the C++ project. This release includes bindings for Gandiva, JSON reader, and other C++ features.

Rust notes

There is ongoing work in Rust happening on Parquet file support, computational kernels, and the DataFusion query engine. See the full changelog for details.

R notes

We have been working on build and packaging for R so that community members can hopefully release the project to CRAN in the near future. Feature development for R has continued to follow the upstream C++ project.

Community Discussions Ongoing

There are a number of active discussions on the dev@arrow.apache.org developer mailing list. We look forward to hearing from the community there: