Apache Arrow 0.17.0 Release

Published 21 Apr 2020
By The Apache Arrow PMC (pmc)

The Apache Arrow team is pleased to announce the 0.17.0 release. This release covers over two months of development work and includes 569 resolved issues from 79 distinct contributors. See the Install Page to learn how to get the libraries for your platform.

The release notes below are not exhaustive and cover only selected highlights of the release. Many other bug fixes and improvements have been made; for the full list, see the complete changelog.

Community

Since the 0.16.0 release, two committers have joined the Project Management Committee (PMC):

Thank you for all your contributions!

Columnar Format Notes

A C-level Data Interface was designed to ease data sharing inside a single process. It allows different runtimes or libraries to share Arrow data using a well-known binary layout and metadata representation, without any copies. Third-party libraries can use the C interface to import and export the Arrow columnar format in-process without taking on any new code dependencies.

The C++ library now includes an implementation of the C Data Interface, and Python and R have bindings to that implementation.
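To make the "no new code dependencies" point concrete, here is a minimal sketch of a producer and a consumer that exchange an int32 array using only the C standard library. The two struct layouts and the "i" format string follow the published C Data Interface specification; the function names (export_int32_array, sum_int32_array) and the release callbacks are hypothetical helpers written for this illustration and are not part of any Arrow library. The sketch assumes an array with no nulls and no children.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Struct layouts as defined by the Arrow C Data Interface specification. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical release callback: the strings are static, so nothing is freed. */
static void release_schema(struct ArrowSchema* schema) {
  schema->release = NULL; /* marks the struct as released */
}

/* Hypothetical release callback: frees the memory owned by the producer. */
static void release_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]); /* data buffer */
  free(array->buffers);           /* buffer pointer table */
  array->release = NULL;          /* marks the struct as released */
}

/* Hypothetical producer: exports a copy of `values` as a non-nullable int32
 * array. Returns 0 on success and -1 if allocation fails. */
int export_int32_array(const int32_t* values, int64_t length,
                       struct ArrowSchema* out_schema,
                       struct ArrowArray* out_array) {
  size_t n = (size_t)(length > 0 ? length : 1); /* avoid a zero-size malloc */
  const void** buffers = malloc(2 * sizeof(*buffers));
  int32_t* data = malloc(n * sizeof(int32_t));
  if (buffers == NULL || data == NULL) {
    free(buffers);
    free(data);
    return -1;
  }
  memcpy(data, values, (size_t)length * sizeof(int32_t));
  buffers[0] = NULL; /* validity bitmap may be absent when there are no nulls */
  buffers[1] = data; /* the int32 values */

  *out_schema = (struct ArrowSchema){
      .format = "i", /* "i" is the format string for int32 */
      .name = "",
      .release = release_schema};
  *out_array = (struct ArrowArray){
      .length = length,
      .null_count = 0,
      .offset = 0,
      .n_buffers = 2,
      .buffers = buffers,
      .release = release_array};
  return 0;
}

/* Hypothetical consumer: sums an int32 array, assuming it contains no nulls. */
int64_t sum_int32_array(const struct ArrowSchema* schema,
                        const struct ArrowArray* array) {
  assert(strcmp(schema->format, "i") == 0 && array->n_buffers == 2);
  const int32_t* data = (const int32_t*)array->buffers[1];
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; ++i) {
    total += data[array->offset + i];
  }
  return total;
}
```

A consumer would typically receive pointers to the two structs, read the data in place, and then call array.release(&array) and schema.release(&schema) once it is finished; the producer's callbacks decide how the underlying memory is actually freed.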

Arrow Flight RPC notes

C++ notes

Feather V2

This release introduces the “Feather V2” format, which is based on the Arrow IPC file format. Feather V2 fully supports all Arrow data types and resolves the 2GB per-column limitation for large amounts of string data that the original Feather implementation had. Feather V2 also introduces experimental IPC message compression using the LZ4 frame format or ZSTD. This compression scheme will be formalized later in the Arrow format.

C++ Datasets

C++ Parquet notes

C++ build notes

Other C++ notes

Java notes

Python notes

Datasets

Packaging

Other improvements

R notes

Highlights include support for the Feather V2 format and the C Data Interface, both described above. Along with low-level bindings for the C interface, this release adds tooling to work with Arrow data in Python using reticulate. See vignette("python", package = "arrow") for a guide to getting started.

Installation on Linux now builds the C++ library from source by default. For a faster, richer build, set the environment variable NOT_CRAN=true. See vignette("install", package = "arrow") for details and more options.

For more on what’s in the 0.17 R package, see the R changelog.

Ruby and C GLib notes

Ruby

C GLib

Rust notes

Rust Parquet notes

Rust DataFusion notes

Project Operations

We’ve continued our migration of general automation toward GitHub Actions. The majority of our commit-by-commit continuous integration (CI) now runs on GitHub Actions. We are also exploring ways to use dedicated hardware as part of our CI: the Buildkite self-hosted CI/CD platform is now supported on Apache repositories, and GitHub Actions also supports self-hosted workers.