Apache Arrow 0.9.0 Release

Published 22 Mar 2018
By Wes McKinney (wesm)

The Apache Arrow team is pleased to announce the 0.9.0 release. It is the product of over 3 months of development and includes 260 resolved JIRAs.

While we made some of backwards-incompatible columnar binary format changes in last December’s 0.8.0 release, the 0.9.0 release is backwards-compatible with 0.8.0. We will be working toward a 1.0.0 release this year, which will mark longer-term binary stability for the Arrow columnar format and metadata.

See the Install Page to learn how to get the libraries for your platform. The complete changelog is also available.

We discuss some highlights from the release and other project news in this post. This release has been overall focused more on bug fixes, compatibility, and stability compared with previous releases which have pushed more on new and expanded features.

New Arrow committers and PMC members

Since the last release, we have added 2 new Arrow committers: Brian Hulette and Robert Nishihara. Additionally, Phillip Cloud and Philipp Moritz have been promoted from committer to PMC member. Congratulations and thank you for your contributions!

Plasma Object Store Improvements

The Plasma Object Store now supports managing interprocess shared memory on CUDA-enabled GPUs. We are excited to see more GPU-related functionality develop in Apache Arrow, as this has become a key computing environment for scalable machine learning.

Python Improvements

Antoine Pitrou has joined the Python development efforts and helped significantly this release with interoperability with built-in CPython data structures and NumPy structured data types.

  • New experimental support for reading Apache ORC files
  • pyarrow.array now accepts lists of tuples or Python dicts for creating Arrow struct type arrays.
  • NumPy structured dtypes (which are row/record-oriented) can be directly converted to Arrow struct (column-oriented) arrays
  • Python 3.6 pathlib objects for file paths are now accepted in many file APIs, including for Parquet files
  • Arrow integer arrays with nulls can now be converted to NumPy object arrays with None values
  • New pyarrow.foreign_buffer API for interacting with memory blocks located at particular memory addresses

Java Improvements

Java now fully supports the FixedSizeBinary data type.

JavaScript Improvements

The JavaScript library has been significantly refactored and expanded. We are making separate Apache releases (most recently JS-0.3.1) for JavaScript, which are being published to NPM.

Upcoming Roadmap

In the coming months, we will be working to move Apache Arrow closer to a 1.0.0 release. We will also be discussing plans to develop native Arrow-based computational libraries within the project.