Apache Arrow Rust 6.0.0 Release


Published 09 Nov 2021
By The Apache Arrow PMC (pmc)

We recently released the 6.0.0 Rust version of Apache Arrow, which coincides with the Arrow 6.0.0 release. This post highlights some of the improvements in the Rust implementation. The full changelog can be found here.

The Rust Arrow implementation would not be possible without the wonderful work and support of our community, and the 6.0.0 release is no exception. It includes 99 commits from 35 individual contributors, many of them with their first contribution. Thank you all very much.

Arrow

Highlighted features and changes between release 5.0.0 and this release are:

  1. New MapArray support
  2. Add optimized filter kernel for regular expression matching
  3. Implement sort() for BinaryArray
  4. Replace ArrayData::new() with ArrayData::try_new() and unsafe ArrayData::new_unchecked
  5. Sorting should require less memory and be faster

Of course, this release also contains bug fixes, performance improvements, and improved documentation examples. For the full list of changes, please consult the changelog.

More Frequent Releases

Arrow releases major versions every three months. The Rust implementation follows this major release cycle, and additionally releases minor version updates approximately every other week to speed the flow of new features and fixes.

You can always find the latest releases on crates.io: arrow, parquet, arrow-flight, and parquet-derive.

DataFusion & Ballista

DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on top of Arrow. Ballista is a distributed compute platform. These projects are now in their own repository, and are no longer released in lock-step with Arrow.

Highlighted Functionality

The memory required to do sorting has been improve by the pull request resolving issue 553. A demonstration for how to sort follows.

extern crate arrow;

use arrow::array::{
    Int32Array,
    ArrayRef,
};
use std::sync::Arc;
use arrow::compute::sort;

fn main() {
    
    let array: ArrayRef = Arc::new(Int32Array::from(vec![5, 4, 23, 1, 20, 2]));
    println!("{:?}", array);
    let sorted_array = sort(&array, None).unwrap();
    println!("{:?}", sorted_array);
}

For further examples, see the source code.

Roadmap for 7.0.0 and Beyond

Here are some of the initiatives that contributors are currently working on for future releases:

  • Validate arguments to ArrayData::new and null bit buffer and buffers
  • add ilike comparitor
  • refactor regexp_is_match_utf8_scalar
  • add support for float 16
  • Updated UnionArray support to follow the latest arrow spec

Contributors to 6.0.0:

Again, thank you to all the contributors for this release. Here is the raw git listing:

    23  Andrew Lamb
     8  Ben Chambers
     8  Navin
     7  Jiayu Liu
     5  Wakahisa
     4  Ruihang Xia
     3  Daniël Heres
     3  Matthew Turner
     3  Sumit
     2  Boaz
     2  Chojan Shang
     2  Ilya Biryukov
     2  Krisztián Szűcs
     2  Markus Westerlind
     2  Roee Shlomo
     2  Sergii Mikhtoniuk
     2  Wang Fenjin
     2  baishen
     1  Carol (Nichols || Goulding)
     1  Christian Williams
     1  Felix Yan
     1  Jorge Leitao
     1  Kornelijus Survila
     1  Matthew Zeitlin
     1  Mike Seddon
     1  Mykhailo Osypov
     1  Neal Richardson
     1  Pete Koomen
     1  QP Hou
     1  Richard
     1  Xavier Lange
     1  Yuan Zhou
     1  aiglematth
     1  mathiaspeters-sig
     1  msalib

How to Get Involved

If you are interested in contributing to the Rust implementation of Apache Arrow, we would love to have you! You can help by trying out Arrow on some of your own data and projects and filing bug reports and helping to improve the documentation, or contribute to the documentation, tests or code. A list of open issues suitable for beginners is here and the full list is here