Apache Arrow 20.0.0 Release


Published 27 Apr 2025
By The Apache Arrow PMC (pmc)

The Apache Arrow team is pleased to announce the 20.0.0 release. This release covers over 2 months of development work and includes 259 resolved issues on 327 distinct commits from 63 distinct contributors. See the Install Page to learn how to get the libraries for your platform.

The release notes below are not exhaustive and only highlight selected changes. Many other bug fixes and improvements have been made; we refer you to the complete changelog.

Community

Since the 19.0.0 release, Ed Seidl, Jean-Baptiste Onofré and Matthijs Brobbel have been invited to become committers. Bryce Mecum, Ian Cook, Jacob Wujciak-Jens and Rok Mihevc have been invited to join the Project Management Committee (PMC).

Thanks for your contributions and participation in the project!

C++ Notes

Compute

We’ve added several new compute functions: inverse_permutation/scatter (GH-44393), pivot_wider/hash_pivot_wider (GH-45269), rank_normal (GH-45572), skew/kurtosis (GH-45676), and winsorize (GH-45755).
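For readers new to inverse_permutation and scatter, here is a pure-Python sketch of their semantics as we understand them. inverse_permutation writes i to output[indices[i]], and scatter writes values[i] to output[indices[i]]. Positions that no index refers to stay null (None here). This does not call pyarrow and is not Arrow's implementation. The function names only mirror the compute functions, and the `length` parameter and the bounds check belong to this sketch, not to Arrow's options classes.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def _check_index(x: int, length: int) -> None:
    # Python lists silently accept negative indices; reject them so the
    # sketch only ever writes to positions 0 .. length-1.
    if not 0 <= x < length:
        raise IndexError(f"index {x} out of range for length {length}")


def inverse_permutation(indices: Sequence[int],
                        length: Optional[int] = None) -> List[Optional[int]]:
    """Return out such that out[indices[i]] == i; unreferenced slots are None."""
    n = len(indices) if length is None else length
    out: List[Optional[int]] = [None] * n
    for i, x in enumerate(indices):
        _check_index(x, n)
        out[x] = i
    return out


def scatter(values: Sequence[T], indices: Sequence[int],
            length: Optional[int] = None) -> List[Optional[T]]:
    """Return out such that out[indices[i]] == values[i]; unreferenced slots are None."""
    if len(values) != len(indices):
        raise ValueError("values and indices must have the same length")
    n = len(indices) if length is None else length
    out: List[Optional[T]] = [None] * n
    for v, x in zip(values, indices):
        _check_index(x, n)
        out[x] = v
    return out


if __name__ == "__main__":
    perm = [2, 0, 1]
    print(inverse_permutation(perm))       # [1, 2, 0]
    print(scatter(["a", "b", "c"], perm))  # ['b', 'c', 'a']
```

The two functions undo `take`. Taking the scattered result at `perm` returns the original values, and `perm[inverse_permutation(perm)[j]] == j` for every j.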

Acero

We’ve significantly improved the Hash Join in terms of overflow-safety (GH-44513, GH-45334, GH-45506), memory consumption (peak memory usage reduced by half: GH-45551), and performance (up to several dozen times faster: GH-45611, GH-45917).

Flight RPC

  • The experimental Flight over UCX feature has been removed. (#43296)

C# Notes

Linux Packaging Notes

https://apache.jfrog.io/ is still available, but https://packages.apache.org/ is preferred because it uses the apache.org domain.

Python Notes

Compatibility notes:

  • The minimum supported Cython version has been raised to 3.0 GH-45237.
  • A subset of deprecated APIs has been removed GH-45680: PARQUET_2_0 GH-45848, use_legacy_dataset GH-44790, and the serialize/deserialize PyArrow C++ code GH-43587.

New features:

  • Large variable-width types are now supported in NumPy conversion GH-35289.
  • Biased/unbiased options are available in the skew and kurtosis compute functions GH-45733.
  • Support for SAS tokens in AzureFileSystem has been added GH-45705.
  • Interchange of decimal32, decimal64 and decimal256 data type objects between Pandas and PyArrow is now supported GH-45582, GH-45570.
  • pyarrow.ArrayStatistics and pyarrow.Array.statistics() have been added GH-45457.
  • Bindings for the JSON streaming reader have been added GH-14932.
  • Bindings for MemoryPool::total_bytes_allocated and MemoryPool::num_allocations have been added. Allocator-specific statistics can also be printed to stderr GH-45358.
  • A new maps_as_pydicts parameter has been added to the to_pylist, to_pydict and as_py methods, enabling deserialization of maps into Python dictionaries instead of lists of tuples GH-39010.
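To show what the biased/unbiased option for skew and kurtosis changes, the sketch below computes both variants in pure Python using textbook formulas. The biased estimators come straight from the population central moments: g1 = m3 / m2^1.5, and excess kurtosis g2 = m4 / m2^2 - 3. The unbiased variants apply the usual sample-size adjustments, which are the same ones pandas uses for Series.skew and Series.kurt. We assume Arrow's definitions match these, including that kurtosis means excess kurtosis. Please check the compute function documentation before relying on exact values. The code does not call pyarrow, and the `biased` flag here is this sketch's own parameter.

```python
import math
from typing import Sequence, Tuple


def _moments(xs: Sequence[float]) -> Tuple[int, float, float, float]:
    """Return n and the population central moments m2, m3, m4."""
    if not xs:
        raise ValueError("need at least one value")
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return n, m2, m3, m4


def skew(xs: Sequence[float], biased: bool = True) -> float:
    n, m2, m3, _ = _moments(xs)
    if m2 == 0:
        return float("nan")  # constant input: skew is undefined
    g1 = m3 / m2 ** 1.5  # biased (population) estimator
    if biased:
        return g1
    if n < 3:
        raise ValueError("unbiased skew needs at least 3 values")
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)  # adjusted Fisher-Pearson


def kurtosis(xs: Sequence[float], biased: bool = True) -> float:
    n, m2, _, m4 = _moments(xs)
    if m2 == 0:
        return float("nan")  # constant input: kurtosis is undefined
    g2 = m4 / m2 ** 2 - 3.0  # biased excess kurtosis
    if biased:
        return g2
    if n < 4:
        raise ValueError("unbiased kurtosis needs at least 4 values")
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


if __name__ == "__main__":
    data = [1, 2, 3, 4, 10]
    print(skew(data), skew(data, biased=False))
    print(kurtosis(data), kurtosis(data, biased=False))
```

On small samples the two variants differ noticeably. For the data above, the biased skew is about 1.14 and the unbiased skew is about 1.70. As n grows, the adjustment factors approach 1.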

Other improvements:

  • Source (sdist) and binary (wheel) distributions are now uploaded to GitHub Releases GH-45920.
  • Cython code has been cleaned up as we now require at least Cython 3.0 GH-45433.
  • Building free-threaded wheels on Windows is now enabled GH-44421.
  • Wheels for Alpine Linux are now provided GH-18036.

Relevant bug fixes:

  • An error in the Pandas conversion roundtrip with bytes column names has been fixed GH-44188.
  • Instantiating internal Parquet metadata classes now raises an exception instead of causing a segfault GH-36628.

R Notes

  • Binary Arrays now inherit from blob::blob in addition to arrow_binary when converted to R objects. This change is the first step in eventually deprecating the arrow_binary class in favor of the blob class in the blob package (See GH-45709).

Ruby and C GLib Notes

Improvements

  • garrow_array_validate() / Arrow::Array#validate: Added.
  • garrow_array_validate_full() / Arrow::Array#validate_full: Added.
  • garrow_record_batch_validate() / Arrow::RecordBatch#validate: Added.
  • garrow_record_batch_validate_full() / Arrow::RecordBatch#validate_full: Added.
  • garrow_table_validate() / Arrow::Table#validate: Added.
  • garrow_table_validate_full() / Arrow::Table#validate_full: Added.
  • GArrowArrayStatistics / Arrow::ArrayStatistics: Added.
  • Changed to require Meson 0.61.2 or later.
  • GArrowBinaryViewArray / Arrow::BinaryViewArray: Added.
  • GArrowStringViewArray / Arrow::StringViewArray: Added.
  • Added support for rubygems-requirements-system.

Incompatible changes

  • gparquet_arrow_file_writer_new_row_group() / Parquet::ArrowFileWriter#new_row_group: Removed the chunk_size argument.
  • garrow_record_batch_new() / Arrow::RecordBatch#initialize: Stopped validating automatically. If you want to validate a created record batch, call garrow_record_batch_validate() / Arrow::RecordBatch#validate explicitly.
  • garrow_table_new() / Arrow::Table#initialize: Stopped validating automatically. If you want to validate a created table, call garrow_table_validate() / Arrow::Table#validate explicitly.

Java, Go, and Rust Notes

The Java, Go, and Rust projects have moved to separate repositories outside the main Arrow monorepo.