Apache Arrow 20.0.0 Release
Published 27 Apr 2025
By The Apache Arrow PMC (pmc)
The Apache Arrow team is pleased to announce the 20.0.0 release. This release covers over 2 months of development work and includes 259 resolved issues on 327 distinct commits from 63 distinct contributors. See the Install Page to learn how to get the libraries for your platform.
The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog.
Community
Since the 19.0.0 release, Ed Seidl, Jean-Baptiste Onofré and Matthijs Brobbel have been invited to become committers. Bryce Mecum, Ian Cook, Jacob Wujciak-Jens and Rok Mihevc have been invited to join the Project Management Committee (PMC).
Thanks for your contributions and participation in the project!
C++ Notes
Compute
We’ve added several new compute functions: inverse_permutation/scatter (GH-44393), pivot_wider/hash_pivot_wider (GH-45269), rank_normal (GH-45572), skew/kurtosis (GH-45676), and winsorize (GH-45755).
Acero
We’ve significantly improved the Hash Join in terms of overflow-safety (GH-44513, GH-45334, GH-45506), memory consumption (peak memory usage reduced by half: GH-45551), and performance (up to several dozen times faster: GH-45611, GH-45917).
Flight RPC
- The experimental Flight over UCX feature has been removed. (#43296)
C# Notes
- Added support for the Ordered and AppMetaData fields to FlightInfo (#45753)
- FlightClient can now be integrated with Grpc.Net.ClientFactory (#45451)
Linux Packaging Notes
https://apache.jfrog.io/ is still available but https://packages.apache.org/ is preferred because the latter uses the apache.org domain.
Python Notes
Compatibility notes:
- The minimum supported Cython version has been raised to 3.0 GH-45237.
- A subset of deprecated APIs has been removed GH-45680: PARQUET_2_0 GH-45848, use_legacy_dataset GH-44790, serialize/deserialize PyArrow C++ code GH-43587.
New features:
- Large variable width types are supported in NumPy conversion GH-35289.
- A biased/unbiased option is available in the skew and kurtosis compute functions GH-45733.
- Support for SAS tokens in the AzureFileSystem has been added GH-45705.
- Interchange of decimal32, decimal64 and decimal256 data type objects between Pandas and PyArrow is now supported GH-45582, GH-45570.
- pyarrow.ArrayStatistics and pyarrow.Array.statistics() are added GH-45457.
- Bindings for the JSON streaming reader are added GH-14932.
- Bindings for MemoryPool::total_bytes_allocated and MemoryPool::num_allocations are added. Also, allocator-specific statistics can now be printed to stderr GH-45358.
- A new maps_as_pydicts parameter is introduced to the to_pylist, to_pydict and as_py methods, enabling deserialization of maps into Python dictionaries instead of lists of tuples GH-39010.
Other improvements:
- Source (sdist) and binary distribution (wheels) are now uploaded to GitHub Releases GH-45920.
- Cython code has been cleaned up as we now require at least Cython 3.0 GH-45433.
- Building of free-threaded wheels on Windows is enabled GH-44421.
- Wheels for Alpine Linux are now provided GH-18036.
Relevant bug fixes:
- An error in the Pandas conversion roundtrip with bytes column names is fixed GH-44188.
- Exceptions are raised instead of showing segfaults when users try to instantiate internal Parquet metadata classes GH-36628.
R Notes
- Binary Arrays now inherit from blob::blob in addition to arrow_binary when converted to R objects. This change is the first step in eventually deprecating the arrow_binary class in favor of the blob class in the blob package (see GH-45709).
Ruby and C GLib Notes
Improvements
- garrow_array_validate() / Arrow::Array#validate: Added.
- garrow_array_validate_full() / Arrow::Array#validate_full: Added.
- garrow_record_batch_validate() / Arrow::RecordBatch#validate: Added.
- garrow_record_batch_validate_full() / Arrow::RecordBatch#validate_full: Added.
- garrow_table_validate() / Arrow::Table#validate: Added.
- garrow_table_validate_full() / Arrow::Table#validate_full: Added.
- GArrowArrayStatistics() / Arrow::ArrayStatistics: Added.
- Changed to require Meson 0.61.2 or later.
- GArrowBinaryViewArray() / Arrow::BinaryViewArray: Added.
- GArrowStringViewArray() / Arrow::StringViewArray: Added.
- Added support for rubygems-requirements-system
Incompatible changes
- gparquet_arrow_file_writer_new_row_group() / Parquet::ArrowFileWriter#new_row_group: Removed chunk_size argument.
- garrow_record_batch_new() / Arrow::RecordBatch#initialize: Stopped validating automatically. If you want to validate a created record batch, call garrow_record_batch_validate() / Arrow::RecordBatch#validate explicitly.
- garrow_table_new() / Arrow::Table#initialize: Stopped validating automatically. If you want to validate a created table, call garrow_table_validate() / Arrow::Table#validate explicitly.
Java, Go, and Rust Notes
The Java, Go, and Rust projects have moved to separate repositories outside the main Arrow monorepo.
- For notes on the latest release of the Java implementation, see the latest Arrow Java changelog.
- For notes on the latest release of the Rust implementation, see the latest Arrow Rust changelog.
- For notes on the latest release of the Go implementation, see the latest Arrow Go changelog.