Apache Arrow 22.0.0 Release


Published 24 Oct 2025
By The Apache Arrow PMC (pmc)

The Apache Arrow team is pleased to announce the 22.0.0 release. This release covers over 3 months of development work and includes 213 resolved issues on 255 distinct commits from 60 distinct contributors. See the Install Page to learn how to get the libraries for your platform.

The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog.

Community

Since the 21.0.0 release, Kyle Barron has been invited to be a committer.

Matthijs Brobbel, Adam Reeve and Rossi Sun have joined the Project Management Committee (PMC).

Thanks for your contributions and participation in the project!

The first Apache Arrow Summit was held on October 2nd 2025 in Paris, France as part of PyData Paris. Program details and agenda can be found here: https://www.meetup.com/pydata-paris/events/310646396/

There were around 35 attendees, of which ~20 were existing core developers or PMC members. The Summit was overwhelmingly described as a success, with a friendly atmosphere between all participants. Unfortunately, no Audio / Video recording system was available for this event.

Arrow Flight RPC Notes

Support for dictionary replacement and dictionary encoding has been added to the DoGet and DoExchange methods. (GH-45056, GH-45055 and GH-26727)

As part of supporting dictionary replacement we have also exposed the ipc::ReadStats on the FlightStreamReader in order to facilitate debugging. (GH-47422)

C++ Notes

Compute

Timezone aware kernels can now handle timezone offset strings. (GH-30036)

Decimal support has been improved by introducing a MatchConstraint structure, which applies extra (and optional) constraints during kernel signature matching. (GH-47287, GH-41336)

The scatter function has been moved to Arrow core from Arrow Compute. (GH-47375)

Filesystems

The Request ID is now included when the AWS client raises an error. (GH-47349)

Format

Several improvements have been made to Half Float (Float16) support. (GH-46860, GH-46739)

Parquet

Fuzzing support for Parquet has been improved, along with several related fixes. (GH-47803, GH-47740, GH-47655, GH-47597, GH-47184)

The RLE decoder has been reworked to extract an RLE parser that will drive further optimisations. (GH-47112)

Dynamic dispatch support has been added to Byte Stream Split. (GH-46962)

Some statistics, e.g. null count, are no longer discarded when the sort order of the column is unknown. (GH-47449)

is_min_value_exact and is_max_value_exact are now exposed in Parquet Statistics if present when reading. (GH-46905)

We now reserve values correctly when reading BYTE_ARRAY and FLBA. (GH-47012)

Encryption

String-based Parquet encryption methods have been deprecated. (GH-47338)

Memory usage required by decryption buffers when reading encrypted Parquet has been reduced. (GH-46971)

Type support

Parquet Variant type support has been improved. (GH-47241, GH-47838)

Better support for Decimal32 and Decimal64. (GH-44345)

Gandiva

Support for LLVM 21.1.0 has been added. (GH-47469)

Miscellaneous C++ changes

Support for further Arrow Statistics has been added. (GH-47102, GH-47101)

Support for shared memory comparison in arrow::RecordBatch has been added. (GH-47149)

arrow::Table::Equals now allows an optional arrow::EqualOptions argument. (GH-46937)

Skyhook integration has been removed from the main repository and has been moved to its own repository. (GH-47225)

Linux Packaging Notes

Support for Debian forky has been added. (GH-47312)

MATLAB Notes

  • The NumNulls property was added to arrow.array.Array and arrow.array.ChunkedArray. (GH-47263, GH-38422)

Python Notes

Compatibility notes:

  • Support for Python 3.9 has been dropped (GH-47443) and support for Python 3.14, both regular and free-threaded, has been added (GH-47438).
  • Cython 3.1 is now a required build-time dependency (GH-47370).
  • project.optional-dependencies has been replaced with dependency-groups (GH-47137).

New features:

  • CSV writer option quoting_header is now exposed (GH-47575).

Other improvements:

  • Support for pandas DataFrame.attrs during conversion between a dataframe and a Parquet file has been added (GH-45382).
  • A utility function to create an Arrow table instead of a pandas dataframe has been added (GH-47172).
  • IPC and Flight options now have nice repr/str methods (GH-47358).
  • Access to Request ID in AWS client error is now available from Python (GH-47349).
  • Public Type Enums have been added (GH-47123).
  • The Python Development documentation section has been restructured in order to make it easier for contributors to build and develop PyArrow (GH-20125).

Relevant bug fixes:

  • Schema is now hashable when metadata is set (GH-47602).
  • MapScalar.as_py(maps_as_pydicts="strict") option now works for nested maps (GH-47380).
  • FileFragment.open() no longer segfaults on file-like objects (GH-47301).
  • pa.compute.fill_null regression on Windows due to a compiler bug has been fixed (GH-47234).
  • Integer dictionary bitwidth preservation no longer breaks multi-file read behaviour as DatasetFactory.inspect method now accepts promote_options and fragments parameters (GH-46629).
  • FileSystem.from_uri is a staticmethod again (GH-47179).

Java, JavaScript, Go, .NET, Swift and Rust Notes

The Java, JavaScript, Go, .NET, Swift and Rust projects have moved to separate repositories outside the main Arrow monorepo.