Apache Arrow 23.0.0 Release


Published 18 Jan 2026
By The Apache Arrow PMC (pmc)

The Apache Arrow team is pleased to announce the 23.0.0 release. This release covers over 3 months of development work and includes 336 resolved issues on 417 distinct commits from 71 distinct contributors. See the Install Page to learn how to get the libraries for your platform.

The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog.

Community

As per our newly started tradition of rotating the PMC chair once a year, Antoine Pitrou was elected as the new PMC chair and VP, succeeding Neal Richardson.

Thanks for your contributions and participation in the project!

Arrow Flight RPC Notes

An ODBC driver for Apache Arrow Flight SQL has been completed. Currently it is not packaged for release, but can be built from source.

C++ Notes

The C++ standard has been updated to C++20 GH-45885, and the minimum GCC version is now 8.

Some improvements have been made to leverage C++20 features GH-48592.

Compute

  • Decimal binary arithmetic and comparison now fail with a graceful error instead of triggering confusing assertions. GH-35957
  • Fixed an issue where the MinMax kernel was emitting -inf/inf for all-NaN input. GH-46063
  • ZeroCopyCastExec is no longer used when casting between binary offset types, avoiding its high overhead. GH-43660
  • Enhanced type checking for hash join residual filter in Acero. GH-48268

Format

  • Clarified that empty compressed buffers can omit the length header. GH-47918
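The following minimal Python sketch shows how a reader might decode one compressed IPC body buffer under this rule. It assumes the framing described in the Arrow columnar format specification: the first 8 bytes hold the uncompressed length as a little-endian signed 64-bit integer, and a value of -1 means the bytes that follow are stored uncompressed. The function name is hypothetical. zlib is used only as a stand-in codec, because Arrow itself uses LZ4_FRAME or ZSTD.

```python
import struct
import zlib
from typing import Callable


# Hypothetical helper modelling the IPC body-compression framing rules.
# `decompress` stands in for the negotiated codec so the sketch stays
# dependency-free (Arrow actually uses LZ4_FRAME or ZSTD, not zlib).
def decode_compressed_buffer(raw: bytes, decompress: Callable[[bytes], bytes]) -> bytes:
    # An empty body buffer may omit the 8-byte length header entirely.
    if len(raw) == 0:
        return b""
    if len(raw) < 8:
        raise ValueError("buffer too short for the uncompressed-length header")
    # First 8 bytes: uncompressed length as a little-endian signed int64.
    (uncompressed_len,) = struct.unpack_from("<q", raw, 0)
    payload = raw[8:]
    # -1 signals that the payload was written without compression.
    if uncompressed_len == -1:
        return payload
    data = decompress(payload)
    if len(data) != uncompressed_len:
        raise ValueError("decompressed size does not match the header")
    return data


# Example round trip, using zlib as the stand-in codec.
body = struct.pack("<q", 5) + zlib.compress(b"hello")
print(decode_compressed_buffer(body, zlib.decompress))
```

Under the clarified rule, a zero-length buffer is valid on its own. A reader therefore has to check for it before trying to read the header.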

Parquet

  • A new setting to limit the number of rows written per page has been added. GH-47030
  • An arrow::Result version of parquet::arrow::FileReader::Make() has been added. GH-44810
  • Support for reading INT-encoded Decimal statistics as Arrow scalars. GH-47955
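The sketch below gives a rough idea of how a row cap interacts with the existing byte-size limit when deciding where a page ends. The function and parameter names are hypothetical. The real Parquet writer checks its limits per batch of encoded values rather than per row, so this is only a model of the idea and not Arrow's writer logic.

```python
from typing import List, Sequence


# Hypothetical model: assign rows (given their encoded sizes) to pages.
# A page is closed before adding a row if that row would push it past
# `max_page_bytes`, or if the page already holds `max_rows_per_page` rows.
def split_into_pages(row_sizes: Sequence[int], max_page_bytes: int,
                     max_rows_per_page: int) -> List[int]:
    """Return the number of rows placed in each page."""
    pages: List[int] = []
    rows_in_page = 0
    bytes_in_page = 0
    for size in row_sizes:
        # Never close an empty page, even if a single row is oversized.
        too_big = rows_in_page > 0 and bytes_in_page + size > max_page_bytes
        if too_big or rows_in_page == max_rows_per_page:
            pages.append(rows_in_page)
            rows_in_page = 0
            bytes_in_page = 0
        rows_in_page += 1
        bytes_in_page += size
    if rows_in_page:
        pages.append(rows_in_page)
    return pages


# Small rows: the row cap, not the byte limit, decides the page boundaries.
print(split_into_pages([1] * 10, max_page_bytes=1000, max_rows_per_page=4))
```

Without a row cap, many tiny values could end up in a single large page. A cap keeps per-page row counts bounded, which can help readers that skip pages by row ranges.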

Several bug fixes have been made, including:

  • Fixed invalid Parquet files written when dictionary encoded pages are large. GH-47973
  • Fixed pre-1970 INT96 timestamps roundtrip. GH-48246
  • Fixed potential crash when reading invalid Parquet data. GH-48308
  • Added compatibility with non-compliant RLE streams. GH-47981
  • Fixed Util & Level Conversion logic on big-endian systems. GH-48218
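INT96 timestamps conventionally store 8 bytes of nanoseconds within the day, followed by a 4-byte Julian day number. A pre-1970 timestamp is a negative number of nanoseconds from the Unix epoch, so the day split must round toward negative infinity. The Python sketch below illustrates that layout and a floor-based conversion. It is not Arrow's implementation, the helper names are hypothetical, and we do not claim it reproduces the exact cause of GH-48246.

```python
import struct

NANOS_PER_DAY = 86_400 * 1_000_000_000
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number used for 1970-01-01


def int96_from_unix_nanos(ns: int) -> bytes:
    # Python's divmod floors, so nanos_of_day stays in [0, NANOS_PER_DAY)
    # even when ns is negative (a pre-1970 timestamp).
    days, nanos_of_day = divmod(ns, NANOS_PER_DAY)
    # Layout: little-endian int64 nanoseconds-of-day, then int32 Julian day.
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def unix_nanos_from_int96(raw: bytes) -> int:
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


# One nanosecond before the epoch lands on the last nanosecond of 1969-12-31.
print(struct.unpack("<qi", int96_from_unix_nanos(-1)))
```

With truncating division instead of flooring, the nanosecond field for a negative input would come out negative, which is outside the valid range for nanoseconds within a day.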

Encryption

  • Simplified nested field encryption configuration. GH-41246
  • Improved column encryption API. GH-48337
  • Better fuzzing support for encrypted files. GH-48335

Miscellaneous C++ changes

Linux Packaging Notes

Fixed a bug where the parquet-devel RPM package depended on parquet-glib-devel.

See also: GH-48044

CentOS 7 support has been dropped.

See also: GH-40735

MATLAB Notes

Added support for building against MATLAB R2025b GH-48154.

Python Notes

Compatibility notes

  • The deprecated Array.format method has been removed GH-48102.
  • The experimental tag has been removed from the Arrow PyCapsule Interface GH-47975.
  • PyWeakref_GetRef has replaced the use of PyWeakref_GET_OBJECT to support Python 3.15 GH-47823.

New features

  • Bindings for scatter and inverse_permutation have been added GH-48167.
  • The max_rows_per_page argument is now exposed in parquet.WriterProperties GH-48096.
  • External key material and key rotation are now supported for individual Parquet files GH-31869.
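As we understand them, the two new compute functions work as follows. For the i-th index, inverse_permutation writes i at position indices[i], and scatter writes values[i] at position indices[i]. Positions that are never written stay null. The plain-Python model below is only a sketch of those semantics and does not use the PyArrow API. The output_length parameter is a hypothetical simplification of the real function options, and the sketch does not reproduce PyArrow's handling of duplicate or out-of-range indices.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def inverse_permutation(indices: Sequence[Optional[int]],
                        output_length: Optional[int] = None) -> List[Optional[int]]:
    # Output position indices[i] receives i; unfilled slots stay None (null).
    n = len(indices) if output_length is None else output_length
    out: List[Optional[int]] = [None] * n
    for i, idx in enumerate(indices):
        if idx is not None:  # null indices are skipped
            out[idx] = i
    return out


def scatter(values: Sequence[T], indices: Sequence[Optional[int]],
            output_length: Optional[int] = None) -> List[Optional[T]]:
    # Output position indices[i] receives values[i]; unfilled slots stay None.
    n = len(indices) if output_length is None else output_length
    out: List[Optional[T]] = [None] * n
    for value, idx in zip(values, indices):
        if idx is not None:
            out[idx] = value
    return out


print(inverse_permutation([2, 0, 1]))
print(scatter(["a", "b", "c"], [2, 0, 1]))
```

For a true permutation p, scatter undoes a take: gathering values by p and then scattering the result back through p restores the original order.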

Other improvements

  • Nested field encryption configuration has been simplified GH-41246.
  • Reading INT-encoded Decimal statistics with StatisticsAsScalars is now supported GH-47955.
  • Unsigned dictionary indices are now supported in pandas conversion GH-47022.
  • Added code examples for compute functions min, max and min_max GH-48668.
  • Added temporal unit checking in NumPyDtypeUnifier GH-48625.
  • Error message is improved when mixing numpy.datetime64 values with different units (e.g., datetime64[s] and datetime64[ms]) in a single array GH-48463.
  • The source argument is now checked in pyarrow.parquet.read_table GH-47728.
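Here, "INT-encoded" refers to Parquet DECIMAL columns that use INT32 or INT64 as their physical type. For these columns, min/max statistics are stored as plain-encoded little-endian integers holding the unscaled value. The following sketch turns such raw statistic bytes into a Python Decimal. The helper is hypothetical and is not the PyArrow StatisticsAsScalars API. It only illustrates the arithmetic: value = unscaled × 10^(−scale).

```python
import struct
from decimal import Decimal


# Hypothetical helper: decode a plain-encoded INT32/INT64 statistic
# and interpret it as a DECIMAL with the given scale.
def decimal_from_int_statistic(raw: bytes, physical_type: str, scale: int) -> Decimal:
    fmt = {"INT32": "<i", "INT64": "<q"}[physical_type]
    (unscaled,) = struct.unpack(fmt, raw)
    # scaleb shifts the exponent: 12345 with scale 2 becomes 123.45.
    return Decimal(unscaled).scaleb(-scale)


min_stat = struct.pack("<i", 12345)  # e.g. a DECIMAL(9, 2) minimum
print(decimal_from_int_statistic(min_stat, "INT32", scale=2))
```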

Relevant bug fixes

  • ipc.Message.__repr__ has been corrected to use an f-string GH-48608.
  • Failures when reading Parquet files written by non-compliant RLE encoders have been fixed by adding compatibility in C++ GH-47981.
  • Memory usage is now reduced when using to_pandas() with many extension array columns GH-47861.
  • A missing required argument error in FSSpecHandler.delete_root_dir_contents has been fixed GH-47559.
  • RecordBatch.from_struct_array producing an invalid batch for sliced arrays with offset zero has been fixed in C++ GH-44318.

R Notes

Compatibility notes

  • GCS support has been turned off by default GH-48342.
  • OpenSSL 1.x builds have been removed GH-45449.

Relevant bug fixes

  • Fixed a segfault that could be raised when concatenating tables GH-47000.

Several continuous integration fixes and minor bug fixes have also been included in this release. For a full list, check the release notes.

Ruby and C GLib Notes

All missing compute function options have been added, so every compute function can now be used from Ruby and C GLib. This was done by Sten Larsson.

Fixed size list array support has been added.

See also: GH-48362

Support for changing the thread pool configuration in Acero has been added. This was done by Sten Larsson.

Duration support has been added.

CSV writer support has been added.

See also: GH-48680

Ruby

An experimental pure Ruby Apache Arrow reader implementation has been added as the red-arrow-format gem.

See also: GH-48132

We will add an experimental writer implementation in the next release.

Arrow::Column#to_arrow{,_array,_chunked_array} have been added for convenience.

See also: GH-48292

Automatic Apache Arrow type detection in Arrow::Array.new has been improved for nested integer lists.

See also:

Arrow::FixedSizeListArray.new(data_type, values) support has been added.

See also: GH-48610

C GLib

The .gir and .typelib files are now named Arrow-${MAJOR}.${MINOR}.{gir,typelib} instead of Arrow-1.0.{gir,typelib}. This allows multiple C GLib versions to coexist on the same system.

See also: GH-48616

Java, JavaScript, Go, .NET, Swift and Rust Notes

The Java, JavaScript, Go, .NET, Swift and Rust projects have moved to separate repositories outside the main Arrow monorepo.