Apache Arrow 22.0.0 (24 October 2025)
This is a major release covering more than 3 months of development.
Download
- Source Artifacts
- Binary Artifacts
- Git tag
Contributors
This release includes 255 commits from 60 distinct contributors.
$ git shortlog -sn apache-arrow-21.0.0..apache-arrow-22.0.0
56 Sutou Kouhei
25 Raúl Cumplido
21 Antoine Pitrou
19 Nic Crane
15 William Ayd
10 Bryce Mecum
8 Rossi Sun
8 dependabot[bot]
7 Arash Andishgar
7 Hiroyuki Sato
6 Sarah Gilmore
5 Alenka Frim
4 Alina (Xi) Li
4 Antoine Prouvost
3 Bogdan Romenskii
3 egolearner
2 Adam Reeve
2 Ayush Bansal
2 Enrico Minack
2 Eric Dinse
2 Gang Wu
2 Hadrian Reppas
2 Patrick J. Roddy
2 Rok Mihevc
2 Zehua Zou
2 gitmodimo
1 Adrien Destugues
1 Aihua Xu
1 Ben Harkins
1 CGiachalis
1 Curt Hagenlocher
1 Diego Sevilla Ruiz
1 Fabian Iwand
1 Graham Markall
1 HackP0!nt
1 Igor Antropov
1 Johanna
1 Jonas Dedden
1 Jonathan Keane
1 Kevin Gurney
1 Kit Lee
1 Koustubh Rao
1 Lester Fan
1 Marcin Krystianc
1 Matt Topol
1 Neal Richardson
1 Princeton Mixtey
1 Seungsoo Lee
1 Soroush Rasti
1 Srinivas Lade
1 TennyZhuang
1 Vidhya Ravikumar
1 Yibo Cai
1 ZENOTME
1 corpoverlords
1 eitsupi
1 ff-kamal
1 jeremycostanzo
1 justing-bq
1 mwish
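The totals quoted above (255 commits, 60 contributors) can be cross-checked against the shortlog output. The following is a minimal Python sketch, assuming the `git shortlog -sn` output has been captured as text. `summarize_shortlog` is a hypothetical helper name, not part of any Arrow tooling, and the sample input is only a three-line excerpt of the list above.

```python
import re


def summarize_shortlog(text):
    """Return (total_commits, contributor_count) for `git shortlog -sn` output."""
    total = 0
    contributors = 0
    for line in text.splitlines():
        # Each entry is "<count><whitespace><name>"; other lines are skipped.
        match = re.match(r"\s*(\d+)\s+(.+)", line)
        if match:
            total += int(match.group(1))
            contributors += 1
    return total, contributors


# Excerpt of the list above, used only as sample input.
sample = """\
56 Sutou Kouhei
25 Raúl Cumplido
21 Antoine Pitrou
"""

commits, people = summarize_shortlog(sample)
print(f"{commits} commits from {people} contributors")
```

Feeding the full list to the same function should reproduce the figures stated in the release summary.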
Patch Committers
The following Apache committers merged contributed patches to the repository.
$ git shortlog -sn --group=trailer:signed-off-by apache-arrow-21.0.0..apache-arrow-22.0.0
96 Sutou Kouhei
46 Antoine Pitrou
38 Raúl Cumplido
15 Nic Crane
10 AlenkaF
10 Rossi Sun
7 Curt Hagenlocher
6 Sarah Gilmore
4 Bryce Mecum
4 Rok Mihevc
3 David Li
2 Adam Reeve
2 Will Ayd
1 Gang Wu
1 Jacob Wujciak-Jens
1 Jonathan Keane
Changelog
Bug Fixes
- GH-26727 - [C++][Flight] Use ipc::RecordBatchWriter with custom IpcPayloadWriter for TransportMessageWriter (DoExchange) (#47410)
- GH-31603 - [C++] Wrap Parquet encryption keys in SecureString (#46017)
- GH-40911 - [C++][Compute] Fix the decimal division kernel dispatching (#47445)
- GH-41011 - [C++][Compute] Fix the issue that comparison function could not handle decimal arguments with different scales (#47459)
- GH-41110 - [C#] Handle empty stream in ArrowStreamReaderImplementation (#47098)
- GH-41336 - [C++][Compute] Fix case_when kernel dispatch for decimals with different precisions and scales (#47479)
- GH-42971 - [C++] Parquet stream writer: Allow writing BYTE_ARRAY with converted type NONE (#44739)
- GH-43355 - [C++] Don't require __once_proxy in symbols.map (#47354)
- GH-46629 - [Python] Add options to DatasetFactory.inspect (#46961)
- GH-46690 - [GLib][CI] Use Meson 1.8.4 or later (#47425)
- GH-46739 - [C++] Fix Float16 signed zero/NaN equality comparisons (#46973)
- GH-46897 - [Docs][C++][Python] Fix asof join documentation (#46898)
- GH-46928 - [C++] Retry on EINTR while opening file in FileOpenReadable (#47629)
- GH-46942 - [Docs] Replace the directive versionadded with note (#46997)
- GH-46946 - [Python] PyArrow fails compiling without CSV enabled
- GH-47009 - [C#] ExportedAllocationOwner should use 64-bit integer to track total allocated memory. (#47011)
- GH-47016 - [C++][FlightSQL] Fix negative timestamps to date types (#47017)
- GH-47027 - [C++][Parquet] Fix repeated column pages not being written when reaching page size limit (#47032)
- GH-47029 - [Archery][Integration] Fix generation of run-end-encoded data (#47653)
- GH-47039 - [C++] Bump RapidJSON dependency in Meson configuration (#47041)
- GH-47051 - [Python][Release] verify-rc-source-windows Python tests are failing due to MSVC compiler bug
- GH-47052 - [CI][C++] Use Alpine Linux 3.22 instead of 3.18 (#47148)
- GH-47096 - [CI][R] Drop support for R 4.0 (#47285)
- GH-47101 - [Statistics][C++] Implement Statistics specification attribute ARROW:distinct_count:approximate (#47183)
- GH-47124 - [C++][Dataset] Fix DatasetWriter deadlock on concurrent WriteRecordBatch (#47129)
- GH-47128 - [Python] Numba-CUDA interop with NVIDIA bindings (#47150)
- GH-47130 - [Packaging][deb] Fix upgrade from 20.0.0-1 (#47343)
- GH-47131 - [C#] Fix day off by 1 in Date64Array (#47132)
- GH-47143 - [Dev] Ignore apache-arrow.tar.gz (#47145)
- GH-47162 - [Dev][Release][GLib] Fix indent in generate-version-header.py (#47163)
- GH-47165 - [Python] Update s3 test with new non-existent bucket (#47166)
- GH-47175 - [C++] Require xsimd 13.0.0 or later (#47221)
- GH-47179 - [Python] Revert FileSystem.from_uri to be a staticmethod again (#47178)
- GH-47203 - [C++] Restore CMAKE_DEBUG_POSTFIX in building bundled Apache Thrift (#47209)
- GH-47213 - [R] Require CMake 3.26 or later (#47217)
- GH-47229 - [C++][Arm] Force mimalloc to generate armv8.0 binary (#47766)
- GH-47234 - [C++][Python] Add test for fill_null regression on Windows (#47249)
- GH-47241 - [C++][Parquet] Fix VariantExtensionType conversion (#47242)
- GH-47243 - [C++] Initialize arrow::compute in execution_plan_documentation_examples (#47227)
- GH-47256 - [Python] Do not use cffi in free-threaded 3.13 builds (#47313)
- GH-47257 - [R] Fix truncation of time variables to work with numeric subseconds time with hms bindings (#47278)
- GH-47265 - [Ruby] Fix wrong Time object detection (#47267)
- GH-47268 - [C++][Compute] Fix discarded bad status for call binding (#47284)
- GH-47277 - [C++] r-binary-packages nightly failures due to incompatibility with old compiler (#47299)
- GH-47283 - [C++] Fix flight visibility issue in Meson configuration (#47298)
- GH-47287 - [C++][Compute] Add constraint for kernel signature matching and use it for binary decimal arithmetic kernels (#47297)
- GH-47301 - [Python] Fix FileFragment.open() seg fault behavior for file-like objects (#47302)
- GH-47303 - [C++] Don't install arrow-compute.pc twice (#47304)
- GH-47323 - [R][CI] test-r-rhub-debian-gcc-release-custom-ccache nightly job fails due to update in Debian (#47611)
- GH-47332 - [C++][Compute] Fix the issue that the arguments of function call become invalid before wrapping results (#47333)
- GH-47356 - [R] NEWS file states version 20.0.0.1 but release package number on CRAN is 20.0.0.2 (#47421)
- GH-47367 - [Packaging][Python] Patch vcpkg to show logs and install newer Windows SDK for vs_buildtools (#47484)
- GH-47373 - [C++] Raise for invalid decimal precision input from the C Data Interface (#47414)
- GH-47380 - [Python] Apply maps_as_pydicts to Nested MapScalar Values (#47454)
- GH-47399 - [C++] Update bundled Apache ORC to 2.2.0 with Protobuf patch (#47408)
- GH-47431 - [C++] Improve Meson configuration for WrapDB distribution (#47541)
- GH-47434 - [C++] Fix issue preventing running of tests on Windows (#47455)
- GH-47440 - [C++] Accept gflags::gflags as system gflags CMake target (#47468)
- GH-47446 - [C++] Update Meson configuration with compute swizzle change (#47448)
- GH-47451 - [Python][CI] Install tzdata-legacy in newer python-wheel-manylinux-test images (#47452)
- GH-47453 - [Packaging][CI] Token expired to upload nightly wheels
- GH-47485 - [C++][CI] Work around Valgrind failure on Azure tests (#47496)
- GH-47486 - [Dev][R] Define default R_UPDATE_CLANG (#47487)
- GH-47491 - [C++] Don't set include directories to found targets (#47492)
- GH-47506 - [CI][Packaging] Fix Amazon Linux 2023 packages verification (#47507)
- GH-47534 - [C++] Detect conda-installed packages in Meson CI (#47535)
- GH-47537 - [C++] Use pkgconfig name for benchmark in Meson (#47538)
- GH-47539 - [C++] Detect Snappy and bzip2 in Meson CI (#47540)
- GH-47554 - [C++] Fix Meson Parquet symbol visibility issues (#47556)
- GH-47560 - [C++] Fix host handling for default HDFS URI (#47458)
- GH-47570 - [CI] Don't notify nightly "CI: Extra" result from forks (#47571)
- GH-47590 - [C++] Use W functions explicitly for Windows UNICODE compatibility (#47593)
- GH-47591 - [C++] Fix passing zlib compression level (#47594)
- GH-47596 - [C++][Parquet] Fix printing of large Decimal statistics (#47619)
- GH-47602 - [Python] Make Schema hashable even when it has metadata (#47601)
- GH-47614 - [CI] Upgrade vcpkg on our CI (#47627)
- GH-47620 - [CI][C++] Use Ubuntu 24.04 for ASAN UBSAN job (#47623)
- GH-47625 - [Python] Free-threaded musllinux and manylinux wheels started failing with cffi 2.0.0 (#47626)
- GH-47655 - [C++][Parquet][CI] Fix failure to generate seed corpus (#47656)
- GH-47659 - [C++] Fix Arrow Flight Testing's unresolved external symbol error (#47660)
- GH-47673 - [CI][Integration] Fix Go build failure (#47674)
- GH-47682 - [R] install_pyarrow(nightly = TRUE) installs old pyarrow (#47699)
- GH-47695 - [CI][Release] Link arrow-io hdfs_test to c++fs on compilers where std::filesystem is not present by default (#47701)
- GH-47740 - [C++][Parquet] Fix undefined behavior when reading invalid Parquet data (#47741)
- GH-47742 - [C++][CI] Silence Valgrind leak on protobuf initialization (#47743)
- GH-47748 - [C++][Dataset] Fix link error on macOS (#47749)
- GH-47795 - [Archery] Add support for custom Docker registry (#47796)
- GH-47803 - [C++][Parquet] Fix read out of bounds on invalid RLE data (#47804)
- GH-47809 - [CI][Release] Fix Windows verification job trying to install patch from conda (#47810)
- GH-47819 - [CI][Packaging][Release] Avoid triggering Linux packages on release branch push (#47826)
- GH-47838 - [C++][Parquet] Set Variant specification version to 1 to align with the variant spec (#47835)
New Features and Improvements
- GH-20125 - [Docs][Python] Restructure developers/python.rst (#47334)
- GH-30036 - [C++] Timezone-aware kernels should handle offset strings (e.g. "+04:30") (#12865)
- GH-38211 - [MATLAB] Add support for creating an empty arrow.tabular.RecordBatch by calling arrow.recordBatch with no input arguments (#47060)
- GH-38213 - [MATLAB] Create a superclass for tabular type MATLAB tests (i.e. for Table and RecordBatch) (#47107)
- GH-38422 - [MATLAB] Add NumNulls property to arrow.array.Array class (#47116)
- GH-38532 - [MATLAB] Add a validate method to all arrow.array.Array classes (#47059)
- GH-38572 - [Docs][MATLAB] Update arrow/matlab/README.md with the latest change. (#47109)
- GH-39875 - [C++] Why arrow decimal divide precision and scale is not correct?
- GH-41108 - [Docs] Remove Sphinx pin (#47326)
- GH-41239 - [C++] Support to write csv header without quotes (#47524)
- GH-41476 - [Python][C++] Impossible to specify is_adjusted_to_utc for Time type when writing to Parquet (#47316)
- GH-42137 - [CI][Python] Add Python Windows GitHub Action and remove AppVeyor (#47567)
- GH-43662 - [R] Add binding to stringr::str_replace_na() (#47521)
- GH-43694 - [C++] Add Executor * Option to arrow::dataset::ScanOptions (#43698)
- GH-43904 - [CI][Python] Stop uploading nightly wheels to gemfury (#47470)
- GH-44345 - [C++][Parquet] Add Decimal32/64 support to Parquet (#47427)
- GH-44800 - [C#] Implement Flight SQL Client (#44783)
- GH-45055 - [C++][Flight] Update Flight Server RecordBatchStreamImpl to reuse ipc::RecordBatchWriter with custom IpcPayloadWriter instead of manually generating FlightPayload (#47115)
- GH-45056 - [C++][Flight] Fully support dictionary replacement in Flight
- GH-45382 - [Python] Add support for pandas DataFrame.attrs (#47147)
- GH-45639 - [C++][Statistics] Add support for ARROW:average_byte_width:{exact,approximate} (#46385)
- GH-45860 - [C++] Respect CPU affinity in cpu_count and ThreadPool default capacity (#47152)
- GH-45921 - [Release][R] Use GitHub Release not apache.jfrog.io (#45964)
- GH-46137 - [C++] Replace grpc-cpp conda package with libgrpc (#47606)
- GH-46272 - [C++] Build Arrow libraries with -Wmissing-definitions on gcc (#47042)
- GH-46374 - [Python][Doc] Improve docs to specify that source argument on parquet.read_table can also be a list of strings (#47142)
- GH-46410 - [C++] Add parquet options to Meson configuration (#46647)
- GH-46669 - [CI][Archery] Automate Zulip and email notifications for Extra CI (#47546)
- GH-46728 - [Python] Skip test_gdb.py tests if PyArrow wasn't built debug (#46755)
- GH-46835 - [C++] Add more configuration options to arrow::EqualOptions (#47204)
- GH-46860 - [C++] Making HalfFloatBuilder accept Float16 as well as uint16_t (#46981)
- GH-46905 - [C++][Parquet] Expose Statistics.is_{min/max}_value_exact and default set to true if min/max are set (#46992)
- GH-46908 - [Docs][Format] Add variant extension type docs (#47456)
- GH-46937 - [C++] Enable arrow::EqualOptions for arrow::Table (#47164)
- GH-46938 - [C++] Enhance arrow::ChunkedArray::Equals to support floating-point comparison when values share the same memory (#47044)
- GH-46939 - [C++] Add support for shared memory comparison in arrow::RecordBatch (#47149)
- GH-46962 - [C++][Parquet] Generic xsimd function and dynamic dispatch for Byte Stream Split (#46963)
- GH-46971 - [C++][Parquet] Use temporary buffers when decrypting Parquet data pages (#46972)
- GH-46982 - [C++] Remove Boost dependency from hdfs_test (#47200)
- GH-47005 - [C++] Disable exporting CMake packages (#47006)
- GH-47012 - [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA (#47013)
- GH-47040 - [C++] Refine reset of Span to be reusable (#47004)
- GH-47045 - [CI][C++] Use Fedora 42 instead of 39 (#47046)
- GH-47047 - [CI][C++] Use Google Cloud Storage Testbench v0.55.0 (#47048)
- GH-47058 - [Release] Update Release Management Guide to reflect status in preparation for Arrow 22 (#47474)
- GH-47075 - [Release][Dev] Use GH_TOKEN as GitHub token environment variable (#47181)
- GH-47084 - [Release] Stop using https://dist.apache.org/repos/dist/dev/arrow/KEYS (#47182)
- GH-47088 - [CI][Dev] Fix shellcheck errors in the ci/scripts/integration_arrow.sh (#47089)
- GH-47102 - [Statistics][C++] Implement Statistics specification attribute ARROW:max_byte_width:{exact,approximate} (#47463)
- GH-47106 - [R] Update R package to use R 4.1+ native forward pipe syntax (#47622)
- GH-47112 - [Parquet][C++] Rle BitPacked parser (#47294)
- GH-47120 - [R] Update NEWS for 21.0.0 (#47121)
- GH-47123 - [Python] Add Enums to PyArrow Types (#47139)
- GH-47125 - [CI][Dev] Fix shellcheck errors in the ci/scripts/integration_hdfs.sh (#47126)
- GH-47137 - [Python][dependency-groups] ` (#47176)
- GH-47153 - [Docs][C++] Update cmake target table in build_system.rst with newly added targets (#47154)
- GH-47157 - [Docs] Improve presentation of Other available packages section in build_system.rst (#47411)
- GH-47172 - [Python] Add a utility function to create Arrow table instead of pandas df (#47199)
- GH-47184 - [Parquet][C++] Avoid multiplication overflow in FixedSizeBinaryBuilder::Reserve (#47185)
- GH-47191 - [R] Turn GCS back on by default on MacOS source builds (#47192)
- GH-47193 - [R] Update R Makefile to exclude flight odbc from cpp sync (#47194)
- GH-47205 - [C++] Suppress GNU variadic macro warnings (#47286)
- GH-47208 - [C++][CI] Add a CI job for C++23 (#47261)
- GH-47208 - [C++] Update bundled s2n-tls to 1.5.23 (#47220)
- GH-47211 - [CI][R] Disable non-system memory allocators when on linux-devel (#47212)
- GH-47218 - [C++] Update bundled s2n-tls
- GH-47222 - [CI][C++] Add a CI job that uses the same build options for JNI on macOS (#47305)
- GH-47223 - [Release] Use "upstream" as apache/arrow{,-site} remote name (#47224)
- GH-47225 - [C++] Remove Skyhook (#47262)
- GH-47232 - [Ruby] Suppress warnings in test with Ruby 3.5 (#47233)
- GH-47244 - [CI][Dev] Fix shellcheck errors in the ci/scripts/msys2_setup.sh (#47245)
- GH-47258 - [Release] Set date: for apache/arrow-site's _release/${VERSION}.md (#47260)
- GH-47263 - [MATLAB] Add NumNulls property to arrow.array.ChunkedArray class (#47264)
- GH-47289 - [CI][Dev] Fix shellcheck errors in the ci/scripts/python_build_emscripten.sh (#47290)
- GH-47291 - [C++] Update bundled aws-c-common to 0.12.4 (#47292)
- GH-47306 - [CI][Dev] Fix shellcheck errors in the ci/scripts/python_build.sh (#47307)
- GH-47312 - [Packaging] Add support for Debian forky (#47342)
- GH-47317 - [C++][C++23][Gandiva] Use pointer for Cache test (#47318)
- GH-47319 - [CI] Fix actions/checkout hash version comments (#47320)
- GH-47321 - [CI][Dev] Fix shellcheck errors in the ci/scripts/python_sdist_test.sh (#47322)
- GH-47338 - [C++][Python] Remove deprecated string-based Parquet encryption methods (#47339)
- GH-47349 - [C++] Include request ID in AWS S3 Error (#47351)
- GH-47358 - [Python] IPC and Flight options representation (#47461)
- GH-47370 - [Python] Require Cython 3.1 (#47396)
- GH-47375 - [C++][Compute] Move scatter function into compute core (#47378)
- GH-47384 - [C++][Acero] Isolate BackpressureHandler from ExecNode (#47386)
- GH-47395 - [R] Update fedora-clang to install latest clang version to match CRAN setup (#47206)
- GH-47401 - [C++] Remove needless Snappy patch (#47407)
- GH-47404 - [Ruby] Remove needless require "extpp/setup" (#47405)
- GH-47412 - [C++] Use inlineshidden visibility in Meson configuration (#47413)
- GH-47422 - [Python][C++][Flight] Expose ipc::ReadStats in Flight MetadataRecordBatchReader (#47432)
- GH-47438 - [Python][Packaging] Set up wheel building for Python 3.14 (#47616)
- GH-47443 - [Python][Packaging] Drop Python 3.9 support (#47478)
- GH-47449 - [C++][Parquet] Do not drop all Statistics if SortOrder is UNKNOWN (#47466)
- GH-47469 - [C++][Gandiva] Add support for LLVM 21.1.0 (#47473)
- GH-47483 - [C++] Bump vendored xxhash to 0.8.3 (#47476)
- GH-47500 - [C++] Add QualifierAlignment to clang-format options (#47501)
- GH-47505 - [CI][C#][Integration] Use apache/arrow-dotnet (#47508)
- GH-47509 - [CI][Packaging][Linux] Enable Docker build cache (#47510)
- GH-47512 - [C++] Bump meson-fmt in pre-commit to 1.9.0 (#47513)
- GH-47514 - [C++][Parquet] Add unpack tests and benchmarks (#47515)
- GH-47516 - [C++][FlightRPC] Initial ODBC driver framework (#47517)
- GH-47518 - [C++][FlightRPC] Replace spdlogs with Arrow's Internal Logging (#47645)
- GH-47523 - [C#] Remove csharp/ (#47547)
- GH-47543 - [C++] Search for system install of Azure libraries with Meson (#47544)
- GH-47552 - [C++] Fix creating wrong object by FixedShapeTensorType::MakeArray() (#47533)
- GH-47575 - [Python] add quoting_header option to pyarrow WriterOptions (#47610)
- GH-47582 - [CI][Packaging] Move linux-packaging tasks to apache/arrow repository (#47600)
- GH-47584 - [C++][CI] Remove "large memory" mark from TestListArray::TestOverflowCheck (#47585)
- GH-47588 - [C++] Bump mimalloc version to 3.1.5 (#47589)
- GH-47597 - [C++][Parquet] Fuzz more data types (#47621)
- GH-47632 - [CI][C++] Add a CI job for JNI on Linux (#47746)
- GH-47633 - [Dev][Integration] Write all files with --write_generated_json (#47634)
- GH-47639 - [Benchmarking] Clean up conbench config (#47638)
- GH-47646 - [C++][FlightRPC] Follow Naming Convention (#47658)
- GH-47648 - [Archery][Integration] More granularity in JSON test cases (#47649)
- GH-47650 - [Archery][Integration] Add option to generate gold files (#47651)
- GH-47679 - [C++] Register arrow compute calls in ODBC (#47680)
- GH-47704 - [R] Update paths in nightly libarrow upload job (#47727)
- GH-47705 - [R][CI] Migrate rhub debian-gcc-release to equivalent supported image (#47730)
- GH-47738 - [R] Update NEWS.md for 22.0.0 (#47739)
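
Each changelog entry above follows the pattern "GH-<issue> - [Component]... Title (#<pull request>)", with the pull request number omitted on a few entries. For readers who want to filter the list by component, here is a minimal Python sketch of a parser for that pattern. `parse_entry` and the `ENTRY` regular expression are hypothetical names introduced for illustration, not part of Arrow's release tooling.

```python
import re

# Pattern assumed from the entries above:
#   "- GH-<issue> - [Tag][Tag] Title (#<pr>)", where "(#<pr>)" may be absent.
ENTRY = re.compile(
    r"^-?\s*GH-(?P<issue>\d+) - (?P<tags>(?:\[[^\]]+\])+)\s*"
    r"(?P<title>.*?)(?:\s*\(#(?P<pr>\d+)\))?$"
)


def parse_entry(line):
    """Split one changelog line into issue, components, title and PR number."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None  # Not a changelog entry (e.g. a heading).
    pr = match.group("pr")
    return {
        "issue": int(match.group("issue")),
        "components": re.findall(r"\[([^\]]+)\]", match.group("tags")),
        "title": match.group("title").strip(),
        "pr": int(pr) if pr else None,
    }


entry = parse_entry("- GH-47523 - [C#] Remove csharp/ (#47547)")
print(entry["components"], entry["title"], entry["pr"])
```

Entries without a pull request number, such as GH-45056, come back with `pr` set to `None`, and lines that are not entries (section headings, for example) return `None`.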