Apache Arrow 19.0.0 (16 January 2025)
This is a major release covering more than 2 months of development.
Download
- Source Artifacts
- Binary Artifacts
- Git tag
Contributors
This release includes 330 commits from 67 distinct contributors.
$ git shortlog -sn apache-arrow-18.1.0..apache-arrow-19.0.0
65 Sutou Kouhei
60 dependabot[bot]
29 Raúl Cumplido
23 Bryce Mecum
20 Antoine Pitrou
16 Hiroyuki Sato
8 mwish
7 David Li
7 Gang Wu
6 Matt Topol
6 Rossi Sun
5 Joris Van den Bossche
4 Adam Reeve
4 Alex Kesling
3 Thomas Newton
3 mroz45
2 Alessandro Molina
2 Dewey Dunnington
2 Dongjoon Hyun
2 Elliott Sales de Andrade
2 Enrico Minack
2 Ian Cook
2 Kevin Gurney
2 Laurent Goujon
2 Maksim Yegorov
2 Neal Richardson
2 Sarah Gilmore
2 Tim Grein
2 Tyler White
1 0xderek
1 Agriya Khetarpal
1 Anja Kefala
1 Benedikt Reinartz
1 Benjamin Kietzman
1 Brad Smith
1 Curt Hagenlocher
1 Emmanuel Ferdman
1 Felipe Oliveira Carvalho
1 Francis
1 Gabriel P Stone
1 GeorgKreuzmayr
1 George Vanburgh
1 Hang Zheng
1 Igor Anferov
1 Jacob Wujciak-Jens
1 Jeges Attila
1 Jonas Benn
1 Jonathan Keane
1 Kevin Wilson
1 Maarten Pronk
1 Michael Lui
1 Michael Sebbah
1 Mohammad Linjawi
1 Nic Crane
1 Nis Meinert
1 Rok Mihevc
1 Roman
1 Sylvain Joubert
1 Takaaki Koike
1 Timo
1 Tom Jakubowski
1 ViggoC
1 bretttully
1 c8ef
1 eitsupi
1 flashzxi
1 h-vetinari
Patch Committers
The following Apache committers merged contributed patches to the repository.
$ git shortlog -sn --group=trailer:signed-off-by apache-arrow-18.1.0..apache-arrow-19.0.0
142 Sutou Kouhei
31 Antoine Pitrou
29 David Li
24 Raúl Cumplido
23 Curt Hagenlocher
14 Jacob Wujciak-Jens
11 mwish
9 Bryce Mecum
7 Rossi Sun
4 Gang Wu
4 Matt Topol
3 Felipe Oliveira Carvalho
3 Jonathan Keane
2 Joris Van den Bossche
2 Sarah Gilmore
1 Dewey Dunnington
1 Kevin Gurney
1 Ruoxi Sun
1 Weston Pace
Changelog
New Features and Improvements
- GH-23995 - [C#] Make PrimitiveArrayBuilder constructor public (#44596)
- GH-27919 - [CI][C++] Add a nightly job to test offline build (#44721)
- GH-32206 - [C++] GcsFileSystem::Make should return Result (#44503)
- GH-35589 - [Ruby] Add support or JRuby (#44346)
- GH-36954 - [Python] Add more FlightInfo / FlightEndpoint attributes (#43537)
- GH-38837 - [Format] Add the specification for statistics schema (#45058)
- GH-39212 - [Release] Retry download binary subprocess in case of OpenSSL error (#39213)
- GH-40592 - [C++][Parquet] Implement SizeStatistics (#40594)
- GH-40713 - [Docs] Trim down ADBC page (#44313)
- GH-41141 - [C++] Reduce string inlining in Substrait serde (#45174)
- GH-41706 - [C++][Acero] Enhance asof_join to work in multi-threaded execution by sequencing input (#44083)
- GH-43080 - [CI][Dev] Enable shellcheck (#44724)
- GH-43410 - [Python] Support Arrow PyCapsule stream objects in write_dataset (#43771)
- GH-43535 - [C++] Support the AWS S3 SSE-C encryption (#43601)
- GH-43547 - [R][CI] Add recheck workflow for checking reverse dependencies on GHA (#43784)
- GH-43570 - [CI][Dev][Docs] Update references to “docker-compose” with “docker compose” (#43575)
- GH-43598 - [C++][Parquet] Parquet Metadata Printer supports print sort-columns (#43599)
- GH-43631 - [C++] Add C++ implementation of Async C Data Interface (#44495)
- GH-43631 - [C][Format] Add ArrowAsyncDeviceStreamHandler interface (#43632)
- GH-43683 - [Python] Support pandas future default string dtype
- GH-43693 - [C++][Acero] Support AVX2 swiss join decoding (#43832)
- GH-43808 - [C++] skip
-0117
in StrptimeZoneOffset for old glibc (#44621) - GH-43951 - [CI][Python] Use GitHub Packages for vcpkg cache (#44644)
- GH-44010 - [C++] Add
arrow::RecordBatch::MakeStatisticsArray()
(#44252) - GH-44018 - [R] Treat builds on R-universe as like
NOT_CRAN=true
(#44476) - GH-44065 - [Java] Implement C Data Interface for RunEndEncodedVector (#44241)
- GH-44066 - [Python] Add Python wrapper for JsonExtensionType (#44070)
- GH-44084 - [C++] Improve merge step in chunked sorting (#44217)
- GH-44101 - [C++][Parquet] Tools: Debug Print for Json should be valid JSON (#44532)
- GH-44223 - [Dev] Use “Gandiva” instead of “C++ - Gandiva” label (#44722)
- GH-44308 - [C++][FS][Azure] Implement SAS token authentication (#45021)
- GH-44364 - [C++] Don’t export template class (#44365)
- GH-44392 - [GLib] Add GArrowDecimal32DataType (#44580)
- GH-44404 - [CI] Stop running CI “push” jobs for Dependabot (#44405)
- GH-44416 - [C++][Docs] Update the URL to C++ Development in README.md (#44427)
- GH-44444 - [Java][CI] Add Java implementation of Flight do_exchange integration test (#44445)
- GH-44464 - [C++] Added rvalue-reference-qualified overload for arrow::Result::status() returning value instead of reference (#44477)
- GH-44474 - [Website][Docs] Improve project description in more places (#44522)
- GH-44474 - [Website] Improve project description (#44492)
- GH-44480 - [Release][Packaging] Use
--platform
explicitly fordocker run
(#44481) - GH-44491 - [C++] StatusConstant- cheaply copied const Status (#44493)
- GH-44518 - [R] Update NEWS for 18.0.0 (#44520)
- GH-44528 - [Dev] Introduce a new feature: specific language reformatting (#44529)
- GH-44555 - [C++][Compute] Allow casting struct to bigger nullable struct (#44587)
- GH-44556 - [Release][MSYS2] Update python-pyarrow too (#44557)
- GH-44558 - [Release][Website] Remove needless “Apache Arrow ${VERSION}” section (#44559)
- GH-44569 - [GLib] Add GArrowDecimal64DataType (#44571)
- GH-44575 - [C#] Replace LINQ expression with for loop (#44576)
- GH-44578 - [Release][Packaging] Verify wheel version (#44593)
- GH-44579 - [C++] Use array type to compute min/max statistics Arrow type (#45094)
- GH-44581 - [C++] Minor: ArrayData ctor can assign null_count directly (#44582)
- GH-44588 - [GLib] Add GArrowDecimal64 Class (#44591)
- GH-44589 - [GLib] Add GArrowDecimal32 class (#44597)
- GH-44590 - [C++] Add
const
and&
toarrow::Array::statistics()
return type (#44592) - GH-44603 - [GLib] Add GArrowDecimal64Array and GArrowDecimal64ArrayBuilder (#44605)
- GH-44604 - [GLib] Add Decimal32Array (#44617)
- GH-44614 - [Python][C++] Add version suffix to libarrow_python* libraries (#44702)
- GH-44618 - [GLib] Add GArrowDecimal64Scalar (#44620)
- GH-44619 - [GLib] Add GArrowDecimal32Scalar (#44628)
- GH-44648 - [CI] Remove autotune and rebase from commentbot (#44649)
- GH-44656 - [GLib] Add GArrowBinaryViewDataType (#44659)
- GH-44667 - [Archery] Suppress pull/push progress logs (#44669)
- GH-44686 - [GLib] Add GArrowStringViewDataType (#44687)
- GH-44690 - [C++] NumericBuilder::AppendValues append vector prevent from ub (#44794)
- GH-44700 - [C++][Parquet] Remove obsolete parquet_constants generated files from old thrift (#44772)
- GH-44703 - [CI][MATLAB][Packaging] Update MATLAB CI and
crossbow
packaging workflows to build against MATLABR2024b
(#44704) - GH-44705 - [MATLAB][PACKAGING] Update the crossbow MATLAB workflow to use
macOS 13
instead ofmacOS 12
(#44858) - GH-44710 - [Docs][C++] Add
arrow::ArrayStatistics
to API doc (#44764) - GH-44713 - [Python] Add support for Decimal32 and Decimal64 types (#44882)
- GH-44744 - [C++] Upgrade ORC to 2.0.3 (#44745)
- GH-44749 - [CI][Dev] Apply ShellCheck lint to ci/scripts/c_glib_test.sh (#44750)
- GH-44770 - [Java] Update minor protobuf version to avoid CVE-2024-7254 (#44775)
- GH-44784 - [C++][Parquet] Add
arrow::Result
version ofparquet::arrow::OpenFile()
(#44785) - GH-44788 - [C++] Fix a couple of maybe-uninitialized warnings (#44789)
- GH-44795 - [C++] Use arrow::util::span on arrow::util::bitmap_builders_utilities instead of std::vector (#44796)
- GH-44805 - [Dev] Add
.editorconfig
file (#44870) - GH-44808 - [C++][Parquet] Add
arrow::Result
version ofparquet::arrow::FileReader::GetRecordBatchReader()
(#44809) - GH-44811 - [C++] minor optimize cancel and thread pool (#44812)
- GH-44815 - [C++][Parquet] Add an example to dump statistics read as
arrow::ArrayStatistics
(#44816) - GH-44828 - [Java][Release] Remove Java from release scripts (#44854)
- GH-44829 - [Java][CI] Remove Java related test CI except integration test (#44946)
- GH-44830 - [Java][CI] Disable Java Dependabot (#44832)
- GH-44831 - [CI] Build Java for integration tests from arrow-java (#44932)
- GH-44843 - [CI][Doc] Remove building Java documentation from this repository (#45000)
- GH-44878 - [Archery] Only suppress Docker progress logs when running on CI (#44865)
- GH-44903 - [C++] Add the Expm1(exponent) scalar arithmetic function (#44904)
- GH-44915 - [C++] Add WithinUlp testing functions (#44906)
- GH-44922 - [MATLAB] Add IPC
RecordBatchStreamFileWriter
MATLAB class (#44925) - GH-44923 - [MATLAB] Add IPC
RecordBatchStreamReader
MATLAB class (#45068) - GH-44934 - [Dev] Remove JIRA remnants (#44936)
- GH-44948 - [Dev][Archery] Remove JIRA remnants from Archery (#45091)
- GH-44952 - [C++][Python] Add Hyperbolic Trig functions (#44630)
- GH-44962 - [Python] Clean-up name / field_name handling in pandas compat (#44963)
- GH-44976 - [C++] Enable mimalloc by default, disable jemalloc by default and more (#44951)
- GH-44982 - [C++] Add support for building system OpenTelemetry (#44983)
- GH-44994 - [C++][CMake] Use librt only for Linux (#44984)
- GH-45005 - [C++] Support for fixed-size list in conversion of range tuple (#45008)
- GH-45015 - [C++][Parquet] Allow configuring the default footer read size (#45016)
- GH-45047 - [CI][Python][Packaging] Test 3.12 wheels on Ubuntu 24.04 (#45042)
- GH-45050 - [CI][Dev] Apply ShellCheck lint to c_glib/test/run-test.sh (#45052)
- GH-45075 - [C++] Remove result_internal.h (#45066)
- GH-45076 - [CI][Packaging][Python] Simplify dev/tasks/python-wheels/github.linux.yml (#45077)
- GH-45079 - [FlightRPC][C++] Deprecate InitializeFlightUcx before removing UCX (#45080)
- GH-45092 - [C++][Parquet] Add GetReadRanges function to FileReader (#45093)
- GH-45096 - [C++] Apply a cstdint patch to bundled Thrift for GCC 15 (#45097)
- GH-45135 - [C++] Remove useless “hash table ready” states in swiss join (#45136)
- GH-45137 - [CI][C++] Add a GCC 15 job (#45138)
- GH-45140 - [Dev][Release] Review release management guide and release scripts
- GH-45142 - [C++] Ensure using
cpp/cmake_modules/*.cmake
(#45143) - GH-45164 - [CI][Integration] Remove “java_” prefix from Java build scripts (#45165)
- GH-45166 - [CI][C++] Upgrade Alpine Linux to 3.18 from 3.16 (#45168)
- GH-45175 - [Python] Honor the strings_to_categorical keyword in to_pandas for string view type (#45176)
- GH-45178 - [CI] Remove clcache related codes (#45182)
- GH-45198 - [CI][Integration] Follow build script name change in apache/arrow-java
Bug Fixes
- GH-15233 - [C++] Fix CopyFiles when destination is a FileSystem with background_writes (#44897)
- GH-39914 - [pyarrow] Reorder to_pandas extension dtype mapping (#44720)
- GH-40633 - [C++][Python] Fix ORC crash when file contains unknown timezone (#45051)
- GH-41326 - [Python] Converting month_day_nano_interal to numpy crashes
- GH-41536 - [C++] Replace std::aligned_storage that is deprecated in C++23 (#45019)
- GH-41667 - [C++][Parquet] Refuse writing non-nullable column that contains nulls (#44921)
- GH-43124 - [C++] Initialize offset vector head as 0 after memory allocated in grouper.cc (#43123)
- GH-43949 - [C++] io::BufferedInput: Fix invalid state after SetBufferSize (#44387)
- GH-43994 - [C++][Parquet] Fix schema conversion from two-level encoding nested list (#43995)
- GH-44344 - [Java] fix VectorSchemaRoot.getTransferPair for NullVector (#44631)
- GH-44368 - [C++] Use “lib” for generating bundled dependencies even with “clang-cl” (#44391)
- GH-44372 - [C++] Fix unaligned load/store implementation for clang-18 (#44468)
- GH-44384 - [C++] Use CMAKE_LIBTOOL on macOS (#44385)
- GH-44396 - [CI][C++] Use setup-python on hosted runner (#44411)
- GH-44412 - [CI][R] Use Ubuntu 22.04 not 24.04 for Arrow C++ 15.0.2 deb packages (#44413)
- GH-44428 - [CI][C#] Use setup-python to use “pip install” (#44429)
- GH-44430 - [CI][Swfit] Use setup-python to use “pip install” (#44431)
- GH-44432 - [Swift] Use flatbuffers v24.3.7 (#44433)
- GH-44441 - [Release] Add missing
--repo apache/arrow
in02-source.sh
(#44442) - GH-44455 - [C++] Update vendored date to 3.0.3 (#44482)
- GH-44465 - [GLib][C++] Meson searches libraries with specific versions. (#44475)
- GH-44478 - [GLib] Prevent Arrow::ExtensionType.new (#44498)
- GH-44479 - [CI][Archery] Add missing Flight integration targets (#44691)
- GH-44483 - [GLib] Don’t check invalid decimal precision message in test (#44484)
- GH-44526 - [C++][Acero] Fix crash when thread in asof_join is not running (#44584)
- GH-44541 - [C++] NumericArray
should not use ctor from parent directly (#44542) - GH-44563 - [C++]
FunctionOptions::{Serialize,Deserialize}()
return an error withoutARROW_IPC
(#45171) - GH-44564 - [Java][FlightSQL] Fix native libraries relocation (#44565)
- GH-44570 - [Release][R][Docs] Update
r/pkgdown/assets/versions.html
(#44572) - GH-44574 - [Release] Ensure using the release tag to build binaries (#44577)
- GH-44585 - [JS][Release] Skip bin dir in npm-release.sh (#44861)
- GH-44601 - [GLib] Fix the wrong GARROW_AVAILABLE_IN declaration (#44602)
- GH-44624 - [CI][JS] Increase “AMD64 macOS 13 NodeJS 18” timeout (#44625)
- GH-44626 - [Java] fix SplitAndTransfer throws for empty MapVector (#44627)
- GH-44651 - [Python] Allow from_buffers to work with StringView on Python (#44701)
- GH-44657 - [CI][Dev] Add write permission to the crossbow comment bot (#44658)
- GH-44668 - [Docs] Fix ColumnChunkMetaData offset documentation in pyarrow (#44670)
- GH-44677 - [C++][Acero] Enhance partition sort example (#44678)
- GH-44679 - [C++][Python] Fix Flight Timestamp precision, revert workaround from #43537 (#44681)
- GH-44695 - [C++] Add S3 option to ignore SIGPIPE signals (#44735)
- GH-44706 - [Release][Archery][Packaging] Add “so_version” variable (#44707)
- GH-44711 - [Docs][Python] Add missing canonical extension types to PyArrow arrays and datatypes docs (#44880)
- GH-44714 - [C++] Keep field metadata for keys and values when importing a map type via the C data interface (#44715)
- GH-44716 - [Dev][Integration] Add numpy to archery integration deps (#44717)
- GH-44726 - [CI] Update substrait consumer call to use updated producer and consumer parameters (#44727)
- GH-44728 - [Python] Trigger manual Garbage collection before checking allocated bytes for dlpack tests (#44793)
- GH-44734 - [C++][CI] Fix arrow-c-bridge-test timeout with threading disabled (#44737)
- GH-44742 - [Ruby] Fix a bug that empty struct list value can’t be built (#44763)
- GH-44754 - [C++] Use lowercased
windows.h
to enable cross-platform builds (#44755) - GH-44767 - [C++] Fix Float16.To{Little,Big}Endian on big endian machines (#44768)
- GH-44769 - [C++][Parquet] Fix read/write of metadata length footer on big-endian systems (#44787)
- GH-44773 - [Dev][Doc] Remove obsolete Read the docs configuration (#44774)
- GH-44797 - [CI] Update pkg-config to pkgconf on Homebrew (#44798)
- GH-44802 - [C++][CI] Migrate to
arrow::Result
basedparquet::arrow::OpenFile() API
in example tutorials (#44807) - GH-44820 - [CI] Uninstall pkg-config entirely on macOS jobs (#44821)
- GH-44844 - [CI] Uninstall pkg-config entirely on verification and java-jars macOS jobs (#44845)
- GH-44846 - [C++] Fix thread-unsafe access in ConcurrentQueue::UnsyncFront (#44849)
- GH-44855 - [Python][Packaging] Use delvewheel to repair Windows wheels (#35323)
- GH-44885 - [Dev][Release] Update test condition in utils-prepare.sh (#44862)
- GH-44898 - [C++] Fix compilation error on GCC 8 (#44899)
- GH-44911 - [C#] Choose port numbers dynamically for ArrowStreamWriterTests (#44912)
- GH-44918 - [Ruby] Fix a bug that empty list is ignored in builder (#44933)
- GH-44954 - [C++][CI] Silence protobuf-generated deprecations (#44955)
- GH-44968 - [CI][Dev] Remove nonexistent
getJiraInfo
reference (#44972) - GH-44974 - [Dev] Fix minor issue handling (#44975)
- GH-44980 - [CI] Remove retrieval of Arrow version from Java on Spark integration and update test structure for PySpark (#44981)
- GH-44985 - [C++] Use recommended downloads URLs for ORC and Thrift (#44977)
- GH-44991 - [CI][Python] Fix and modernize AppVeyor build (#44999)
- GH-45003 - [Python][Docs] Update docstrings for metadata methods on Field and Schema classes (#45004)
- GH-45006 - [CI][Python] Fix test_memory failures (#45007)
- GH-45027 - [C++] Include path in the documentation is wrong (#45031)
- GH-45034 - [C++] Remove Parquet requirement from Arrow Acero and from Arrow Dataset when not necessary (#45035)
- GH-45039 - [CI][Packaging][Python] Fix Docker push step for free-threaded wheel builds (#45040)
- GH-45041 - [Packaging][Python] Use ORC from vcpkg instead of bundled on Linux and macOS (#45046)
- GH-45053 - [C++] Add support for Boost 1.87.0 (#45057)
- GH-45059 - [C++][CI] Fix test-build-cpp-fuzz failures (#45060)
- GH-45069 - [R][CI] Fix r-binary-packages crossbow job (#45070)
- GH-45071 - [Packaging][Docs] Fix NumPy v2 include directory for Emscripten, and update Pyodide-related documentation (#45072)
- GH-45073 - [C++][Parquet] Fix generation of repetition levels for encryption test data (#45074)
- GH-45099 - [C++] Avoid static const variable in the status.h (#45100)
- GH-45118 - [Packaging] Use armored keyring for APT repository (#45131)
- GH-45119 - [Ruby] Use #size for Arrow::Column#size backend (#45133)
- GH-45151 - [C++][Parquet] Fix Null-dereference READ in parquet::arrow::ListToSchemaField (#45152)
- GH-45157 - [CI][Packaging] Remove explicit pins of gemfury version in dev/tasks/macros.jinja (#45158)
- GH-45183 - [C++][Release] Add llvm-dev back to setup-ubuntu.sh (#45184)
- GH-45212 - [C++][Parquet] test-conda-cpp-valgrind fails on arrow-dataset-file-parquet-encryption-test