Apache Arrow 17.0.0 (16 July 2024)
This is a major release covering more than 2 months of development.
Download
- Source Artifacts
- Binary Artifacts
- Git tag
Contributors
This release includes 529 commits from 92 distinct contributors.
$ git shortlog -sn apache-arrow-16.1.0..apache-arrow-17.0.0
84 dependabot[bot]
47 Sutou Kouhei
25 Hyunseok Seo
25 Joris Van den Bossche
22 Raúl Cumplido
21 Adam Reeve
21 Vibhatha Lakmal Abeykoon
20 mwish
18 Laurent Goujon
15 Felipe Oliveira Carvalho
14 abandy
13 Sarah Gilmore
12 Rossi Sun
11 Neal Richardson
10 Alenka Frim
10 Antoine Pitrou
10 Bryce Mecum
9 ZhangHuiGui
8 Jonathan Keane
6 Dewey Dunnington
6 Dominik Moritz
6 Matt Topol
5 Gang Wu
5 William Ayd
4 Curt Hagenlocher
4 Dane Pitkin
4 David Li
4 Tai Le Manh
4 h-vetinari
3 Ian Cook
3 Jacob Wujciak-Jens
3 Kevin Gurney
3 Rok Mihevc
3 Thomas A Caswell
3 Wyatt Alt
2 Ben Harkins
2 Benjamin Kietzman
2 Haocheng Liu
2 JB Onofré
2 Joe Marshall
2 Joel Lubinitsky
2 Nic Crane
2 Steve Lord
2 Thomas Newton
2 Tom Scott-Coombes
2 Weston Pace
1 Adam Curtis
1 Alan Stoate
1 AlbertXingZhang
1 Alex Shcherbakov
1 Anja Kefala
1 Austin Dickey
1 Calvin Kirs
1 Clif Houck
1 David Schlosnagle
1 David Sisson
1 DenisTarasyuk
1 Ed
1 Even Rouault
1 Finn Völkel
1 Francis
1 Gavin Murrison
1 Ivan Chesnov
1 Jaap Versteegh
1 Jacek Stania
1 Jacob Hayes
1 James Duong
1 Joshua MacDonald
1 Judah Rand
1 Kartik Verma
1 Kelvin Wu
1 Kirill Khramkov
1 Konstantin Malanchev
1 Lei (Alexandra) Wang
1 LucasG0
1 Mike Bostock
1 Noam Ross
1 Nozomi Isozaki
1 PHILO-HE
1 PJ Fanning
1 Paul Taylor
1 Stephan T. Lavavej
1 Tao He
1 Tom McTiernan
1 Wenbo Li
1 Yifeng-Sigma
1 a-reich
1 andyfan
1 feik
1 hemidark
1 keshen-msft
1 normanj-bitquill
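The commit and contributor totals quoted above can be cross-checked against the `git shortlog -sn` listing. The following is a minimal sketch, not part of the Arrow release tooling: `summarize_shortlog` is a hypothetical helper name, and it assumes each input line starts with a commit count followed by an author name, as in the listing above.

```shell
# Hypothetical helper: sum the per-author commit counts and count the
# author lines from `git shortlog -sn` output read on standard input.
summarize_shortlog() {
  awk '{ commits += $1; authors += 1 }
       END { printf "%d commits from %d contributors\n", commits, authors }'
}
```

Inside a clone of the repository it could be used as `git shortlog -sn apache-arrow-16.1.0..apache-arrow-17.0.0 | summarize_shortlog`. The result depends on the local Git configuration and history, so it may not match the published figures exactly.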
Patch Committers
The following Apache committers merged contributed patches to the repository.
$ git shortlog -sn --group=trailer:signed-off-by apache-arrow-16.1.0..apache-arrow-17.0.0
130 Sutou Kouhei
89 David Li
39 Curt Hagenlocher
38 Antoine Pitrou
37 Joris Van den Bossche
22 Felipe Oliveira Carvalho
22 Raúl Cumplido
20 Matt Topol
14 mwish
11 Jacob Wujciak-Jens
11 Sarah Gilmore
8 AlenkaF
6 Bryce Mecum
6 Jonathan Keane
5 Benjamin Kietzman
5 Gang Wu
4 Dane Pitkin
4 Dewey Dunnington
4 Rok Mihevc
4 Weston Pace
3 Kevin Gurney
3 Nic Crane
3 dependabot[bot]
2 Will Jones
Changelog
Apache Arrow 17.0.0 (2024-07-16 07:00:00+00:00)
Bug Fixes
- GH-15053 - [C++] Add option to string ‘center’ kernel to control left/right alignment on odd number of padding (#41449)
- GH-30866 - [Java] fix SplitAndTransfer throws for (0,0) if vector empty (#41066)
- GH-34484 - [Substrait] add an option to disable augmented fields (#41583)
- GH-37669 - [C++][Python] Fix casting to extension type with fixed size list storage type (#42219)
- GH-38553 - [C++] Replace null_count with MayHaveNulls in ListArrayFromArray and MapArray (#41957)
- GH-38575 - [Python] Include metadata when creating pa.schema from PyCapsule (#41538)
- GH-38770 - [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971)
- GH-39129 - [Python] pa.array: add check for byte-swapped numpy arrays inside python objects (#41549)
- GH-39489 - [C++][Parquet] Timestamp conversion from Parquet to Arrow does not follow compatibility guidelines for convertedType
- GH-39645 - [Python] Fix read_table for encrypted parquet (#39438)
- GH-40270 - [C++] Use LargeStringArray for casting when writing tables to CSV (#40271)
- GH-40560 - [Python] RunEndEncodedArray.from_arrays: bugfix for Array arguments (#40560) (#41093)
- GH-40750 - [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871)
- GH-40913 - [C++] Fix compile warning with ‘implicitly-defined constructor does not initialize’ in encoding_benchmark (#41060)
- GH-40997 - [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 (#40998)
- GH-41112 - [C++] Clean up unused parameter warnings (#41111)
- GH-41149 - [C++][Acero] Fix asof join race (#41614)
- GH-41164 - [C#] Fix concatenation of sliced arrays (#41245)
- GH-41190 - [C++] support for single threaded joins (#41125)
- GH-41192 - [C++] Fix hashjoin benchmark failed at make utf8’s random batches (#41195)
- GH-41198 - [C#] Fix concatenation of union arrays (#41226)
- GH-41199 - [C#] Fix accessing values of a sliced decimal array (#41200)
- GH-41258 - [C#][Integration] Fix comparison of sliced validity buffers with non-zero offsets (#41259)
- GH-41263 - [C#][Integration] Ensure offset is considered in all branches of the bitmap comparison (#41264)
- GH-41282 - [Dev] Always prompt next major version on merge script if it exists (#41305)
- GH-41306 - [C++] Check to avoid copying when NullBitmapBuffer is Null (#41452)
- GH-41317 - [C++] Fix crash on invalid Parquet file (#41366)
- GH-41319 - [Python] `test_numpy_array_protocol` test failures with numpy 2.0.0rc1
- GH-41321 - [C++][Parquet] More strict Parquet level checking (#41346)
- GH-41329 - [C++][Gandiva] Fix gandiva cache size env var (#41330)
- GH-41340 - [C++][CMake][Windows] Remove needless .dll suffix from link libraries (#41341)
- GH-41343 - [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
- GH-41356 - [Release][Docs] Update post release documentation task to remove the warnings banner for stable version (#41377)
- GH-41367 - [C++] Replace [[maybe_unused]] with Arrow macro (#41359)
- GH-41371 - [CI][Release] Use the latest Ruby on macOS (#41379)
- GH-41390 - [CI] Use setup-python GitHub action on csharp macOS job (#41392)
- GH-41397 - [C#] Downgrade macOS test runner to avoid infrastructure bug (#41934)
- GH-41418 - [C++] Add [Large]ListView and Map nested types for scalar_if_else’s kernel functions (#41419)
- GH-41426 - [R][CI] Install CRAN style openssl on gh runners. (#41629)
- GH-41433 - [C++][Gandiva] Fix ascii_utf8 function to return same result on x86 and Arm (#41434)
- GH-41464 - [Python] Fix StructArray.sort() for by=None (#41495)
- GH-41467 - [CI][Release] Don’t push conda-verify-rc image (#41468)
- GH-41470 - [C++] Reuse deduplication logic for direct registration (#41466)
- GH-41471 - [Java] Fix performance uber-jar (#41473)
- GH-41475 - [Python] Build with Python 3.13 (#42034)
- GH-41478 - [C++] Clean up more redundant move warnings (#41487)
- GH-41491 - [Python] remove special methods related to buffers in python <2.6 (#41492)
- GH-41502 - [Python] Fix reading column index with decimal values (#41503)
- GH-41529 - [C++][Compute] Remove redundant logic for ArrayData as ExecResults in ExecScalarCaseWhen (#41380)
- GH-41534 - [Go] Fix mem leak importing 0 length C Array (#41535)
- GH-41541 - [Go][Parquet] More fixes for writer performance regression (#42003)
- GH-41541 - [Go][Parquet] Fix writer performance regression (#41638)
- GH-41571 - [Java] Revert GH-41307 (#41309) (#41628)
- GH-41573 - [Java] VectorSchemaRoot uses inefficient stream to copy fieldVectors (#41574)
- GH-41581 - [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
- GH-41587 - [Docs][Python] Remove duplicate contents (#41588)
- GH-41602 - [C#] Resolve build warnings (#41645)
- GH-41617 - [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
- GH-41630 - [Benchmarking] Fix out-of-source build in benchmarks (#41631)
- GH-41648 - [Java] Memory Leak about splitAndTransfer (#41898)
- GH-41660 - [CI][Java] Restore devtoolset related GANDIVA_CXX_FLAGS (#41661)
- GH-41679 - [Release][Packaging][deb] Update package name in 01-prepare.sh too (#41859)
- GH-41684 - [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757)
- GH-41686 - [Java] Nullability of struct child vectors not preserved in TransferPair (#41785)
- GH-41688 - [Dev] Include all relevant CMakeLists.txt files in cmake-format precommit hook (#41689)
- GH-41697 - [Go][Parquet] Release BufferWriter when BufferedPageWriter is closed (#41698)
- GH-41699 - [Python][Parquet] Implement to_dict method on SortingColumn (#41704)
- GH-41711 - [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
- GH-41717 - [Java][Vector] fix issue with ByteBuffer rewind in MessageSerializer (#41718)
- GH-41720 - [C++][Acero] Remove a useless parameter for QueryContext::Init called in hash_join_benchmark (#41716)
- GH-41725 - [Python] CMake: ignore Parquet encryption option if Parquet itself is not enabled (fix Java integration build) (#41776)
- GH-41735 - [CI][Archery] Update archery to be compatible with pygit2 1.15 API change (#41739)
- GH-41738 - [C++] Fix the issue that temp vector stack may be under sized (#41746)
- GH-41741 - [C++] Check that extension metadata key is present before attempting to delete it (#41763)
- GH-41758 - [Python] Disallow direct pa.RecordBatchReader() construction to avoid segfaults (#41773)
- GH-41771 - [C++] Iterator releases its resource immediately when it reads all values (#41824)
- GH-41780 - [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
- GH-41784 - [Packaging][RPM] Use SO version for -libs package name (#41838)
- GH-41787 - Update fmpp-maven-plugin output directory (#41788)
- GH-41791 - [CI][Conda] Update azure.linux.yml task, replace CondaEnvironment@1 with Bash@3 (#41883)
- GH-41813 - [C++] Fix avx2 gather offset larger than 2GB in `CompareColumnsToRows` (#42188)
- GH-41829 - [R] Update relative URLs in README to absolute paths to prevent CRAN check failures (#41830)
- GH-41836 - [Java] Fix an undefined symbol error when ARROW_S3=OFF (#41837)
- GH-41862 - [C++][S3] Fix potential deadlock when closing output stream (#41876)
- GH-41884 - [Python] Fix RecordBatchReader.cast to support casting to equal schema for all types (#42098)
- GH-41902 - [Java] Variadic Buffer Counts Incorrect (#41930)
- GH-41903 - [CI][GLib] Use the latest Ruby to use OpenSSL 3 (#42001)
- GH-41920 - [CI][JS] Add missing build directory argument (#41921)
- GH-41924 - [Python] Fix tests when using NumPy 2.0 on Windows (#42099)
- GH-41964 - [CI][C++] Clear cache for mamba on AppVeyor (#41977)
- GH-42005 - [Java][Integration][CI] Fix ARROW_BUILD_ROOT Path to find pom.xml (#42008)
- GH-42006 - [CI][Python] Use pip install -e instead of setup.py build_ext --inplace for installing pyarrow on verification script (#42007)
- GH-42015 - [MATLAB] Executing `tfeather.m` test class causes MATLAB to crash on `windows-2022` after MSVC update from 14.39.33519 to 14.40.33807 (#42123)
- GH-42017 - [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022)
- GH-42039 - [Docs][Go] Fix broken link (#42040)
- GH-42041 - [Swift] Fix nullable type decoder issue (#42043)
- GH-42065 - [C++] Support list-views on list_slice (#42067)
- GH-42104 - [C++] Fix an OTel test failure and remove needless logs (#42122)
- GH-42107 - [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol (#42108)
- GH-42116 - [C++] Support list-view typed arrays in array_take and array_filter (#42117)
- GH-42130 - [GLib] Fix building gir files with MSVC (#42131)
- GH-42136 - [CI][Go][Java][JS] Use AMD64-based macOS explicitly (#42175)
- GH-42139 - [C++] Fix some potential uninitialized variable warnings (#42207)
- GH-42140 - [C++] Avoid invalid accesses in parquet-encoding-benchmark (#42141)
- GH-42149 - [C++] Use FetchContent for bundled ORC (#43011)
- GH-42170 - [Python][CI] Update expected output for numpy 2.0.0 (#42172)
- GH-42197 - [CI][Packaging][Java] Ensure updating “python@*” formulae on macOS (#42202)
- GH-42198 - [C++] Fix GetRecordBatchPayload crashes for device data (#42199)
- GH-42208 - [Java] Fix the Test in flight-sql-jdbc-driver Module (#42217)
- GH-42213 - [Swift] Use “--warnings-as-errors” only on CI (#42214)
- GH-42220 - [R] handle vctrs_rcrd extension type in metadata cleaning (#42226)
- GH-42224 - [Java] Fix Typo in TestAceroSubstraitConsumer Test Method (#42225)
- GH-42232 - [C++] Use non-stale c-ares download URL (#42250)
- GH-42234 - [CI][R] Disable libarrow binary use on valgrind tests (#42249)
- GH-43048 - [Java] Fix IndexOutOfBoundsException message by reporting index correctly (#43049)
- GH-43058 - [C#] Revert upgrade of Xunit from 2.8.0 to 2.8.1 (#43074)
- GH-43059 - [CI][Gandiva] Disable Python Gandiva tests on AlmaLinux 8 (#43093)
- GH-43062 - [Go] Use calloc instead of malloc (#43052)
- GH-43070 - [C++][Parquet] Check for valid ciphertext length to prevent segfault (#43071)
- GH-43116 - [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as large memory test (#43128)
- GH-43119 - [CI][Packaging] Update manylinux 2014 CentOS repos that have been deprecated (#43121)
- GH-43122 - [CI][Packaging][RPM][CentOS] Use vault.centos.org for SCL (#43127)
- GH-43134 - [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
- GH-43158 - [Packaging] Use bundled nlohmann/json on AlmaLinux 8/CentOS Stream 8 (#43159)
- GH-43199 - [CI][Packaging] dev/release/utils-create-release-tarball.sh should not include the release candidate number in the name of the tarball’s top-level directory. (#43200)
- GH-43204 - [CI][Packaging] Apply vcpkg patch to fix Thrift version (#43208)
New Features and Improvements
- GH-29537 - [R] Support mutate/summarize with implicit join (#41350)
- GH-33484 - [C++][Compute] Implement `Grouper::Reset` (#41352)
- GH-35804 - [CI][Packaging][Conan] Synchronize upstream conan (#39729)
- GH-35888 - [Java] Add FlightStatusCode.RESOURCE_EXHAUSTED (#41508)
- GH-37333 - [Python] Replace pandas.util.testing.rands with vendored version (#42089)
- GH-37720 - [Go][FlightSQL] Add prepared statement handle to DoPut result (#40311)
- GH-37728 - [Java] Add methods to get an Iterable for a ValueVector (#41895)
- GH-37929 - [Python] begin moving static settings to pyproject.toml (#41041)
- GH-37938 - [Swift] Add initial C data interface implementation (#41342)
- GH-38255 - [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
- GH-38325 - [Python] Implement PyCapsule interface for Device data in PyArrow (#40717)
- GH-38325 - [Python] Expand the Arrow PyCapsule Interface with C Device Data support (#40708)
- GH-38692 - [C#] Implement ICollection<T?> on scalar arrays (#41539)
- GH-39204 - [Format][FlightRPC][Docs] Stabilize Flight SQL (#41657)
- GH-39220 - [Python] Let RecordBatch.filter accept a boolean expression in addition to mask array (#43043)
- GH-39301 - [Archery][CI][Integration] Add nanoarrow to archery + integration setup (#39302)
- GH-39344 - [C++][FS][Azure] Support azure cli auth (#41976)
- GH-39345 - [C++][FS][Azure] Add support for environment credential (#41715)
- GH-39649 - [Java][CI] Fix or suppress spurious errorprone warnings stage 2 (#39777)
- GH-39722 - [JS] Clean up packaging (#39723)
- GH-39798 - [C++] Optimize Take for fixed-size types including nested fixed-size lists (#41297)
- GH-39858 - [C++][Device] Add Copy/View slice functions to a CPU pointer (#41477)
- GH-39898 - [C++] Add support for OpenTelemetry logging (#39905)
- GH-39990 - [Docs][CI] Add sphinx-lint for docs linting (#40022)
- GH-40078 - [C++] Import/Export ArrowDeviceArrayStream (#40807)
- GH-40339 - [Java] StringView Initial Implementation (#40340)
- GH-40342 - [Python] Fix pickling of LocalFileSystem for cython 2 (#41459)
- GH-40342 - [C++] move LocalFileSystem to the registry (#40356)
- GH-40361 - [C++] Make flatbuffers serialization more deterministic (#40392)
- GH-40384 - [Python] Expand the C Device Interface bindings to support import on CUDA device (#40385)
- GH-40494 - [Go] add support for protobuf messages (#40496)
- GH-40644 - [Python] Allow passing a mapping of column names to `rename_columns` (#40645)
- GH-40734 - [Packaging][Debian] Drop support for Debian bullseye (#41394)
- GH-40749 - [Python][Packaging] Strip unnecessary symbols when building wheels (#42028)
- GH-40819 - [Java] Adding Spotless to Algorithm module (#41825)
- GH-40820 - [Java] Adding Spotless to Adapter module (#42048)
- GH-40822 - [Java] Adding Spotless to C module (#42059)
- GH-40823 - [Java] Adding Spotless to Compression module (#42060)
- GH-40824 - [Java] Adding Spotless to Dataset module (#42062)
- GH-40825 - [Java] Adding Spotless to Flight module (#42063)
- GH-40826 - [Java] Adding Spotless to Format module
- GH-40827 - [Java] Adding Spotless to Gandiva module (#42055)
- GH-40828 - [Java] Format arrow-maven-plugins modules (#42054)
- GH-40829 - [Java] Adding Spotless to Memory modules (#42056)
- GH-40830 - [Java] Adding Spotless to Performance module (#42057)
- GH-40831 - [Java] Adding Spotless to Tools module (#42058)
- GH-40832 - [Java] Adding Spotless to Vector module (#42061)
- GH-40930 - [Java] Implement a function to retrieve reference buffers in StringView (#41796)
- GH-40932 - [Java] Implement TransferPair functionality for StringView (#41861)
- GH-40933 - [Java] Enhance the copyFrom* functionality in StringView (#41752)
- GH-40942 - [Java] Implement C Data Interface for StringView (#41967)
- GH-40943 - [Java] Implement RangeEqualsVisitor for StringView (#41636)
- GH-40944 - [Java] Implement TypeEqualsVisitor for StringView (#41606)
- GH-40968 - [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like function (#40970)
- GH-41020 - [C++] Introduce portable compiler assumptions (#41021)
- GH-41035 - [C++] Add a grouper benchmark for preventing performance regression (#41036)
- GH-41055 - [C++] Support flatten for combining nested list related types (#41092)
- GH-41085 - [CI][Java] Add Spark integration tests to “java” group in Crossbow tasks (#41086)
- GH-41089 - [C++] Clean up remaining tasks related to half float casts (#41084)
- GH-41095 - [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support (#41276)
- GH-41102 - [Packaging][Release] Create unique git tags for release candidates (e.g. apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM}) (#41131)
- GH-41105 - [Python][Docs] Update PyArrow installation docs for conda package split (#41135)
- GH-41114 - [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
- GH-41116 - [C++] IO: enhance boundary checking in CompressedInputStream (#41117)
- GH-41126 - [Python] Basic bindings for Device and MemoryManager classes (#41685)
- GH-41134 - [GLib] Support building arrow-glib with MSVC (#41599)
- GH-41159 - [Go][Parquet] Improvement Parquet BitWriter WriteVlqInt Performance (#41160)
- GH-41173 - [Java] Add spotless configuration for Maven pom.xml files (#41174)
- GH-41183 - [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295)
- GH-41186 - [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst (#41187)
- GH-41203 - [Python][Packaging] Ensure to build with released numpy 2.0 (instead of RC) in the wheel building workflows (#42194)
- GH-41240 - [Release][Packaging] Use Debian bookworm for uploading binaries (#41241)
- GH-41243 - [Release][Packaging] Avoid needless download by “archery crossbow download-artifacts” (#41244)
- GH-41256 - [Format][Docs] Add a canonical extension type specification for JSON (#41257)
- GH-41262 - [Java][FlightSQL] Implement stateless prepared statements (#41237)
- GH-41287 - [Java] ListViewVector Implementation (#41285)
- GH-41298 - [Format][Docs] Add a canonical extension type specification for UUID (#41299)
- GH-41301 - [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type (#41373)
- GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41772)
- GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41309)
- GH-41314 - [CI][Python] Add a job on ARM64 macOS (#41313)
- GH-41316 - [CI][Python] Reduce CI time on macOS (#41378)
- GH-41323 - [R] Redo how summarize() evaluates expressions (#41223)
- GH-41327 - [Ruby] Show type name in Arrow::Table#to_s (#41328)
- GH-41334 - [C++][Acero] Use per-node basis temp vector stack to mitigate overflow (#41335)
- GH-41349 - [C#] Optimize DecimalUtility.GetBytes(SqlDecimal) on .NET 7+ (#42150)
- GH-41358 - [R] Support join “na_matches” argument (#41372)
- GH-41361 - [C++][Parquet] Optimize DelimitRecords by batch execution when max_rep_level > 1 (#41362)
- GH-41375 - [C#] Move to .NET 8.0 (#41376)
- GH-41385 - [CI][MATLAB][Packaging] Add support for MATLAB `R2024a` in CI and crossbow packaging workflows (#41504)
- GH-41389 - [Python] Expose byte_width and bit_width of ExtensionType in terms of the storage type (#41413)
- GH-41400 - [MATLAB] Bump `libmexclass` version to commit `ca3cea6` (#41436)
- GH-41410 - [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference (#41411)
- GH-41420 - [R] Update NEWS.md for 16.1.0 (#41422)
- GH-41427 - [Go] Fix stateless prepared statements (#41428)
- GH-41430 - [Docs] Use sphinxcontrib-mermaid instead of generating images from .mmd (#41455)
- GH-41435 - [CI][MATLAB] Add job to build and test MATLAB Interface on `macos-14` (#41592)
- GH-41450 - [R][CI] rhub/container follow ons (#41451)
- GH-41460 - [C++] Use ASAN to poison temp vector stack memory (#41695)
- GH-41480 - [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705)
- GH-41480 - [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ (#41494)
- GH-41493 - [C++][S3] Add a new option to check existence before CreateDir (#41822)
- GH-41507 - [MATLAB][CI] Pass `strict: true` to `matlab-actions/run-tests@v2` (#41530)
- GH-41527 - [CI][Dev] Remove unnecessary requirements for six (#43087)
- GH-41531 - [MATLAB][Packaging] Bump `matlab-actions/setup-matlab` and `matlab-actions/run-command` from `v1` to `v2` in the `crossbow` job (#41532)
- GH-41540 - [R] Simplify arrow_eval() logic and bindings environments (#41537)
- GH-41545 - [C++][Parquet] Fix DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
- GH-41547 - [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
- GH-41558 - [C++] Improve fixed_width_test_util.h (#41575)
- GH-41560 - [C++] ChunkResolver: Implement ResolveMany and add unit tests (#41561)
- GH-41590 - [Java] Improve BaseRepeatedValueVector function on isEmpty and isNull operations (#41601)
- GH-41596 - [C++] fixed_width_internal.h: Simplify docstring and support bit-sized types (BOOL) (#41597)
- GH-41608 - [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633)
- GH-41611 - [Docs][CI] Enable most sphinx-lint rules for documentation (#41612)
- GH-41620 - [Docs] Document merge.conf usage (#41621)
- GH-41626 - [R][CI] Update OpenSUSE to 15.5 from 15.3 (#41627)
- GH-41652 - [C++][CMake][Windows] Don’t build needless object libraries (#41658)
- GH-41653 - [MATLAB] Add new `arrow.c.Array` MATLAB class which wraps a C Data Interface format `ArrowArray` C struct (#41655)
- GH-41654 - [MATLAB] Add new `arrow.c.Schema` MATLAB class which wraps a C Data Interface format `ArrowSchema` C struct (#41674)
- GH-41656 - [MATLAB] Add C Data Interface format import/export functionality for `arrow.array.Array` (#41737)
- GH-41662 - [Python] Ensure Buffer methods don’t crash with non-CPU data (#41889)
- GH-41664 - [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010)
- GH-41675 - [Packaging][MATLAB] Add crossbow job to package MATLAB interface on macos-14 (#41677)
- GH-41681 - [GLib] Generate separate version macros for each GLib library (#41721)
- GH-41691 - [Doc] Remove notion of “logical type” (#41958)
- GH-41702 - [C++][Parquet] Thrift: generate template method to accelerate reading thrift (#41703)
- GH-41726 - [C++][Parquet] Minor: moving EncodedStats by default rather than copying (#41727)
- GH-41730 - [Java] Adding variadicBufferCounts to RecordBatch (#41732)
- GH-41748 - [Python][Parquet] Update BYTE_STREAM_SPLIT description in write_table() docstring (#41759)
- GH-41749 - [GLib] Allow getting a RecordBatchReader from a Dataset or Scanner (#41750)
- GH-41755 - [C++][ORC] Ensure setting detected ORC version (#41767)
- GH-41760 - [C++][Parquet] Add file metadata read/write benchmark (#41761)
- GH-41770 - [CI][GLib] Remove temporary files explicitly (#41807)
- GH-41783 - [C++] Make git-dependent definitions internal (#41781)
- GH-41789 - [Java] Clean up immutables and checkerframework dependencies (#41790)
- GH-41797 - [C++][S3] Remove GetBucketRegion hack for newer AWS SDK versions (#41798)
- GH-41799 - [Java] Migrate to com.gradle:develocity-maven-extension (#41800)
- GH-41803 - [MATLAB] Add C Data Interface format import/export functionality for `arrow.tabular.RecordBatch` (#41817)
- GH-41804 - [Swift] Add Struct (Nested) type (#43082)
- GH-41806 - [GLib][CI] Use vcpkg for C++ dependencies when building GLib libraries with MSVC (#41839)
- GH-41818 - [C++][Parquet] normalize dictionary encoding to use RLE_DICTIONARY (#41819)
- GH-41834 - [R] Better error handling in dplyr code (#41576)
- GH-41841 - [R][CI] Remove more defunct rhub containers (#41828)
- GH-41887 - [Go] Run linter via pre-commit (#41888)
- GH-41899 - [C++] IPC: Minor enhance the code of writer (#41900)
- GH-41905 - [JS] Update dependencies (#41906)
- GH-41910 - [Python] Add support for Pyodide (#37822)
- GH-41923 - [C++] Fix ExecuteScalar deduce all_scalar with chunked_array (#41925)
- GH-41929 - [Java] pom.xml license formatting (#42049)
- GH-41945 - [Swift] Add interface ArrowArrayHolderBuilder (#41946)
- GH-41947 - [Java] Support catalog in JDBC driver with session options (#42035)
- GH-41952 - [R] Turn S3 and ZSTD on by default for macOS (#42210)
- GH-41953 - [C++] Minor enhance code style for FixedShapeTensorType (#41954)
- GH-41955 - [C++] Follow up of adding null_bitmap to MapArray::FromArrays (#41956)
- GH-41960 - Expose new S3 option check_directory_existence_before_creation (#41972)
- GH-41968 - [Java] Implement TransferPair functionality for BinaryView (#41980)
- GH-41970 - [C++] Misc changes making code around list-like types and list-view types behave the same way (#41971)
- GH-41978 - [Python] Fix pandas tests to follow downstream datetime64 unit changes (#41979)
- GH-41983 - [Dev] Run issue labeling bot only when opening an issue (not editing) (#41986)
- GH-41994 - [C++] kernel.cc: Remove defaults on switch so that compiler can check full enum coverage for us (#41995)
- GH-41999 - [Swift] Add methods for adding array and vargs to arrow array (#42000)
- GH-42002 - [Java] Update Unit Tests for Vector Module (#42019)
- GH-42013 - [Python] Allow Array.filter() to take general array input (#42051)
- GH-42016 - [Python] Expose new FLOAT16 logical type in the pyarrow.parquet bindings (#42103)
- GH-42020 - [Swift] Add Arrow decoding implementation for Swift Codable (#42023)
- GH-42021 - [Swift] Add Arrow encoder implementation for Swift Codable (#43063)
- GH-42025 - [Java] Update Unit Tests for Algorithm Module (#42029)
- GH-42030 - [Java] Update Unit Tests for Adapter Module (#42038)
- GH-42042 - [Java] Update Unit Tests for Compressions Module (#42044)
- GH-42045 - [Java] Update Unit Tests for Flight Module (#42158)
- GH-42087 - [Swift] refactored to remove build warnings (#42088)
- GH-42092 - [Java] Update Unit Tests for Tools Module (#42093)
- GH-42100 - [C++][Parquet] ParquetFilePrinter::JSONPrint print length of FLBA (#41981)
- GH-42101 - [Java] Create File for Output Validation in FileRoundtrip (#42115)
- GH-42109 - [C++][CMake] Add preset for Valgrind (#42110)
- GH-42112 - [Python] Array gracefully fails on non-cpu device (#42113)
- GH-42121 - [Java] Cleanup spotless plugin configuration (#43019)
- GH-42124 - [Swift] Add methods for loading and validating builder by type (#42195)
- GH-42126 - [C++] Move TakeXXX free functions into TakeMetaFunction and make them private (#42127)
- GH-42128 - [Packaging][CentOS] Migrate CentOS 7 and CentOS Stream 8 packaging jobs to use vault.centos.org (#42129)
- GH-42134 - [C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme (#42135)
- GH-42143 - [R] Sanitize R metadata (#41969)
- GH-42146 - [MATLAB] Add IPC `RecordBatchFileReader` and `RecordBatchFileWriter` MATLAB classes (#42201)
- GH-42162 - [Java] Update Unit Tests for Dataset Module (#42163)
- GH-42164 - [Java] Update Unit Tests for Gandiva Module (#42166)
- GH-42165 - [Java] Update Unit Tests for Memory Module (#42161)
- GH-42167 - [CI] Upgrade the version of vcpkg in .env (#42171)
- GH-42168 - [Python][Parquet] Pyarrow store decimal as integer (#42169)
- GH-42190 - [Python] Add CI job for Numpy 1.X (#42189)
- GH-42193 - [Java] Update dependency to maintain JUnit 5 only (#42206)
- GH-42228 - [CI][Java] Suppress transfer progress log in java-jars (#42230)
- GH-42235 - [C++] list_parent_indices: Add support for list-view types (#42236)
- GH-42243 - [Swift] Update isValidBuilderType to not required instance of type (#42244)
- GH-42245 - [Swift] Ensure map behavior is the same for all key types (#42246)
- GH-43020 - [Java] Simplify flight.properties generation (#43028)
- GH-43033 - [CI][Docker] Enable linter for python-wheel-windows-test-vs2019 (#43034)
- GH-43040 - [C++] Reduce the recursion of many-join test (#43042)
- GH-43045 - [CI][Python] Pin openjdk=17 in python substrait integration (#43051)
- GH-43060 - [C++] Limit buffer size in BufferedInputStream::SetBufferSize with raw_read_bound (#43064)
- GH-43076 - [C#] Upgrade Xunit and change how Python integration tests are skipped (#43091)