Apache Arrow 14.0.0 (1 November 2023)
This is a major release covering more than 2 months of development.
Download
- Source Artifacts
- Binary Artifacts
- Git tag
Contributors
This release includes 612 commits from 116 distinct contributors.
$ git shortlog -sn apache-arrow-13.0.0..apache-arrow-14.0.0
69 Sutou Kouhei
59 dependabot[bot]
52 sgilmore10
34 Nic Crane
28 mwish
27 Raúl Cumplido
25 Kevin Gurney
19 Antoine Pitrou
19 Dewey Dunnington
17 Alenka Frim
16 Dane Pitkin
16 Matt Topol
13 Joris Van den Bossche
12 Jin Shang
11 David Li
11 Felipe Oliveira Carvalho
10 James Duong
8 Curt Hagenlocher
7 Jacob Wujciak-Jens
6 Benjamin Kietzman
6 Weston Pace
5 Frederic Branczyk
5 david dali susanibar arce
4 Ben Harkins
4 Thor
3 Bryce Mecum
3 Chris Jordan-Squire
3 Diego Fernández Giraldo
3 Francis
3 Ian Cook
3 Jonathan Keane
3 Junming Chen
3 Tim Schaub
3 h-vetinari
3 takuya kodama
2 Abe Tomoaki
2 Adam Reeve
2 Dominik Moritz
2 Elliott Brossard
2 Fokko Driesprong
2 Gang Wu
2 Mark Wolfe
2 Matthias Loibl
2 Rok Mihevc
2 Thomas Newton
2 Timothy Meehan
2 Vibhatha Lakmal Abeykoon
2 Will Jones
2 abandy
2 davidhcoe
2 jeremyosterhoudt
2 lambda
2 谢天
1 0x26res
1 Alex Shcherbakov
1 Alexander Grueneberg
1 Angela Li
1 Anja Kefala
1 Arkadiusz Rudny
1 Ashish Bailkeri
1 Austin Dickey
1 Bruno Tremblay
1 Chelsea Jones
1 Christian Lorentzen
1 Danyaal Khan
1 David Greiss
1 DenisTarasyuk
1 Donald Tolley
1 Ed Seidl
1 Edward Visel
1 Eero Lihavainen
1 Erik McKelvey
1 Fernando Mayer
1 František Nečas
1 George Godik
1 Hirokazu SUZUKI
1 Hyunseok Seo
1 Ikko Eltociear Ashimine
1 Ivan Chesnov
1 Jacek Stania
1 James Henderson
1 Jinpeng
1 Joe Marshall
1 Jonathan Swenson
1 Judah Rand
1 Justin Heesemann
1 KarateSnowMachine
1 Kevin Liu
1 Kuba Martin
1 Kyle Barron
1 Laurent Goujon
1 Li Jin
1 Michael Lui
1 Miguel Pragier
1 Paul Taylor
1 Rajat Subhra Mukherjee
1 Ray Zhang
1 SGZW
1 Sam Albers
1 Slobodan Ilic
1 Spencer Nelson
1 Srinivas Lade
1 Tero Vuotila
1 Thomas Grainger
1 Tommy Setiawan
1 Val Gridnev
1 Vitalii Tverdokhlib
1 Yue
1 andrewchambers
1 hrishisd
1 ismail simsek
1 panbingkun
1 patrick
1 pegasas
1 rtpsw
1 yyang52
Patch Committers
The following Apache committers merged contributed patches to the repository.
$ git shortlog -sn --group=trailer:signed-off-by apache-arrow-13.0.0..apache-arrow-14.0.0
190 Sutou Kouhei
81 Antoine Pitrou
70 Kevin Gurney
46 Matt Topol
40 David Li
35 Nic Crane
31 Raúl Cumplido
28 Joris Van den Bossche
19 Jacob Wujciak-Jens
14 Dewey Dunnington
12 AlenkaF
12 Weston Pace
11 Benjamin Kietzman
4 Gang Wu
3 Dominik Moritz
2 Eric Erhardt
2 Jonathan Keane
1 Li Jin
1 Will Jones
Changelog
Apache Arrow 14.0.0 (2023-10-31 07:00:00)
Bug Fixes
- GH-15017 - [Python] Harden test_memory.py for use with ARROW_USE_GLOG=ON (#36901)
- GH-15281 - [C++] Replace bytes_view alias with span (#36334)
- GH-31621 - [JS] Fix Union null bitmaps (#37122)
- GH-32439 - [Python] Fix off by one bug when chunking nested structs (#37376)
- GH-32483 - [Docs][Python] Clarify you need to use conda-forge for installing nightly conda package (#37948)
- GH-33807 - [R] Add a message if we detect running under emulation (#37777)
- GH-34567 - [JS] Improve build and do not generate bin/bin directory (#36607)
- GH-34640 - [R] Can’t read in partitioning column in CSV datasets when both (non-hive) partition and schema supplied (#37658)
- GH-34909 - [C++] Avoid mean overflow on large integer inputs (#37243)
- GH-35095 - [C++] Prevent write after close in arrow::ipc::IpcFormatWriter (#37783)
- GH-35167 - [Docs][C++] Use new API for arrow::json::TableReader (#37301)
- GH-35292 - [Release] Retry “apt install” (#36836)
- GH-35328 - [Go][FlightSQL] Fix flaky test for FlightSql driver (#38044)
- GH-35450 - [C++] Return error when RecordBatch::ToStructArray called with mismatched column lengths (#36654)
- GH-35581 - [C++] Store offsets in scalars (#36018)
- GH-35641 - [CI][C++] Disable precompiled headers (#37502)
- GH-35658 - [Packaging] Sync conda recipes with feedstocks (#35637)
- GH-35770 - [Go][Documentation] Update TimestampType zero value as seconds in comment (#37905)
- GH-35942 - [C++] Improve Decimal ToReal accuracy (#36667)
- GH-36069 - [Java] Ensure S3 is finalized on shutdown (#36934)
- GH-36154 - [JS][CI] Use jest cache in CI (#36373)
- GH-36189 - [C++][Parquet] StreamReader::SkipRows() skips to incorrect place in multi-row-group files (#36191)
- GH-36318 - [Go] only decode lengths for the number of existing values, not for all nvalues. (#36322)
- GH-36323 - [Python] Fix Timestamp scalar repr error on values outside datetime range (#36942)
- GH-36332 - [CI][Java] Integration jobs with Spark fail with NoSuchMethodError:io.netty.buffer.PooledByteBufAllocator
- GH-36371 - [Java] CycloneDX Unable to load the mojo ‘makeBom’
- GH-36379 - [C++] Bundled dependency include paths should override system include dirs (#37612)
- GH-36502 - [C++] Add run-end encoded array support to ReferencedByteRanges (#36521)
- GH-36610 - [CI][C++] Don’t enable ARROW_ACERO by default (#36611)
- GH-36619 - [Python] Parquet statistics string representation misleading (#36626)
- GH-36634 - [Dev] Ensure merge script goes over all pages when requesting info from GitHub (#36637)
- GH-36638 - [R] Error with create_package_with_all_dependencies() on Windows (#37226)
- GH-36645 - [Go] returns writer.Close error to caller when writing parquet (#36646)
- GH-36655 - [Dev] Fix fury command to upload nightly wheels (#36657)
- GH-36663 - [C++] Fix the default value information for enum options (#36684)
- GH-36680 - [Python] Add missing pytest.mark.acero (#36683)
- GH-36685 - [R][C++] Fix illegal opcode failure with Homebrew (#36705)
- GH-36688 - [C#] Fix dereference error (#36691)
- GH-36692 - [CI][Packaging] Pin gemfury to 0.12.0 due to issue with faraday dependency (#36693)
- GH-36708 - [C++] Fully calculate null-counts so the REE allocations make sense (#36740)
- GH-36712 - [CI] Also update issue components when it’s updated (#36723)
- GH-36720 - [R] stringr modifier functions cannot be called with namespace prefix (#36758)
- GH-36726 - [R] calling read_parquet on S3 connections results in error message being ignored (#37024)
- GH-36730 - [Python] Add support for Cython 3.0.0 (#37097)
- GH-36771 - [R] stringr helper functions drop calling environment when evaluating (#36784)
- GH-36776 - [C++] Make ListArray::FromArrays() handle sliced offsets Arrays containing nulls (#36780)
- GH-36787 - [R] lintr update leads to failing tests on main (#36788)
- GH-36809 - [Python] MapScalar.as_py with custom field name (#36830)
- GH-36819 - [R] Use RunWithCapturedR for reading Parquet files (#37274)
- GH-36828 - [C++][Parquet] Make buffered RowGroupSerializer using BufferedPageWriter (#36829)
- GH-36850 - [Go] Arrow Concatenate fix, ensure allocations are Free’d (#36854)
- GH-36856 - [C++] Remove needless braces from BasicDecimal256FromLE() arguments (#36987)
- GH-36858 - [Go] Fix dictionary builder leak (#36859)
- GH-36860 - [C++] Report CMake error when system Protobuf exists but system gRPC doesn’t exist (#36904)
- GH-36863 - [C#] Remove unnecessary applied fix to not shutdown PythonEngine on CDataInterfacePythonTests if .NET is > 5.0 (#36872)
- GH-36863 - [C#][Packaging] Do not shutdown PythonEngine on CDataInterfacePythonTests if .NET is > 5.0 (#36868)
- GH-36883 - [R] Remove version number which triggers CRAN warning (#36884)
- GH-36920 - [Java][Docs] Add ARROW_JSON var to maven build profile (#36921)
- GH-36922 - [CI][C++][Windows] Search OpenSSL from PATH (#36923)
- GH-36935 - [Go] Fix Timestamp to Time dates (#36964)
- GH-36939 - [C++][Parquet] Direct put of BooleanArray is incorrect when called several times (#36972)
- GH-36941 - [CI][Docs] Use system Protobuf (#36943)
- GH-36949 - [C++] Fix KeyColumnArray’s buffers array bounds assertion. (#36966)
- GH-36973 - [CI][Python] Archery linter integrated with flake8==6.1.0 (#36976)
- GH-36975 - [C++][FlightRPC] Skip unknown fields, don’t crash (#36979)
- GH-36981 - [Go] Fix ipc reader leak (#36982)
- GH-36983 - [Python] Different get_file_info behaviour between pyarrow.fs.S3FileSystem and s3fs (#37768)
- GH-36991 - [Python][Packaging] Skip tests on Win that require a tz database (#36996)
- GH-37017 - [C++] Guard unexpected uses of BMI2 instructions (#37610)
- GH-37022 - [CI][Java] Use the official Maven download URL (#37119)
- GH-37050 - [Python][Interchange protocol] Add a workaround for empty dataframes (#38037)
- GH-37056 - [Java] Fix importing an empty data array from c-data (#37531)
- GH-37067 - [C++] Install bundled GoogleTest (#37483)
- GH-37099 - [C++] Fix build of Flight-UCX (#37105)
- GH-37102 - [Go][Parquet] Encoding: Make BitWriter Reserve when ReserveBytes (#37112)
- GH-37106 - [C++] Remove overflowed integer rounding benchmarks (#37109)
- GH-37107 - [C++] Suppress an unused variable warning with GCC 7 (#37240)
- GH-37110 - [C++] Expression: SmallestTypeFor lost tz for Scalar (#37135)
- GH-37111 - [C++][Parquet] Dataset: Fixing Schema Cast (#37793)
- GH-37116 - [C++][ORC] Link to absl::log_internal_check_op for ABSL_DCHECK*() (#37117)
- GH-37120 - [CI][Docs] Ensure removing existing Node.js (#37121)
- GH-37129 - [CI][Docs] Use Ubuntu 22.04 (#37132)
- GH-37129 - [CI][Docs] Free up disk space (#37131)
- GH-37148 - [C++] Explicitly list the integer values of the Type::type enum (#37149)
- GH-37173 - [C++][Go][Format] C-export/import Run-End Encoded Arrays (#37174)
- GH-37208 - [R] Use currently running R binary to compile test program (nix install) (#37225)
- GH-37213 - [C#] Updating a reference to FlatBuffers missed due to rebase/merge conflict (#37214)
- GH-37217 - [Python] Add missing docstrings to Cython (#37218)
- GH-37239 - [Ruby] Updated documentation for ArrowTable#initialize to clarify argument details (#37261)
- GH-37245 - [MATLAB] arrow.internal.proxy.validate throws MATLAB:UndefinedFunction when crafting the message to display when throwing the arrow:proxy:ProxyNameMismatch error (#37248)
- GH-37266 - [CI][C++] Use ARROW_CMAKE_ARGS not CMAKE_ARGS (#37272)
- GH-37276 - [C++] Skip multithread tests on single thread env (#37327)
- GH-37294 - [C++] Use std::string for HasSubstr matcher (#37314)
- GH-37299 - [C++] Fix clang-format version mismatch error with Homebrew’s clang-format (#37300)
- GH-37303 - [Python] Update test_option_class_equality due to CumulativeSumOptions refactor (#37305)
- GH-37308 - [C++][Docs] Change name for CPP tutorial and minor fixes to the job (#37311)
- GH-37325 - [R] Update NEWS.md with missing changes for 13.0.0 (#37326)
- GH-37329 - [Release][Homebrew] Follow directory structure change (#37349)
- GH-37340 - [MATLAB] The column(index) method of arrow.tabular.RecordBatch errors if index refers to an arrow.array.Time32Array column (#37347)
- GH-37352 - [C++] Don’t put all dependencies to ArrowConfig.cmake/arrow.pc (#37399)
- GH-37373 - [CI] Make integration build a bit leaner (#37366)
- GH-37373 - [CI][Integration] Free up disk space (#37374)
- GH-37377 - [C#] Throw OverflowException on overflow in TimestampArray.ConvertTo() (#37388)
- GH-37386 - [R] CRAN failures due to “invalid non-character version specification” (#37387)
- GH-37406 - [C++][FlightSQL] Add missing ArrowFlight::arrow_flight_{shared,static} dependencies (#37407)
- GH-37408 - [C++] Install arrow-compute.pc only when ARROW_COMPUTE=ON (#37409)
- GH-37410 - [C++][Gandiva] Add support for using LLVM shared library (#37412)
- GH-37411 - [C++][Python] Add string -> date cast kernel (fix python scalar cast) (#38038)
- GH-37414 - [Release][CI] Update references to wrong apache-arrow Homebrew formula path (#37415)
- GH-37419 - [Go][Parquet] Decimal256 support for pqarrow (#37503)
- GH-37431 - [R] Tests failing for R versions < 4.0 because of use of base pipe (|>) in tests (#37432)
- GH-37433 - [CI][Release] Increase timeout for macOS (#37530)
- GH-37437 - [C++] Fix MakeArrayOfNull for list array with large string values type (#37467)
- GH-37453 - [C++][Parquet] Performance fix for WriteBatch (#37454)
- GH-37456 - [R] CRAN incoming checks show NOTE due to internal function which isn’t documented (#37457)
- GH-37463 - [R] CRAN incoming checks fail due to test run length (#37464)
- GH-37466 - [C++][Parquet] Fix Valgrind failure in DELTA_BYTE_ARRAY decoder (#37471)
- GH-37470 - [Python][Parquet] Add missing arguments to ParquetFileWriteOptions (#37469)
- GH-37480 - [Python] Bump pandas version that contains regression for pandas issue 50127 (#37481)
- GH-37485 - [C++][Skyhook] Don’t use deprecated BufferReader API (#37486)
- GH-37487 - [C++][Parquet] Dataset: Implement sync ParquetFileFormat::GetReader (#37514)
- GH-37488 - [C++] Disable unity build for Azure SDK for C++ (#37489)
- GH-37500 - [CI][C++] Disable Dataset and Substrait by default (#37501)
- GH-37507 - [GLib] Don’t use implicit include directories (#37508)
- GH-37515 - [C++] Remove memory address optimization from ChunkedArray::Equals(const std::shared_ptr<arrow::ChunkedArray>& other) if the ChunkedArray can have NaN values (#37579)
- GH-37523 - [C++][CI][CUDA] Don’t use newer API and add missing CUDA dependencies (#37497)
- GH-37535 - [C++][Parquet] Add missing “thrift” dependency in parquet.pc (#37603)
- GH-37539 - [C++][FlightRPC] Fix binding to IPv6 addresses (#37552)
- GH-37555 - [Python] Update get_file_info_selector to ignore base directory (#37558)
- GH-37560 - [Python][Documentation] Replacing confusing batch size from 128Ki to 128_000 (#37605)
- GH-37574 - [Python] Compatibility with numpy 2.0 (#38040)
- GH-37576 - [R] Use SafeCallIntoR() to call garbage collector after a failed allocation (#37565)
- GH-37601 - [C++][Parquet] Add missing GoogleMock dependency (#37602)
- GH-37608 - [C++][Gandiva] TO_DATE function supports YYYY-MM and YYYY (#37609)
- GH-37614 - [R][CI] Update CI jobs due to duckdb repo moving (#37615)
- GH-37621 - [Packaging][Conda] Sync conda recipes with feedstocks (#37624)
- GH-37639 - [CI] Fix checkout on older OSes (#37640)
- GH-37648 - [Packaging][Linux] Fix libarrow-glib-dev/arrow-glib-devel dependencies (#37714)
- GH-37650 - [Python] Check filter inputs in FilterMetaFunction (#38075)
- GH-37671 - [R] legacy timezone symlinks cause CRAN failures (#37672)
- GH-37712 - [Go][Parquet] Fix ARM64 assembly for bitmap extract bits (#37785)
- GH-37715 - [Packaging][CentOS] Use default g++ on CentOS 9 Stream (#37718)
- GH-37730 - [C#] throw OverflowException in DecimalUtility if fractionalPart is too large (#37731)
- GH-37735 - [C++][FreeBSD] Suppress a shorten-64-to-32 warning (#38004)
- GH-37738 - [Go][CI] Update Go version for verification (#37745)
- GH-37750 - [R][C++] Add compatibility with IntelLLVM (#37781)
- GH-37767 - [C++][CMake] Don’t touch .git/index (#38003)
- GH-37771 - [Go][Benchmarking] Update Conbench git info (#37772)
- GH-37803 - [Python][CI] Pin setuptools_scm to fix release verification scripts (#37930)
- GH-37803 - [CI][Dev][Python] Release and merge script errors (#37819)
- GH-37805 - [CI][MATLAB] Hard-code release to R2023a for matlab-actions/setup-matlab action in MATLAB CI workflows (#37808)
- GH-37813 - [R] add quoted_na argument to open_delim_dataset() (#37828)
- GH-37829 - [Java] Avoid resizing data buffer twice when appending variable length vectors (#37844)
- GH-37834 - [Gandiva] Migrate to new LLVM PassManager API (#37867)
- GH-37845 - [Go][Parquet] Check the number of logical fields instead of physical columns (#37846)
- GH-37858 - [Docs][JS] Fix check of remote URL to generate JS docs (#37870)
- GH-37893 - [Java] Move Types.proto in a subfolder (#37894)
- GH-37907 - [R] Setting rosetta variable is missing (#37961)
- GH-37927 - [CI][Dev][Archery] Badges for crossbow jobs always show `no status` even when they have failed or succeeded
- GH-37936 - [CI] Fix integration testing in rc-verify nightly builds (#37933)
- GH-37950 - [R] tests fail on R < 4.0 due to test calling data.frame() without specifying stringsAsFactors=FALSE (#37951)
- GH-37952 - [C++] Make unique->shared explicit to fix build failure on at least one compiler (#38136)
- GH-37993 - [CI] Fix conda-integration build (#37990)
- GH-37999 - [CI][Archery] Install python3-dev on ARM jobs to have access to Python.h (#38009)
- GH-38011 - [C++][Dataset] Change force close to tend to close on write (#38030)
- GH-38014 - [Python] pyarrow extension type is not converted to pandas properly in 13.0.0
- GH-38034 - [Python] DataFrame Interchange Protocol - correct dtype information for categorical columns (#38065)
- GH-38039 - [C++][Parquet] Fix segfault getting compression level for a Parquet column (#38025)
- GH-38049 - [R] Prevent on_rosetta() from warning (#38052)
- GH-38057 - [Python][CI] Fix flaky hypothesis tests (#38058)
- GH-38059 - [Python][CI] Upgrade CUDA to 11.2.2 (#38081)
- GH-38060 - [Python][CI] Upgrade Spark versions (#38082)
- GH-38068 - [C++][CI] Fixing Parquet unittest arrow_reader_writer_test.cc compile (#38069)
- GH-38074 - [C++] Fix Offset Size Calculation for Slicing Large String and Binary Types in Hash Join (#38147)
- GH-38076 - [Java][CI][Java-Jars][MacOS] C++ libraries for MacOS AARCH 64
- GH-38077 - [C++] Output bundled GoogleTest to ${BUILD_DIR}/${CONFIG} (#38132)
- GH-38084 - [R] Do not memory map when explicitly checking for file removal (#38085)
- GH-38193 - [CI][Java] Free up disk space for “AMD64 manylinux2014 Java JNI” (#38194)
- GH-38197 - [R] Update actions that used setup-r@v1 to use setup-r@v2 (#38218)
- GH-38200 - [CI][Release][Go] Ensure removing all module caches (#38222)
- GH-38201 - [CI][Packaging] Pin zlib 1.2.13 when using thrift on conan (#38202)
- GH-38206 - [CI] Remove more pre-installed files (#38233)
- GH-38226 - [R] Remove R 3.5 from test-r-versions (#38230)
- GH-38227 - [R] Fix non-unicode character errors in nightly builds (#38232)
- GH-38228 - [R] Fence examples that need dataset with examplesIf (#38229)
- GH-38239 - [CI][Python] Disable -W error on Python CI jobs temporarily (#38238)
- GH-38263 - [C++] Prefer to call string_view::data() instead of begin() where a char pointer is expected (#38265)
- GH-38282 - [C++] Implement ReplaceString with the right type signature (#38283)
- GH-38286 - [CI][R] Clean GitHub runner disk for ubuntu-r-only-r images (#38287)
- GH-38293 - [R] Fix non-deterministic duckdb test (#38294)
- GH-38295 - [CI][R] Free up disk space for Azure Pipelines jobs (#38302)
- GH-38332 - [CI][Release] Resolve symlinks in RAT lint (#38337)
New Features and Improvements
- GH-20086 - [C++] Cast between fixed size and variable size lists (#37292)
- GH-21815 - [JS] Add support for Duration type (#37341)
- GH-24868 - [C++] Add a Tensor logical value type with varying dimensions, implemented using ExtensionType (#37166)
- GH-25659 - [Java] Add DefaultVectorComparators for Large types (#37887)
- GH-29184 - [R] Read CSV with comma as decimal mark (#38002)
- GH-29238 - [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API (#34616)
- GH-29847 - [C++] Build with Azure SDK for C++ (#36835)
- GH-32863 - [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet writer (#14341)
- GH-33032 - [C#] Support fixed-size lists (#35716)
- GH-33749 - [Ruby] Add Arrow::RecordBatch#each_raw_record (#37137)
- GH-33985 - [C++] Add substrait serialization/deserialization for expressions (#34834)
- GH-34031 - [Python] Use PyCapsule for communicating C Data Interface pointers at the Python level
- GH-34105 - [R] Provide extra output for failed builds (#37727)
- GH-34213 - [C++] Use recursive calls without a delimiter if the user is doing a recursive GetFileInfo (#35440)
- GH-34252 - [Java] Support ScannerBuilder::Project or ScannerBuilder::Filter as a Substrait proto extended expression (#35570)
- GH-34588 - [C++][Python] Add a MetaFunction for “dictionary_decode” (#35356)
- GH-34620 - [C#] Support DateOnly and TimeOnly on .NET 6.0+ (#36125)
- GH-34950 - [C++][Parquet] Support encryption for page index (#36574)
- GH-35116 - [CI][C++] Enable compile-time AVX2 on some CI platforms (#36662)
- GH-35176 - [C++] Add support for disabling threading for emscripten (#35672)
- GH-35243 - [C#] Implement MapType (#37885)
- GH-35273 - [C++] Add integer round kernels (#36289)
- GH-35287 - [C++][Parquet] Add CodecOptions to customize the compression parameter (#35886)
- GH-35296 - [Go] Add arrow.Table.String() (#35580)
- GH-35409 - [Python][Docs] Clarify S3FileSystem Credentials chain for EC2 (#35312)
- GH-35531 - [Python] C Data Interface PyCapsule Protocol (#37797)
- GH-35600 - [Python] Allow setting path to timezone db through python API (#37436)
- GH-35623 - [C++][Python] FixedShapeTensorType.ToString() should print the type’s parameters (#36496)
- GH-35627 - [Format][Integration] Add string-view to arrow format (#37526)
- GH-35698 - [C#] Update FlatBuffers (#35699)
- GH-35740 - Add documentation for list arrays’ values property (#35865)
- GH-35775 - [Go][Parquet] Allow key value file metadata to be written after writing row groups (#37786)
- GH-35903 - [C++] Skeleton for Azure Blob Storage filesystem implementation (#35701)
- GH-35916 - [Java][arrow-jdbc] Add extra fields to JdbcFieldInfo (#37123)
- GH-35934 - [C++][Parquet] PageIndex Read benchmark (#36702)
- GH-36078 - [C#] Flight SQL implementation for C# (#36079)
- GH-36103 - [C++] Initial device sync API (#37040)
- GH-36111 - [C++] Refactor dict_internal.h to use Result (#37754)
- GH-36124 - [C++] Export compile_commands.json by default (#37426)
- GH-36155 - [C++][Go][Java][FlightRPC] Add support for long-running queries (#36946)
- GH-36187 - [C++] Display the name of the problematic field when returning status “Data type … is not supported in join non-key field” for HashJoin (#36539)
- GH-36199 - [Python][CI][Spark] Update spark versions used on our nightly tests (#36347)
- GH-36240 - [Python] Refactor CumulativeSumOptions to a separate class for independent deprecation (#36977)
- GH-36247 - [R] Add write_csv_dataset (#36436)
- GH-36326 - [C++] Remove APIs deprecated in v9.0 or earlier (#36675)
- GH-36363 - [MATLAB] Create proxy classes for the DataType class hierarchy (#36419)
- GH-36417 - [C++] Add Buffer::data_as, Buffer::mutable_data_as (#36418)
- GH-36420 - [C++] Add An Enum Option For SetLookup Options (#36739)
- GH-36433 - [C++] Update fast_float version to 3.10.1 (#36434)
- GH-36469 - [Java][Packaging] Distribute linux aarch64 libs with mavencentral jars (#36487)
- GH-36488 - [C++] Import/Export ArrowDeviceArray (#36489)
- GH-36511 - [C++][FlightRPC] Get rid of GRPCPP_PP_INCLUDE (#36679)
- GH-36512 - [C++][FlightRPC] Add async GetFlightInfo client call (#36517)
- GH-36546 - [Swift] The initial implementation for swift arrow flight (#36547)
- GH-36570 - [Dev] Add “Component: Swift” label to PRs (#36571)
- GH-36573 - [CI] Remove Travis CI related files and mentions (#36741)
- GH-36590 - [Docs] Support Pydata Sphinx Theme 0.14.0 (#36591)
- GH-36601 - [MATLAB] Add a MATLAB “type traits” class hierarchy (#36653)
- GH-36614 - [MATLAB] Subclass arrow::Buffer to keep MATLAB data backing arrow::Arrays alive (#36615)
- GH-36618 - [C++] Add a test for evaluation of ARROW_CHECK payload (#36617)
- GH-36621 - [C++] Add documentation for ACERO_ALIGNMENT_HANDLING (#36622)
- GH-36623 - [Go] NullType support for csv (#36624)
- GH-36642 - [Python][CI] Configure warnings as errors during pytest (#37018)
- GH-36643 - [C++][Parquet] Use nested namespace in parquet (#36647)
- GH-36652 - [MATLAB] Initialize the Type property of arrow.array.Array subclasses from existing proxy ids (#36731)
- GH-36666 - [Python][CI] Re-enable skipped dask test_pandas_timestamp_overflow_pyarrow test (#38066)
- GH-36671 - [Go] BinaryMemoTable optimize allocations of GetOrInsert (#36811)
- GH-36672 - [Python][C++] Add support for vector function UDF (#36673)
- GH-36674 - [C++] Use anonymous namespace in arrow/ipc/reader.cc (#36937)
- GH-36696 - [Go] Improve the MapOf and ListOf helpers (#36697)
- GH-36698 - [Go][Parquet] Add a TimestampLogicalType creation function … (#36699)
- GH-36709 - [Python] Allow to specify use_threads=False in Table.group_by to have stable ordering (#36768)
- GH-36734 - [MATLAB] template arrow::matlab::proxy::NumericArray on ArrowType instead of CType (#36738)
- GH-36735 - Add TimeUnit and TimeZone to the arrow.type.TimestampType display (#36871)
- GH-36750 - [R] Fix test-r-devdocs on MacOS (#36751)
- GH-36752 - [Python] Remove AWS SDK bundling when building wheels (#36925)
- GH-36762 - [Dev] Remove only component labels when an issue is updated (#36763)
- GH-36765 - [Python][Dataset] Change default of pre_buffer to True for reading Parquet files (#37854)
- GH-36767 - [C++][CI] Fix test failure on i386 (#36769)
- GH-36770 - [C++] Use custom endpoint for s3 using environment variable AWS_ENDPOINT_URL (#36791)
- GH-36773 - [C++][Parquet] Avoid calculating prebuffer column bitmap multiple times (#36774)
- GH-36789 - [C++] Support divide(duration, duration) (#36800)
- GH-36793 - [Go] Allow NewSchemaFromStruct to skip fields if tagged with parquet:"-" (#36794)
- GH-36795 - [C#] Implement support for dense and sparse unions (#36797)
- GH-36816 - [C#] Reduce allocations (#36817)
- GH-36824 - [C++] Improve the test tracing of CheckWithDifferentShapes in the if-else kernel tests (#36825)
- GH-36837 - [CI][RPM] Use multi-cores to install gems (#36838)
- GH-36843 - [Python][Docs] Add dict to docstring (#36842)
- GH-36845 - [C++][Python] Allow type promotion on pa.concat_tables (#36846)
- GH-36852 - [MATLAB] Add arrow.type.Field class (#36855)
- GH-36853 - [MATLAB] Add utility to create proxies from existing arrow::DataType objects (#36873)
- GH-36867 - [C++] Add a struct_ and schema overload taking a vector of (name, type) pairs (#36915)
- GH-36874 - [MATLAB] Move type constructor functions from the arrow.type package to arrow package (#36875)
- GH-36882 - [C++][Parquet] Use RLE as BOOLEAN default encoding when both data page and version is V2 (#38163)
- GH-36882 - [C++][Parquet] Default RLE for bool values in the parquet version 2.x (#36955)
- GH-36885 - [Java][Docs] Add substrait dependency to maven build profiles (#36899)
- GH-36886 - [C++] Configure azurite in preparation for testing Azure C++ filesystem (#36988)
- GH-36893 - [Go][Flight] Expose underlying protobuf definitions (#36895)
- GH-36905 - [C++] Add support for SparseUnion to selection functions (#36906)
- GH-36927 - [Java][Docs] Enable Gandiva build as part of Java maven commands (#36929)
- GH-36931 - [C++] Add cumulative_mean function (#36932)
- GH-36933 - [Python] Pointless ellipsis in array repr (#37168)
- GH-36936 - [Go] Make it possible to register custom functions. (#36959)
- GH-36944 - [C++] Unify OpenSSL detection for building GCS (#36945)
- GH-36950 - [C++] Change std::vector<std::shared_ptr<Field>> to use its alias: FieldVector (#37101)
- GH-36952 - [C++][FlightRPC][Python] Add methods to send headers (#36956)
- GH-36953 - [MATLAB] Add gateway arrow.array function to create Arrow Arrays from MATLAB data (#36978)
- GH-36961 - [MATLAB] Add arrow.tabular.Schema class and associated arrow.schema construction function (#37013)
- GH-36970 - [C++][Parquet] Minor style fix for parquet metadata (#36971)
- GH-36984 - [MATLAB] Create arrow.recordbatch convenience constructor function (#37025)
- GH-36990 - [R] Expose Parquet ReaderProperties (#36992)
- GH-36994 - [Java] Use JDK 21 in CI (#38219)
- GH-37012 - [MATLAB] Remove the private property ArrowArrays from arrow.tabular.RecordBatch (#37015)
- GH-37014 - [C++][Parquet] Preserve some Parquet distinct counts when merging stats (#37016)
- GH-37021 - [Java][arrow-jdbc] Pluggable getConsumer (#37085)
- GH-37028 - [C++] Add support for duration types to if_else functions (#37064)
- GH-37041 - [MATLAB] Implement Feather V1 Reader using new MATLAB Interface APIs (#37044)
- GH-37042 - [MATLAB] Implement Feather V1 Writer using new MATLAB Interface APIs (#37043)
- GH-37045 - [MATLAB] Implement featherwrite in terms of arrow.internal.io.feather.Writer (#37047)
- GH-37046 - [MATLAB] Implement featherread in terms of arrow.internal.io.feather.Reader (#37163)
- GH-37049 - [MATLAB] Update feather Reader and Writer objects to work directly with arrow.tabular.RecordBatch objects instead of MATLAB tables (#37052)
- GH-37051 - [Dev][JS] Add Dependabot configuration for npm (#37053)
- GH-37073 - [Java] JDBC: Only use username/pass auth if token is not provided (#37083)
- GH-37093 - [Python] Add async Flight client with GetFlightInfo (#36986)
- GH-37096 - [MATLAB] Add utility which makes valid MATLAB table variable names from an arbitrary list of strings (#37098)
- GH-37124 - [MATLAB] Add utility functions for validating numeric and string index values (#37150)
- GH-37128 - [Java] Bump CI job from JDK 18 to JDK 20 (#37125)
- GH-37141 - [GLib][FlightRPC] Add more ArrowFlight::ClientOptions properties (#37142)
- GH-37143 - [GLib][FlightSQL] Add support for prepared INSERT (#37196)
- GH-37144 - [C++] Add RecordBatchFileReader::To{RecordBatches,Table} (#37167)
- GH-37145 - [Python] support boolean columns with bitsize 1 in from_dataframe (#37975)
- GH-37151 - [MATLAB] Use makeValidVariableNames and makeValidDimensionNames in implementation of table method for RecordBatch (#37152)
- GH-37155 - [MATLAB] Use arrow.internal.validate.index.numeric() in the column() method of arrow.tabular.RecordBatch (#37156)
- GH-37157 - [MATLAB] Use arrow.internal.validate.index.numericOrString() in the field() method of arrow.tabular.Schema (#37162)
- GH-37160 - [MATLAB] arrow.internal.validate.index.string() should not error if given a string with zero characters (#37161)
- GH-37170 - [C++] Support schema rewriting of RecordBatch. (#37171)
- GH-37175 - [MATLAB] Support creating arrow.tabular.RecordBatch instances from a list of arrow.array.Array values (#37176)
- GH-37179 - [MATLAB] Add a test utility that creates a MATLAB table containing all supported types (#37191)
- GH-37181 - [MATLAB] Remove outdated test class tArrowCppCall.m (#37185)
- GH-37182 - [MATLAB] Add public Schema property to MATLAB arrow.tabular.RecordBatch class (#37184)
- GH-37187 - [MATLAB] Re-implement tfeathermex.m tests in terms of new internal Feather Reader and Writer objects (#37189)
- GH-37188 - [MATLAB] Move test/util/featherRoundTrip.m into a packaged test utility function (#37190)
- GH-37203 - [MATLAB] Remove unused feather V1 MEX infrastructure and code (#37204)
- GH-37209 - [CI][Docs][MATLAB] Remove support for MATLAB_ARROW_INTERFACE flag from CMake build system and build new MATLAB Interface code by default (#37211)
- GH-37210 - [Docs][MATLAB] Update MATLAB README.md to mention support for new MATLAB APIs (e.g. RecordBatch, Field, Schema, etc.) (#37215)
- GH-37212 - [C++] IO: Add FromString to ::arrow::io::BufferReader (#37360)
- GH-37216 - [Docs] adding documentation to deal with unreleased allocators (#37498)
- GH-37222 - [Docs][MATLAB] Rename arrow.recordbatch (all lowercase) to arrow.recordBatch (camelCase) (#37223)
- GH-37228 - [MATLAB] Add C++ ARROW_MATLAB_EXPORT symbol export macro (#37233)
- GH-37229 - [MATLAB] Add arrow.type.Date32Type class and arrow.date32 construction function (#37348)
- GH-37230 - [MATLAB] Add arrow.type.Date64Type class and arrow.date64 construction function (#37578)
- GH-37231 - [MATLAB] Add arrow.type.Time32Type class and arrow.time32 construction function (#37250)
- GH-37232 - [MATLAB] Add arrow.type.Time64Type class and arrow.time64 construction function (#37287)
- GH-37234 - [MATLAB] Create an abstract arrow.type.TemporalType class (#37236)
- GH-37237 - [C++] Set extraction time to all downloaded contents timestamp (#37238)
- GH-37244 - [Python] Remove support for pickle5 (#37644)
- GH-37246 - [Java] expose VectorAppender class to offer support to append vector values (#37247)
- GH-37251 - [MATLAB] Make arrow.type.TemporalType a “tag” class (#37256)
- GH-37252 - [MATLAB] Add arrow.type.DateUnit enumeration class (#37280)
- GH-37253 - [MATLAB] Add test cases which verify that the NumFields, BitWidth, and ID properties cannot be modified to hFixedWidth test class (#37316)
- GH-37254 - [Python] Parametrize all pickling tests to use both the pickle and cloudpickle modules (#37255)
- GH-37257 - [Ruby][FlightSQL] Use the same options for auto prepared statement close request (#37258)
- GH-37259 - [Ruby] Add explicit csv gem dependency (#37506)
- GH-37262 - [MATLAB] Add an abstract class called `arrow.type.TimeType` (#37279)
- GH-37268 - [C++] adding move in some ctor in fs and dataset (#37264)
- GH-37273 - [C++] Bump vendored xxhash version (#37275)
- GH-37290 - [MATLAB] Add `arrow.array.Time32Array` class (#37315)
- GH-37293 - [C++][Parquet] Encoding: Add Benchmark for DELTA_BYTE_ARRAY (#37641)
- GH-37306 - [Go] Add binary dictionary unifier (#37309)
- GH-37307 - [Python][CI] Manually skip tests with skip_with_pyarrow_strings marker for nightly dask integration tests (#37324)
- GH-37330 - [Docs][CI] Increase the Timeout for the Sphinx build (#37331)
- GH-37334 - [Packaging][Release][RPM] Don’t remove old repodata/* (#37351)
- GH-37337 - [MATLAB] Add `arrow.array.Time64Array` class (#37368)
- GH-37345 - [MATLAB] Add function handle to `fromMATLAB` static construction methods to `TypeTraits` classes (#37370)
- GH-37364 - [C++][GPU] Add CUDA impl of Device Event/Stream (#37365)
- GH-37367 - [MATLAB] Add `arrow.array.Date32Array` class (#37445)
- GH-37379 - [C++][Parquet] Thrift: Generate movable types (#37461)
- GH-37384 - [R] Set R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS = TRUE on CI (#37385)
- GH-37391 - [MATLAB] Implement the `isequal()` method on `arrow.array.Array` (#37446)
- GH-37392 - [JS] Remove lerna (#37393)
- GH-37394 - [C++][S3] Use AWS_SDK_VERSION_* instead of try_compile() (#37395)
- GH-37416 - [Go] Allow accessing underlying index builder of dictionary builders (#37417)
- GH-37434 - [C++] IO: Refactor BufferedInputStream::Read for small input (#37460)
- GH-37440 - [C#][Docs] Add Flight SQL supported functions to status.rst (#37441)
- GH-37447 - [C++][Docs] Document `ARROW_SUBSTRAIT` CMake flag (#37451)
- GH-37448 - [MATLAB] Add `arrow.array.ChunkedArray` class (#37525)
- GH-37465 - [Go] Add Value method to BooleanBuilder (#37459)
- GH-37472 - [MATLAB] Implement the `isequal()` method on `arrow.type.Type` (#37474)
- GH-37473 - [MATLAB] Add support for indexing `RecordBatch` columns by `Field` name (#37475)
- GH-37477 - [MATLAB] Add `AllowNonScalar` name-value pair to arrow.internal.validate.index.* validation functions (#37482)
- GH-37510 - [C++] Don’t install bundled Azure SDK for C++ (#38176)
- GH-37532 - [CI][Docs][MATLAB] Remove `GoogleTest` support from the CMake build system for the MATLAB interface (#37784)
- GH-37537 - [Integration][C++] Add C Data Interface integration testing (#37769)
- GH-37553 - [Java] Allow FlightInfo#Schema to be nullable for long-running queries (#37528)
- GH-37562 - [Ruby] Add support for table.each_raw_record.to_a (#37600)
- GH-37567 - [C++] Migrate JSON Integration code to Result<> (#37573)
- GH-37568 - [MATLAB] Implement `isequal` for the `arrow.tabular.Schema` MATLAB class (#37619)
- GH-37569 - [MATLAB] Implement `isequal` for the `arrow.type.Field` MATLAB class (#37617)
- GH-37570 - [MATLAB] Implement `isequal` for the `arrow.tabular.RecordBatch` MATLAB class (#37627)
- GH-37571 - [MATLAB] Add `arrow.tabular.Table` MATLAB class (#37620)
- GH-37572 - [MATLAB] Add `arrow.array.Date64Array` class (#37581)
- GH-37584 - [Go] Add value len function to string array (#37586)
- GH-37587 - [C++] Move integration machinery into its own directory and namespace (#37588)
- GH-37591 - [MATLAB] Make `arrow.type.Type` inherit from `matlab.mixin.Heterogeneous` (#37593)
- GH-37597 - [MATLAB] Add `toMATLAB` method to `arrow.array.ChunkedArray` class (#37613)
- GH-37628 - [MATLAB] Implement `isequal` for the `arrow.tabular.Table` MATLAB class (#37629)
- GH-37635 - [Format][C++][Go] Add app_metadata to FlightInfo and FlightEndpoint (#37679)
- GH-37636 - [Go] Bump minimum go versions (#37637)
- GH-37643 - [C++] Enhance arrow::Datum::ToString (#37646)
- GH-37651 - [C#] expose ArrowArrayConcatenator.Concatenate (#37652)
- GH-37653 - [MATLAB] Add `arrow.array.StructArray` MATLAB class (#37806)
- GH-37654 - [MATLAB] Add `Fields` property to `arrow.type.Type` MATLAB class (#37725)
- GH-37670 - [C++] IO FileInterface extend from enable_shared_from_this (#37713)
- GH-37681 - [R] Update NEWS.md for 13.0.0.1 (#37682)
- GH-37687 - [Go] Don’t copy in realloc when capacity is sufficient. (#37688)
- GH-37694 - [Go] Add SetNull to array builders (#37695)
- GH-37701 - [Java] Add default comparators for more types (#37748)
- GH-37702 - [Java] Add vector validation consistent with C++ (#37942)
- GH-37703 - [Java] Method for setting exact number of records in ListVector (#37838)
- GH-37704 - [Java] Add schema IPC serialization methods (#37778)
- GH-37705 - [Java] Extra input methods for VarChar writers (#37883)
- GH-37705 - [Java] Extra input methods for binary writers (#37791)
- GH-37706 - [Java] VarCharWriter should support writing from `Text` and `String`
- GH-37722 - [Java][FlightRPC] Deprecate stateful login methods (#37833)
- GH-37724 - [MATLAB] Add `arrow.type.StructType` MATLAB class (#37749)
- GH-37742 - [Python] Enable Cython 3 (#37743)
- GH-37744 - [Swift] Add test for arrow flight doGet FlightData (#37746)
- GH-37770 - [MATLAB] Add CSV `TableReader` and `TableWriter` MATLAB classes (#37773)
- GH-37779 - [Go] Link to the pkg.go.dev site for Go reference docs (#37780)
- GH-37782 - [C++] Add `CanReferenceFieldsByNames` method to `arrow::StructArray` (#37823)
- GH-37789 - [Integration][Go] Go C Data Interface integration testing (#37788)
- GH-37795 - [Java][FlightSQL] Add mock FlightSqlProducer and tests (#37837)
- GH-37799 - [C++] Compute: CommonTemporal support time32 and time64 casting (#37949)
- GH-37825 - [MATLAB] Improve `arrow.type.Field` display (#37826)
- GH-37835 - [MATLAB] Improve `arrow.tabular.Schema` display (#37836)
- GH-37842 - [R] Implement infer_schema.data.frame() (#37843)
- GH-37849 - [C++] Add cpp/src/*/.cmake to cmake-format targets (#37850)
- GH-37851 - [C++] IPC: ArrayLoader style enhancement (#37872)
- GH-37863 - [Java] Add typed getters for StructVector (#37916)
- GH-37864 - [Java] Remove unnecessary throws from OrcReader (#37913)
- GH-37873 - [C++][Parquet] DELTA_BYTE_ARRAY: avoid copying data when possible (#37874)
- GH-37876 - [Format] Add list-view specification to arrow format (#37877)
- GH-37880 - [CI][Python][Packaging] Add support for Python 3.12 (#37901)
- GH-37906 - [Integration][C#] Implement C Data Interface integration testing for C# (#37904)
- GH-37917 - [Parquet] Add OpenAsync for FileSource (#37918)
- GH-37923 - [R] Move macOS build system to nixlibs.R (#37684)
- GH-37934 - [Doc][Integration] Document C Data Interface testing (#37935)
- GH-37939 - [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED (#37940)
- GH-37941 - [R][CI][Release] Add checksum verification for pre-compiled binaries (#38115)
- GH-37945 - [R] Update developer documentation (#38220)
- GH-37971 - [CI][Java] Don’t use cache for nightly upload (#37980)
- GH-37978 - [C++] Add support for specifying custom Array element delimiter to `arrow::PrettyPrintOptions` (#37981)
- GH-37984 - [Release] Use ISO 8601 format for YAML date value (#37985)
- GH-37994 - [R] Create wrapper functions for the CSV*Options classes (#37995)
- GH-37996 - [MATLAB] Add a static constructor method named `fromMATLAB` to `arrow.array.StructArray` (#37998)
- GH-38005 - [Java] disable the debug log when running Java tests (#38006)
- GH-38015 - [MATLAB] Add `arrow.buffer.Buffer` class to the MATLAB Interface (#38020)
- GH-38017 - [Go][FlightSQL] Increment types handled by internal converter (#38028)
- GH-38043 - [R] Enable all features by default on macOS (#38195)
- GH-38053 - [C++][Go] Re-generate sources from Schema.fbs (#38054)
- GH-38055 - [C++] Don’t find/use Threads::Threads with ARROW_ENABLE_THREADING=OFF (#38056)
- GH-38063 - [C++] Use absolute path for external project’s ar/ranlib (#38064)
- GH-38071 - [C++][CI] Fix Overlap column chunk ranges for pre-buffer (#38073)
- GH-38088 - [R] Remove outdated references to brew and autobrew (#38089)
- GH-38138 - [R] Add curl to suggests for use of `skip_if_offline()` (#38140)
- GH-38142 - [R] Add NEWS for 14.0.0 (#38143)
- GH-38145 - [Docs][Python] Add tzdata on Windows subsection in Python install docs (#38146)
- GH-38159 - [CI][Release] Run only integration tests on integration test mode (#38177)
- GH-38172 - [CI][C++] Use system GoogleTest on Ubuntu 22.04 (#38173)
- GH-38174 - [C++] Update bundled Azure SDK for C++ to 1.10.3 (#38175)
- GH-38209 - [Docs] Reduce width of header items and keep header height default (small) on smaller screens (#38148)
- GH-38240 - [Docs] version_match should match the version from versions.json (#38241)
- GH-38243 - [CI][Python] Add missing dataset marker for dataset encryption tests (#38244)
- GH-38285 - [Go] Slight deps and docs update (#38284)
- GH-38312 - [Docs] Add the Arrow C Device data interface page to the sidebar TOC (#38313)
- PARQUET-2323 - [C++] Use bitmap to store pre-buffered column chunks (#36649)