Apache Arrow 12.0.0 (2 May 2023)
This is a major release covering more than 3 months of development.
Download
- Source Artifacts
- Binary Artifacts
- Git tag
Contributors
This release includes 531 commits from 97 distinct contributors.
$ git shortlog -sn apache-arrow-11.0.0..apache-arrow-12.0.0
62 Sutou Kouhei
44 Weston Pace
26 Gang Wu
26 Matt Topol
23 Nic Crane
23 mwish
22 Joris Van den Bossche
22 Raúl Cumplido
20 Alenka Frim
19 David Li
19 Felipe Oliveira Carvalho
15 Will Jones
11 Jin Shang
11 rtpsw
9 Rok Mihevc
9 Yevgeny Pats
8 Ben Harkins
7 Fokko Driesprong
7 Jacob Wujciak-Jens
7 eitsupi
6 Bryce Mecum
6 Neal Richardson
6 dependabot[bot]
5 Li Jin
4 Adam Reeve
4 Dewey Dunnington
4 Hirokazu SUZUKI
4 rtadepalli
3 Abe Tomoaki
3 Antoine Pitrou
3 Carlos O'Ryan
3 Danyaal Khan
3 Davide Pasetto
3 Diego Fernández Giraldo
3 Dominik Moritz
3 Fatemah Panahi
3 Haocheng Liu
3 Igor Izvekov
3 Patrick Hoefler
3 Sanjiban Sengupta
3 Vibhatha Lakmal Abeykoon
2 Dinir Imameev
2 Herman Schaaf
2 Min-Young Wu
2 Noah Treuhaft
2 Sven Rebhan
2 Yibo Cai
2 coldWater
1 0x26res
1 Aaron Gorenstein
1 Alexander Diemand
1 Aliaksei Makarau
1 Andrew Lamb
1 Andy Chang
1 Brett Buddin
1 Carl Boettiger
1 Chris Chua
1 Christopher Akiki
1 Curt Hagenlocher
1 Dane Pitkin
1 David Sisson
1 Dmitry Kolmakov
1 Dongjoon Hyun
1 Edward Visel
1 Hongze Zhang
1 Ian Cook
1 Igor Suhorukov
1 Jacob Marble
1 Jie Zhang
1 Jinpeng
1 Judah Rand
1 Junming Chen
1 Laurent Quérel
1 Leo Shklovskii
1 Lubo Slivka
1 Marco Edward Gorelli
1 Martin Hilton
1 Matthijs Brobbel
1 Michael Hancock
1 Michael Lui
1 NoahFournier
1 Rob Sharp
1 Sagnik Dutta
1 Shaheer Ahmad
1 Simon Perkins
1 Theodore Tsirpanis
1 Twice
1 Zaharid
1 abandy
1 cluster
1 david dali susanibar arce
1 flynn
1 gf2121
1 h-vetinari
1 lafiona
1 sunpeng
1 zagto
Patch Committers
The following Apache committers merged contributed patches to the repository.
$ git shortlog -sn --group=trailer:signed-off-by apache-arrow-11.0.0..apache-arrow-12.0.0
123 Sutou Kouhei
77 Weston Pace
71 Matt Topol
50 Joris Van den Bossche
43 Will Jones
35 David Li
22 Jacob Wujciak-Jens
19 Nic Crane
17 Antoine Pitrou
17 Raúl Cumplido
12 Dewey Dunnington
5 Alenka Frim
5 Eric Erhardt
5 Yibo Cai
4 Rok Mihevc
3 Li Jin
3 Neal Richardson
2 Dominik Moritz
2 Micah Kornfield
1 Matthew Topol
1 dependabot[bot]
Changelog
Apache Arrow 12.0.0 (2023-04-30 07:00:00)
Bug Fixes
- GH-14779 - [C++] Compiling failed on Mac M1
- GH-14917 - [C++] Error out when GTest is compiled with a C++ standard lower than 17 (#34765)
- GH-14923 - [C++][Parquet] Fix DELTA_BINARY_PACKED problem on reading the last block with malformed bit-width (#15241)
- GH-15054 - [C++] Change s3 finalization to happen after arrow threads finished, add pyarrow exit hook (#33858)
- GH-15098 - [C++] fix util::EqualityComparable to compile on clang 15 (#33940)
- GH-15102 - [C++] Could not decompress arrow stream sent from Java arrow SDK (#15194)
- GH-15109 - [Python] Allow creation of non empty struct array with zero field (#33764)
- GH-15137 - [C++][CI] Fix ASAN error in streaming JSON reader tests (#33772)
- GH-15139 - [C++] Improve bzip2 static library path detection for arrow.pc (#33712)
- GH-15173 - [C++][Parquet] Fixing ByteStreamSplit Standard broken (#34140)
- GH-15212 - [C++] fix sliced list array writing in ORC (#15213)
- GH-15247 - [R] Error when trying to save a data.frame with NULL column names (#34798)
- GH-15256 - [C++][Dataset] Add support for writing with Partitioning::Default() (#33674)
- GH-28074 - [C++][Dataset] Handle NaNs correctly in Parquet predicate push-down (#15125)
- GH-31880 - [Python] Table.filter with expression now preserves order with use_threads=True (#34766)
- GH-31905 - [DevTools] Add linting to Cython files (#14662)
- GH-32512 - [Docs][R] Update conda install command (#34298)
- GH-32954 - [Java][FlightRPC] Remove FlightTestUtil#getStartedServer and bind to port 0 directly (#34357)
- GH-33287 - [R] Cannot read_parquet on http URL (#34708)
- GH-33336 - [C++][Parquet] Avoid UB on unaligned load (#14488)
- GH-33466 - [Go][Parquet] Add support for Dictionary arrays to pqarrow (#34342)
- GH-33501 - [Packaging][Release] Add a post-release script to add a new version to conan (#34022)
- GH-33566 - [C++] Add support for nullary and n-ary aggregate functions (#15083)
- GH-33600 - [Go][Parquet] Panic in bitmap writer (#14989)
- GH-33616 - [C++] Reorder group_by so that keys/segment keys come before aggregates (#34551)
- GH-33689 - [Python][CI] Re-enable fsspec tests on dask nightly tests (#34925)
- GH-33697 - [CI][Python] Nightly test for PySpark 3.2.0 fail with AttributeError on numpy.bool (#33714)
- GH-33699 - [C++] Increase timeout of c++ tests when running under valgrind and shorten long tests (#33886)
- GH-33701 - [C++] Add support for LTO (link time optimization) build (#33847)
- GH-33709 - [R] Remove suffix argument from semi_join and anti_join (#34030)
- GH-33717 - [Go] Flight SQL Server handle StreamChunk errors (#33718)
- GH-33721 - [CI][R] Disable sccache on test-r-install-local macOS (#34713)
- GH-33726 - [CI][Go] Set host name in Go benchmarks (#33728)
- GH-33727 - [Python] array() errors if pandas categorical column has dictionary as string not object (#34289)
- GH-33754 - [CI] Install brewfile dependencies for verification task jobs on M1 (#33755)
- GH-33767 - [Go] Clear out parameter in ArrowArrayStream.get_next (#33768)
- GH-33777 - [R] Nightly builds failing due to dataset test not being skipped on builds without datasets module (#33778)
- GH-33779 - [R] Nightly builds (R 3.5 and 3.6) failing due to field refs test (#33780)
- GH-33782 - [Release] Vote email number of issues is querying JIRA and producing a wrong number (#33791)
- GH-33783 - [C#] Update release verification to use .NET 7.0 (#33799)
- GH-33786 - [C++] Ignore old system xsimd (#33811)
- GH-33796 - [C++] Fix wrong arrow-testing.pc config with system GoogleTest (#33812)
- GH-33801 - [Python] Expose C++ ExtensionTypes/ExtensionArrays in pyarrow (#33802)
- GH-33813 - [CI][GLib] Use Ruby 3.2 to update bundled MSYS2 (#33815)
- GH-33816 - [CI][Conan] Use TARGET_FILE for portability (#33817)
- GH-33820 - [CI][Release] Don’t libxsimd-dev on Ubuntu 20.04 (#33821)
- GH-33824 - [C++] Improve error message on discovery failure (#33848)
- GH-33830 - Clarify handling of Null values in REE encoding (#33831)
- GH-33849 - [C++] Fix builds with ARROW_BUILD_SHARED=OFF and ARROW_BUILD_EXAMPLES=ON (#34350)
- GH-33864 - [Go] Don’t directly coerce cgo.Handle to unsafe.Pointer (#33865)
- GH-33876 - [C++][Windows] Use different .pc path for each config (#33907)
- GH-33882 - [C++] Don’t find .pc files with ARROW_BUILD_STATIC=OFF (#34019)
- GH-33887 - [Go] cdata package leaks handles, difficult debugging (#33889)
- GH-33904 - [R] improve behavior of s3_bucket - work-around (#34009)
- GH-33911 - [C++] Add missing std::forward to Result::ValueOrElse (#33912)
- GH-33914 - [Release] Force brew install build-from-source to not install from API (#33915)
- GH-33920 - [C++][CI] Disable Flight SQL in sanitizer job (#34014)
- GH-33932 - [Go] Fix build RecordBuilder with non-nullable items map field (#33906)
- GH-33934 - [Packaging][Linux] Enable Flight for arm64 (#34717)
- GH-33953 - [Java] Pass custom headers on every request (#33967)
- GH-33954 - [C++][Parquet] Preserve field-id for nested type (#33955)
- GH-33963 - [C++] add missing arrow/engine headers (#33964)
- GH-33970 - [C#] Make schema field names case sensitive (#33978)
- GH-33971 - [C++] Fix AdaptiveIntBuilder to always populate data buffer (#33994)
- GH-33973 - [Python][Docs] Update documentation for Parquet filter keyword (#33974)
- GH-34023 - [Docs] Version warning about viewing old docs doesn’t work for versions >= 10 (#34178)
- GH-34029 - [Docs] Add Ninja to packages to install (#34040)
- GH-34035 - [C++] Internal header file included from public one breaks build of external projects (#34036)
- GH-34037 - [Python][Docs] Fix Table.drop docstring (#34038)
- GH-34044 - [Go] Fix build with noasm tag (#34045)
- GH-34047 - [C++][FlightRPC] Make DoAction warning less prominent (#34182)
- GH-34076 - [C#] Allow schema fields with duplicate names (#34125)
- GH-34080 - [Python] Add support for round_binary to python (#34084)
- GH-34082 - [Packaging][deb] Follow Debian bookworm image change (#34091)
- GH-34086 - [C++][Parquet] Fix writing num_rows to data page v2 (#34096)
- GH-34088 - [Python] : Fix typo in get_writer (#34089)
- GH-34092 - [R] open_csv_dataset() error if schema supplied and col_names left as TRUE (the default) (#34217)
- GH-34098 - [Python][Docs] Fix dataset docstring (#34099)
- GH-34101 - [Go][Parquet] NewSchemaManifest creates wrong schema field (#34127)
- GH-34104 - [Python] update deduplicate_objects default in docs to match implementation (#34128)
- GH-34106 - [C++][Parquet] Fix updating page stats for WriteArrowDictionary (#34107)
- GH-34138 - [C++][Parquet] Fix parsing stats from min_value/max_value (#34112)
- GH-34143 - [Python][Docs] Add fill_null back to API reference (#34144)
- GH-34148 - [C++] Revert zstd back to 1.5.2 (#34190)
- GH-34150 - [C++] Fix error due to improper initialization of conversion option defaults (#34209)
- GH-34150 - [C++][Python] Fix improper initialization of ConversionOptions (#34156)
- GH-34163 - [C++][CI] Ensure using the same Zstandard with bundled ORC (#34164)
- GH-34165 - [Python] Extension array data type should default to the storage type if to_pandas_dtype is not implemented (#34559)
- GH-34175 - [Docs] Remove Jira from .github/CONTRIBUTING.md (#34205)
- GH-34188 - [C++][Benchmark] Add missing BENCHMARK_STATIC_DEFINE for bundled gbenchmark (#34194)
- GH-34191 - [C++] Ensure using the same ProtoBuf in bundled ORC (#34192)
- GH-34206 - [C++] Don’t let jemalloc defines affect unity builds (#34185)
- GH-34210 - [C++] Make casting timestamp and duration zero-copy when TimeUnit matches (#34270)
- GH-34211 - [R] Make sure Arrow arrays are unmaterialized before attempting to access the underlying ChunkedArray (#34489)
- GH-34214 - [C++] Pass OPENSSL_ROOT_HINT to CMAKE_PREFIX_PATH for bundled AWS (#34215)
- GH-34228 - [R] Add LIB_DIR when Arrow is found via pkg-config (#34229)
- GH-34230 - [Java] Call allocation listener on BaseAllocator#wrapForeignAllocation (#34231)
- GH-34238 - [C++][Python] Segfault when calling groupby on table with misaligned chunks
- GH-34241 - [C++] Fix ExecSpanIterator to properly initialize empty dictionary arrays (#34246)
- GH-34244 - [Go][FlightRPC] SQLite example report Transactions support (#34245)
- GH-34256 - [Dev] Update release scripts with main as new default branch (#34413)
- GH-34269 - [C++] Fix include file name (#34285)
- GH-34271 - [C++] Remove Thrift GitHub archive source url (#34273)
- GH-34283 - [Python] Add types_mapper support to index for to_pandas (#34445)
- GH-34284 - [Java][FlightRPC] Fixed issue with prepared statement getting sent twice (#34358)
- GH-34296 - [C++][CI] Force appveyor builds to use conda-forge and ignore defaults channel (#34297)
- GH-34301 - [CI][Packaging][RPM][arm64] Use closer.lua to download KEYS (#34302)
- GH-34303 - [CI][Packaging][deb] Use system Meson on Debian GNU/Linux bookworm (#34304)
- GH-34306 - [CI][Packaging][RPM] Don’t install utf8proc-devel on CentOS Stream 8 (#34307)
- GH-34308 - [CI][C++] Use str(“”) to reset std::stringstream for old g++ (#34317)
- GH-34309 - [C++] Disable LTO for aws_lc and s2n-tls (#34349)
- GH-34324 - [CI][C++] Specify set element type explicitly for old g++ (#34325)
- GH-34326 - [C++][Parquet] Page null_count is incorrect if stats is disabled (#34327)
- GH-34366 - [R] Don’t getFromNamespace() the dplyr:::check_name() helper (#34369)
- GH-34367 - [Java] Fix build error from sequential merges (#34368)
- GH-34381 - [Dev] Retrieve committers from arrow-site committers.yml instead of relying on author_association (#34557)
- GH-34385 - [Go] Read IPC files with compression enabled but uncompressed buffers (#34476)
- GH-34395 - [Python] Add support for symbolic linked Arrow related include directories (#34674)
- GH-34404 - [Python] Failing tests because pandas.Index can now store all numeric dtypes (not only 64bit versions) (#34498)
- GH-34410 - [Python] Allow chunk sizes larger than the default to be used (#34435)
- GH-34432 - [Java] NoCompressionCodec throws for unsupported codec type (#34580)
- GH-34446 - [C++][Parquet] Fix RecordReaderPrimitveTypeTests test (#34447)
- GH-34464 - [R] Missing rlang import - inform (#34465)
- GH-34467 - [R] Disable DuckDB tests on R versions < 4.0.0 (#34468)
- GH-34472 - [Go][FlightRPC] Drain result of DoAction in Flight SQL client (#34473)
- GH-34474 - [C++] Detect and raise an error if a join will need too much key data (#35087)
- GH-34479 - [Java] java-jars failing due to conflicting slf4j bindings (#34480)
- GH-34492 - [Go] Fix missing boolean plain encoder state update (#34493)
- GH-34496 - [C++][Parquet] fix parquet unittest in MakePages when num_values = 0 (#34497)
- GH-34513 - [CI][Python] Remove unused imports from _acero.pyx to fix linting failures (#34514)
- GH-34519 - [C++][R] Fix dataset scans that project the same name as a field (#34576)
- GH-34539 - [C++] Fix throttled scheduler to avoid stack overflow in dataset writer (#35075)
- GH-34540 - [C++] Removed set but unused variable (#34541)
- GH-34546 - [C++] Support casting from large string to string scalar (#34549)
- GH-34568 - [C++][Python] Expose Run-End Encoded arrays in Python Arrow (#34570)
- GH-34579 - [Python][Docs] TableGroupBy.aggregate options (#34759)
- GH-34597 - [Packaging][RPM] Don’t use glog (#34598)
- GH-34603 - [Go][Parquet] Problem writing dictionary with empty strings (#34709)
- GH-34605 - [C++] Don’t use std::move when passing shared_ptr to named table … (#34606)
- GH-34619 - [C++] Add extension array handling to ArraySpan conversion (#34684)
- GH-34621 - [GLib] Don’t use “g_strdup(XXX->ToString().c_str())” (#34624)
- GH-34622 - [CI][GLib] Use “meson setup …” (#34623)
- GH-34629 - [Go] Fix transpose_ints to work on riscv64-freebsd (#34647)
- GH-34633 - [C++][Parquet] Fix StreamReader to read decimals (#34720)
- GH-34639 - [C++] Support RecordBatch::FromStructArray even if struct array has nulls/offsets (#34691)
- GH-34641 - [CI][Python] Mark test_scan on test_acero.py to require dataset (#34642)
- GH-34643 - [CI] Fix files used for testing uncompressible data (#34646)
- GH-34653 - [CI][C++] Fix for arrow-dataset-file-json-test segfault on alpine-linux-cpp (#35047)
- GH-34655 - [CI][C++] arrow-compute-internals-test fails with `No function registered with name: equal` on test-cuda-cpp
- GH-34661 - [CI][C#] Update Ubuntu C# jobs to use image with .NET 7.0 (#34662)
- GH-34667 - [C++][Parquet] Test DeltaLengthByteArrayDecoder with invalid inputs (#34668)
- GH-34670 - [Packaging][C++] Add support for customizing GDB plugin install directory (#34672)
- GH-34696 - [C++] Check REE arrays have no null buffer in Validate() (#34697)
- GH-34731 - [Python] Release GIL when creating RecordBatchReader (#34732)
- GH-34743 - [Python] Relax condition in flaky Flight test (#34747)
- GH-34753 - [C++] Nightly builds failing with EnsureAlignment (#34754)
- GH-34771 - [C++] Add support for compiling on FreeBSD/amd64 (#34772)
- GH-34786 - [C++] Fix output schema calculated by Substrait consumer for AggregateRel (#34904)
- GH-34801 - [C++] Remove needless “Requires.private: libcurl openssl” from arrow.pc (#34810)
- GH-34807 - [Go] Handle io.EOF when reading parquet footer size and magic bytes (#34808)
- GH-34823 - [C++][ORC] Fix ORC CHAR type mapping (#34836)
- GH-34831 - [C++] Check REE child buffers are valid before other checks (#34833)
- GH-34843 - [R] Fix R build failed caused by Acero refactor (#34844)
- GH-34862 - [C++] Fix ArrowDataset dependencies (#34866)
- GH-34869 - [C++] Configure alpine linux nightly job to build gtest from source (#34870)
- GH-34871 - [C++] Fixed the add_dataset_test function to properly refer to the test file (#34872)
- GH-34906 - [C++] Return invalid status instead of segfault if reading from a closed ArrayStreamBatchReader (#35016)
- GH-34933 - [Python] Raise minimum cython version (#34935)
- GH-34937 - [R] Minimal build failing due to new test which relies on snappy being installed (#34938)
- GH-34944 - [Python] Fix crash when converting non-sequence object with getitem in pa.array() (#34958)
- GH-34953 - [Ruby] Change null selection behavior in Table.slice to :drop (#34954)
- GH-34960 - [C++] test util Fixing arrow Random Generator for lost nullable info (#34961)
- GH-34973 - [CI][Packaging] Fix script path in wheel-clean (#34974)
- GH-34977 - [C++] Fix “Requires” format in arrow-dataset.pc (#34978)
- GH-34983 - [C++] Preserve map values nullability on C Data Interface import (#35013)
- GH-34988 - [C#] Fix Windows-specific test issue in CDataSchemaPythonTest (#34989)
- GH-34995 - [C++] Improve available GTest check for SYSTEM case (#34997)
- GH-35008 - [C++] Add printers for REETestData and PageIndexReaderParam to placate Valgrind (#35011)
- GH-35014 - [Python] Make sure unit tests can run without acero (#35017)
- GH-35018 - [CI][Java][C++] Use ARROW_ZSTD_USE_SHARED=OFF for LLVM (#35023)
- GH-35021 - [Python][CI] Use conda’s gdb in test-conda-python (#35024)
- GH-35029 - [CI][C#] Install python on ubuntu-csharp image to fix nuget CI build (#35030)
- GH-35038 - [R] argument order in arrow_table affects object return type (#35039)
- GH-35056 - [Python][CI] Don’t install gdb on Windows (#35057)
- GH-35060 - [C#][CI] Update dotnet download link regex (#35061)
- GH-35062 - [Go][CI] Fix verification failures (#35077)
- GH-35063 - [CI] Fix Python requirement in C# tests (#35091)
- GH-35066 - [CI][Packaging][Linux] Free more disk space (#35128)
- GH-35069 - [Archery][Release] Remove retrieving ARROW issue from migration comment on Archery release (#35070)
- GH-35073 - [R] Minimal build is failing (acero symbol not defined) (#35074)
- GH-35086 - [Java][CI] Upgrade CycloneDX Maven plugin version (#35092)
- GH-35089 - [CI][C++][Flight] Test failures in macos release verification nightlies (#35090)
- GH-35115 - [C++] Moved util_avx2.cc from acero to compute (#35117)
- GH-35133 - [Go] fix for math.MaxUint32 overflows int error in 32-bit arch (#35159)
- GH-35143 - [R][C++] Fixed shape tensor causes broken build on OSX (#35154)
- GH-35170 - [CI][Packaging][Conan] Build grpc-proto (#35203)
- GH-35181 - [R] Bump R package version number in versions.json (#35132)
- GH-35186 - [CI][C++] Improve GoogleTest detection on Windows + vcpkg (#35200)
- GH-35187 - [CI][C++] Use the latest arrow-testing (#35227)
- GH-35192 - [Docs] Switch from logo to logo_url to support sphinx >= 6 (#35194)
- GH-35205 - [C++][Gandiva] Don’t find system Zstandard when we use bundled one (#35220)
- GH-35206 - [C++] Look for Conda OpenSSL in Windows verification (#35225)
- GH-35235 - [CI][Python] Pandas upstream_devel and nightlies are failing (#35248)
- GH-35252 - [C++] Use FindGTestAlt.cmake by ArrowTesting (#35253)
New Features and Improvements
- GH-14863 - [C++] Add appender functions to array builders that can take optionals (#24372)
- GH-14866 - [C++] Remove internal GroupBy implementation (#14867)
- GH-14912 - [Java] Remove usage of PlatformDependent in arrow-vector, arrow-jdbc and arrow-algorithm (#14913)
- GH-14939 - [C++] Support Table lookups in FieldRef and FieldPath (#34537)
- GH-15059 - [C++][Acero] populate guarantee columns from expression instead of fragment (#15129)
- GH-15070 - [Python][CI] Update pandas test for empty columns dtype change in pandas 2.0.1 (#35031)
- GH-15070 - [Python][CI] Compatibility with pandas 2.0 (#34878)
- GH-15107 - [C++][Parquet] Parquet Encoder: Support RLE for Boolean (#34526)
- GH-15164 - [C++][Parquet] Implement current version of BloomFilter spec (#33776)
- GH-15171 - [C++] Pass std::string_view by value (#33684)
- GH-15193 - [C++][Parquet] Parquet FuzzReader add some fixed batch size (#33942)
- GH-15195 - [C++][FlightRPC][Python] Add ToString/Equals for Flight types (#15196)
- GH-15203 - [Java] Implement writing compressed files (#15223)
- GH-15209 - [C++][Gandiva] Add abs function (#15208)
- GH-15231 - [C++][Benchmarking] Add new memory pool metrics and track in benchmarks (#33731)
- GH-15280 - [C++][Python][GLib] add libarrow_acero containing everything previously in compute/exec (#34711)
- GH-15280 - [C++] Refactor to reorganize dependencies as a prequel to moving acero out of libarrow (#34518)
- GH-15284 - [C++] Use DeclarationToExecBatches in Acero plan tests (#15288)
- GH-15285 - [GLib] Add GArrowMatchSubstringOptions (#34725)
- GH-15286 - [GLib] Add GArrowIndexOptions (#34679)
- GH-15287 - [Ruby] Merge column and add suffix in Table#join (#33654)
- GH-15483 - [C++] Add a Fixed Shape Tensor canonical ExtensionType (#8510)
- GH-18481 - [C++] prefer casting literal over casting field ref (#15180)
- GH-18487 - [R] Read Text (CSV/JSON) from character vector (#33968)
- GH-18818 - [R] Create a field ref to a field in a struct (#19706)
- GH-20117 - [Dev] Ask INFRA to switch default branch to main
- GH-20272 - [C++] Bump version of bundled AWS SDK (#33808)
- GH-20351 - [C++] Kernel input type matcher for run-end encoded types (#34503)
- GH-20407 - [Go] Array Builder for REE arrays (#14114)
- GH-20408 - [Go] Implement Encode and Decode functions for REE (#34534)
- GH-20415 - [Go] Kernel Input Type for RLE (#14146)
- GH-20484 - [Swift] Initial Arrow implementation (#14561)
- GH-21429 - [GLib] Add GArrowDenseUnionArrayBuilder (#34981)
- GH-21430 - [GLib] GArrowSparseUnionArrayBuilder (#34992)
- GH-25163 - [C#] Support half-float arrays. (#34618)
- GH-25986 - [C++] Enable external material and rotation for encryption keys (#34181)
- GH-29705 - [Python] Remove deprecated pyarrow.serialization functionality (#34926)
- GH-30774 - [Python] Remove deprecated use_async (#34034)
- GH-31148 - [Dev] Update URLs in the repo to point to main (#34218)
- GH-31506 - [Python] Address docstrings in Streams and File Access (Factory Functions) (#33609)
- GH-31507 - [Python] Address docstrings in Streams and File Access (Stream Classes) (#33698)
- GH-31548 - [Python] Test that zoneinfo timezones are accepted during type inference (#34394)
- GH-31715 - [Python] Improving Classes and Methods Docstrings - Streams and File access
- GH-31809 - [Docs] Add instructions on how to collect the produced telemetry data (#33873)
- GH-31868 - [C++] Support concatenating extension arrays (#14463)
- GH-31910 - [C++] Add support for Substrait cast expression (#34050)
- GH-32050 - [C++] Implement Rank kernel on chunked arrays (#33846)
- GH-32104 - [C++] Add support for Run-End encoded data to Arrow (#33641)
- GH-32105 - [C++] Encode and decode Run-End Encoded vectors (#34195)
- GH-32240 - [C#] Add new Apache.Arrow.Compression package to implement IPC decompression (#33893)
- GH-32240 - [C#] Support decompression when reading an IPC stream from ReadOnlyMemory (#34108)
- GH-32240 - [C#] Support decompression of IPC format buffers (#33603)
- GH-32292 - [R][Packaging] Use binaries built on CentOS 7 for Ubuntu < 22.04 (#34048)
- GH-32338 - [C++] Add IPC support for Run-End Encoded Arrays (#34550)
- GH-32613 - [C++] Simplify IPC writer for dense unions (#33822)
- GH-32619 - [Python][Docs] Include options for PyArrow build explicitly (#34463)
- GH-32653 - [C++] Cleanup error handling in execution engine (#15253)
- GH-32747 - [C++] Substrait To Arrow Emit feature testing (#14174)
- GH-32801 - [C++][Docs] Delete outdated .md files (#33829)
- GH-32804 - [Dev] Remove “master” from default_branch property of Target class in core.py after migration to “main” as the default Git branch
- GH-32916 - [C++][Python] User-defined tabular functions (#14682)
- GH-32946 - [Go] Implement REE Array and Compare (#14111)
- GH-32947 - [Go] Implement Concatenate for REE Array (#14126)
- GH-32949 - [Go] REE Array IPC read/write (#14223)
- GH-33024 - [C++][Parquet] Add DELTA_LENGTH_BYTE_ARRAY encoder to Parquet writer (#14293)
- GH-33115 - [C++] Parquet Implement crc in reading and writing Page for DATA_PAGE (v1) (#14351)
- GH-33143 - [C++] Naming and doc/test changes for local_time compute kernel (#34263)
- GH-33143 - [C++] Kernel to convert timestamp with timezone to wall time (#34208)
- GH-33209 - [C++] Support for reading JSON Datasets (#33732)
- GH-33215 - [Dev] Replace hard-coded string “master” with “main” in dev/archery/archery/crossbow/core.py after default branch migration
- GH-33243 - [Plasma] Remove (#34718)
- GH-33317 - [C++] Utility method to ensure an array object meets an alignment requirement (#14758)
- GH-33377 - [Python] Table.drop should support passing a single column (#33810)
- GH-33439 - [CI] Substrait Integration Testing (#14596)
- GH-33580 - [C++] Support emit info in Substrait extension-multi and AsOfJoin (#14799)
- GH-33588 - [Substrait] Add Substrait→Acero mapping for round operation (#33775)
- GH-33596 - [C++][Parquet] Parquet page index read support (#14964)
- GH-33621 - [Documentation][Developer Tools] Add CODEOWNERS file (#33622)
- GH-33631 - [R] Rewrite Jira ticket numbers in pkgdown documents to GitHub issue numbers (#34260)
- GH-33640 - [C++] Add backpressure to asof join node (#33648)
- GH-33652 - [C++][Parquet] Add interface total_compressed_bytes_written (#33897)
- GH-33655 - [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite (#33739)
- GH-33655 - [C++][Parquet] Write parquet columns in parallel (#33656)
- GH-33659 - [Developer Tools] Add definition of Breaking Change and Critical Fix (#33660)
- GH-33673 - [C++] Standardize as-of-join convention for past and future tolerance (#33676)
- GH-33679 - [JS] Update dependencies (#33680)
- GH-33681 - [JS] Update flatbuffers (#33682)
- GH-33723 - [C++] re2::RE2::RE2() result must be checked (#33806)
- GH-33724 - [Doc] Update the substrait conformance doc with the latest support (#33725)
- GH-33734 - [Go] make compatible with grpc < 1.45 (#33735)
- GH-33737 - [C++] simplify exec plan tracing (#33738)
- GH-33741 - [Python] Address docstrings in Data Types Factory Functions (#33785)
- GH-33742 - [Python] Address docstrings in Data Types classes (#34380)
- GH-33746 - [R] Update NEWS.md for 11.0.0 (#33748)
- GH-33750 - [GLib] Add garrow_table_batch_reader_set_max_chunk_size() (#34601)
- GH-33760 - [R][C++] Handle nested field refs in scanner (#33770)
- GH-33787 - [C++] Suppress unused-value warning from LinuxParseCpuFlags() on s390x (#33828)
- GH-33789 - [Go] Add Err() to RecordReader (#33792)
- GH-33794 - [Go] Add SetRecordReader to PreparedStatement (#33795)
- GH-33800 - [Packaging] Drop support for Ubuntu 18.04 (#34020)
- GH-33825 - [Python] Expose pyarrow.dataset.get_partition_keys publicly (get key/value from partition expression) (#33862)
- GH-33835 - [Doc][Release] Improvements to release guide instructions (#33836)
- GH-33840 - [Go] Improve SQLite Flight SQL Example and provide mainprog (#33841)
- GH-33850 - [C++] Allow Substrait’s default extension provider to be configured (fix) (#34075)
- GH-33850 - [C++] Allow Substrait’s default extension provider to be configured (#34042)
- GH-33851 - [C++] Update bundled boost version (#33890)
- GH-33852 - [Go] Return a catalog/schema from Flight SQL example server (#33853)
- GH-33859 - [C++][Java] Bump Apache ORC to v1.8.2 (#33860)
- GH-33867 - [Go][FlightSQL] Allow passing grpc call options to PreparedStatement methods (#33868)
- GH-33872 - [C++] Remove hacky shared_ptr construction in AppendScalar (#33866)
- GH-33874 - [Java] Ensure custom headers are included during JDBC auth handshake (#33946)
- GH-33875 - [Go] Handle writing LargeString and LargeBinary types (#33965)
- GH-33892 - [R] Map dplyr::n() to count_all kernel (#33917)
- GH-33895 - [Release] Add a script to add new owner of our RubyGems (#33896)
- GH-33899 - [C++] Add NamedTapRel relation as a Substrait extension (#33909)
- GH-33901 - [Go] Add a malloc-based allocator (#33902)
- GH-33923 - [Docs] Tensor canonical extension type specification (#33925)
- GH-33924 - [Format] Fixed shape Tensor as a canonical extension type
- GH-33926 - [Python] DataFrame Interchange Protocol for pyarrow.RecordBatch (#34294)
- GH-33935 - [Go][FlightRPC] Implement Flight SQL extensions (#34039)
- GH-33936 - [Go] C Data Interface: export dummy buffer for nil buffers (#33951)
- GH-33957 - [C++] Add Rank chunked array benchmarks (#34602)
- GH-33972 - [C++] Pass in metadata to ParquetReader (#34015)
- GH-33977 - [Dev] PR Workflow automation bot (#34161)
- GH-33990 - [C++] I know NAN != NAN but shouldn’t literal(NAN) == literal(NAN)?
- GH-33993 - [Java] Let OS assign port in tests while creating Flight server (#33992)
- GH-33998 - [R] Update vignettes to reference the new open_*_dataset functions (#34710)
- GH-34003 - [C++][nodiscard] (#34006)
- GH-34004 - [C++] Add a benchmarks-maximal CMake preset (#34005)
- GH-34007 - [C++] Add an array_span_mutable interface to ExecResult (#34008)
- GH-34011 - [Doc] Ensure substrait is enabled on complete doc build (#34024)
- GH-34011 - [Python][Doc] Add pyarrow.substrait to pyarrow’s API reference docs (#34012)
- GH-34051 - [C++] GcsFileSystem lazily starts sequential reads (#34052)
- GH-34053 - [C++][Parquet] Write parquet page index (#34054)
- GH-34055 - [Go][CI] Add test run in CI that uses noasm tag (#34167)
- GH-34056 - [C++] Add Utility function to simplify converting any row-based structure into an arrow::RecordBatchReader or an arrow::Table (#34057)
- GH-34059 - [C++] Add a fetch node based on a batch index (#34060)
- GH-34063 - [C++] Avoid waste in GcsFileSystem::ReadAt() (#34065)
- GH-34074 - [GLib][FlightRPC] Add support for authentication (#34090)
- GH-34077 - [Go] Implement RunEndEncoded Scalar (#34079)
- GH-34078 - [C++][Parquet] Minor API improvements for BloomFilter (#33995)
- GH-34094 - [C++] Increase Boost minimum version for clang >= 16 (#34100)
- GH-34113 - [C++][Thirdparty] Bump zstd to v1.5.4 (#34114)
- GH-34118 - [C++][Python] Make # of S3 event loop threads configurable (#34134)
- GH-34119 - [C#] operator to Schema (#34126)
- GH-34122 - [C++] Allow calling function registry functions without requiring a Substrait mapping (#34288)
- GH-34136 - [C++] Add a concept of ordering to ExecPlan (#34137)
- GH-34142 - [C++][Parquet] Fix record not to span multiple pages (#34193)
- GH-34147 - [C++][Parquet] Support crc count and checking on DICTIONARY_PAGE (#34254)
- GH-34154 - [Python] Add is_nan method to Array and Expression (#34184)
- GH-34157 - [C++] Configure bundled AWS SDK to use aws-lc instead of OpenSSL (#34159)
- GH-34171 - [Go][Compute] Implement “Unique” kernel (#34172)
- GH-34174 - [Docs][Release] Add Twitter to post-release tasks (#34202)
- GH-34186 - [Go] Add arrow.MapOfWithMetadata to support (#34207)
- GH-34197 - [R][CI] Add previous R package versions to backwards compatibility CI jobs (#34198)
- GH-34199 - [R] Increment R package version in NEWS.md (#34200)
- GH-34219 - [Go][FlightRPC] Add Transactions to Sqlite FlightSQL example (#34220)
- GH-34242 - [C++][Parquet] Optimize comment and move for shared_ptr in parquet schema (#34243)
- GH-34248 - [Python] Expose the order_by node (#34654)
- GH-34248 - [C++] Add an order_by node (#34249)
- GH-34257 - [Docs] Update git links/branches from master to main for external projects (#34502)
- GH-34262 - [C++][ORC] Support union type (#34416)
- GH-34266 - [C++] Add a pivot_longer node (#34267)
- GH-34278 - [C++] Expose schema in named table provider (#34279)
- GH-34280 - [C++][Python] Clarify meaning of row_group_size and change default to 1Mi (#34281)
- GH-34322 - [C++][Parquet] Encoding Microbench for ByteArray (#34323)
- GH-34330 - [Go][Parquet] : Add Extension type support (#34631)
- GH-34332 - [Go][FlightRPC] Add driver for database/sql framework (#34331)
- GH-34334 - [Go][CSV] Support list fields (#34343)
- GH-34335 - [C++][Parquet] Optimize Decoding DELTA_LENGTH_BYTE_ARRAY (#34955)
- GH-34339 - [R] Add skip_rows_after_names option to read_csv_arrow’s options (#34340)
- GH-34359 - [Python] Add select method to pyarrow.RecordBatch (#34360)
- GH-34361 - [C++] Fix the handling of logical nulls for types without bitmaps like Unions and Run-End Encoded (#34408)
- GH-34382 - [C++] Support more types in run_end_encode and run_end_decode functions (#34761)
- GH-34388 - [C++] Build core compute kernels unconditionally (#34295)
- GH-34398 - [R] Update NEWS.md for 11.0.0.3 (#34399)
- GH-34405 - [C++] Add support for custom names in QueryOptions. Wire this up to Substrait (#34406)
- GH-34411 - [Python] Change array constructor to accept pyarrow array (#34275)
- GH-34417 - [C++][Flight] Upgrade OpenTelemetry SemanticConventions header (#34419)
- GH-34421 - [R] Let GcsFileSystem take a path for json_credentials (#34524)
- GH-34422 - [R] Expose GcsFileSystem$options (#34477)
- GH-34425 - [GLib] Add GArrowRankOptions (#34458)
- GH-34428 - [Python][Docs] Add docstring for make_fragment (#34429)
- GH-34437 - [R] Use FetchNode and OrderByNode (#34685)
- GH-34440 - [Ruby] Add support for RecordBatch{File,Stream}Reader#each without block (#34441)
- GH-34442 - [Ruby][FlightRPC] Add ArrowFlight::RecordBatchReader#each (#34444)
- GH-34453 - [Go] Support Builders for user defined extensions (#34454)
- GH-34481 - [CI] Migrate ARM jobs from Travis to self-hosted runners (#34482)
- GH-34499 - [R] Bump version in NEWS.md following release (#34500)
- GH-34536 - [Parquet][C++] Overwrite default config for DeltaBitPackEncoder (#34632)
- GH-34543 - [CI] Self-hosted ARM workflows improvements (#34512)
- GH-34547 - [C++][ORC] Remove deprecated ORC_UNIQUE_PTR (#34548)
- GH-34552 - [C++][Parquet] Sync parquet.thrift from upstream (#34553)
- GH-34561 - [C++] Implement RunEndEncodedBuilder::AppendEmptyValues() (#34562)
- GH-34564 - [Python][C++] Update code to compile with cython 3 (#34726)
- GH-34565 - [C++] Teach dataset_writer to accept custom filename functor (#34984)
- GH-34572 - [Go][CSV] Add binary support for CSV (#34558)
- GH-34581 - [C++][Java] Bump Apache ORC to v1.8.3 (#34582)
- GH-34584 - [Go][CSV] Add extension types support (#34585)
- GH-34590 - [C++][ORC] Fix timestamp type mapping between orc and arrow (#34591)
- GH-34595 - [C++] Update google-cloud-cpp to v2.8.0 (#34707)
- GH-34615 - [CI][C++] Add CI job for basic format support without ARROW_COMPUTE (#34617)
- GH-34626 - [C++] Add ordered/segmented aggregation Substrait extension (#34627)
- GH-34630 - [C++] Second block of refactoring to move acero out of libarrow (#34575)
- GH-34638 - [C++][Docs] Add documentation for minimal build flags (#34693)
- GH-34644 - [C++] Prefer unsafe casting by default in Substrait (#34645)
- GH-34650 - [GLib] Add GArrowFilterNodeOptions (#34663)
- GH-34659 - [C++] Review the validation processes around Run-End Encoded arrays to improve the Python integration (#34628)
- GH-34665 - [Parquet][C++] Allow Reading BloomFilter (#34728)
- GH-34669 - [Packaging][Conda] Update arrow feedstock dependencies (#34652)
- GH-34673 - [C++][Parquet] Add Boolean Encoding benchmark for parquet (#34676)
- GH-34686 - [Python] Add RunEndEncodedScalar class (#34924)
- GH-34687 - [CI][Python] Create job to remove old nightly wheels from gemfury (#34705)
- GH-34692 - [Java] Expose Location.toSocketAddress (#34648)
- GH-34700 - [Packaging][RPM] Use lz4-libs instead of lz4 on AlmaLinux 8+ (#34716)
- GH-34703 - [Python] Set copy=False explicitly when creating a pandas Series (#34593)
- GH-34737 - [C#] C Data interface for schemas and types (#34133)
- GH-34742 - [Java] Split flight-sql-jdbc-driver to facilitate reuse (#34678)
- GH-34768 - [C++][Gandiva] Remove LLVM<16 pin (#34922)
- GH-34768 - [C++][Gandiva] Accept LLVM 16 (#34916)
- GH-34778 - [Java] Only apply ServerInterceptorAdapter logic to Flight service requests (#34815)
- GH-34790 - [Go] Add array.Edits.UnifiedDiff (#34827)
- GH-34790 - [Go] Add array.Diff() (#34806)
- GH-34796 - [C++] Add FromTensor, ToTensor and strides methods to FixedShapeTensorArray (#34797)
- GH-34802 - [C++][Parquet] Allow passing pool to decoder (#34803)
- GH-34805 - [CI][Python] Cython test is failing in conda packaging builds
- GH-34812 - [Packaging][Python] Use self-hosted arm64 Linux runner instead of Travis CI for Linux arm64 wheels (#34835)
- GH-34813 - [C++] Improve GoogleTest detection (#34920)
- GH-34819 - [Ruby] Add Slicer::ColumnCondition#match_substring (#34902)
- GH-34821 - [DOC][ORC] Update documentation for ORC (#34822)
- GH-34832 - [Go] Add Record SetColumn method (#34794)
- GH-34837 - [GLib][Ruby] Add Arrow::{Sparse,Dense}UnionArray#get_value (#34838)
- GH-34839 - [Go] Build compute without noasm for non-amd64 GOARCH (#34840)
- GH-34853 - [Go] Add TotalRecordSize, TotalArraySize (#34854)
- GH-34855 - [Go] Add GetValue function to Metadata (#34856)
- GH-34863 - [Go] Pow method for Decimal DataTypes (#34864)
- GH-34879 - [Python][CI] Nightly integration tests with latest dask are failing (test_null_partition_pyarrow)
- GH-34880 - [Python][CI] Fix Windows tests failing with latest pandas 2.0 (#34881)
- GH-34882 - [Python] Binding for FixedShapeTensorType (#34883)
- GH-34888 - [C++][Parquet] Writer supports adding extra kv meta (#34889)
- GH-34893 - [C++] Fix run-end encoded array iterator issues that manifest on backwards iteration (#34896)
- GH-34899 - [C++] Dependency: bump zstd to v1.5.5 (#34900)
- GH-34914 - [Packaging][Linux] Add support for Acero (#34915)
- GH-34945 - [C++][Docs] Add missing cmake_minimum_required() to example (#34969)
- GH-34946 - [Ruby] Remove DictionaryArrayBuilder related omissions (#34947)
- GH-34951 - [Ruby] Add methods using MatchSubStringFamilyCondition (#34952)
- GH-34956 - [Docs][Python] Add to docs the usage of the FixedShapeTensorType (#34957)
- GH-34962 - [Go] Make GetOneForMarshal public on Array interface (#34964)
- GH-34968 - [C++] Add Equal Options to RecordBatch (#34970)
- GH-35025 - [Python] Remove use of deprecated pandas.Categorical fastpath keyword (#35026)
- GH-35042 - [Go][FlightSQL driver] Add TLS configuration (#35051)
- GH-35078 - [Python][CI] Tests on windows are running very slow
- GH-35218 - [R] Update NEWS for the R component/version 12.0.0 (#35219)
- PARQUET-2201 - [parquet-cpp] Add stress test for RecordReader ReadRecords and SkipRecords. (#14879)
- PARQUET-2225 - [C++][Parquet] Allow reading dense with RecordReader (#17877)
- PARQUET-2232 - [C++] Add an api to ColumnChunkMetaData to indicate if the column chunk uses a bloom filter (#33736)
- PARQUET-2250 - [C++][Parquet] Expose column descriptor through RecordReader (#34318)