Apache Arrow 10.0.0 (26 October 2022)
This is a major release covering more than 2 months of development.
Download
- Source Artifacts
- Binary Artifacts
- Git tag
Contributors
This release includes 536 commits from 100 distinct contributors.
$ git shortlog -sn apache-arrow-9.0.0..apache-arrow-10.0.0
68 Sutou Kouhei
52 Matt Topol
32 David Li
31 Antoine Pitrou
19 Alenka Frim
19 Jacob Wujciak-Jens
19 Weston Pace
18 Miles Granger
18 Nic Crane
17 Jin Shang
17 Raúl Cumplido
14 Neal Richardson
14 eitsupi
12 Will Jones
12 david dali susanibar arce
11 Dewey Dunnington
10 Vibhatha Lakmal Abeykoon
7 Igor Suhorukov
7 Larry White
7 Rok Mihevc
6 rtpsw
5 Kshiteej K
5 octalene
4 Krisztián Szűcs
4 Yibo Cai
3 Ben Harkins
3 Bryce Mecum
3 Dominik Moritz
3 George Godik
3 Joris Van den Bossche
3 LouisClt
3 Percy Camilo Triveño Aucahuasi
3 Philipp Moritz
3 Todd Farmer
3 Wes McKinney
2 0x26res
2 Anja Kefala
2 Dragoș Moldovan-Grünfeld
2 François Michonneau
2 Gang Wu
2 Hongze Zhang
2 Joost Hoozemans
2 Kae S
2 Nishanth Thimmegowda
2 Pavel Solodovnikov
2 SHIMA Tatsuya
2 Sam Albers
2 Ziheng Wang
1 Aleksei Smirnov
1 Andrea Giudiceandrea
1 Ankit Gehlot
1 Artavazd Balaian
1 Benson Muite
1 Christopher Dunderdale
1 Corey Kosak
1 Dhruv Vats
1 Duncan MacQuarrie
1 Egill Fridgeirsson
1 Eng Zer Jun
1 Felix Yan
1 Gajo Petrovic
1 Gil Forsyth
1 Ivan Chau
1 Jacky Lee
1 James Bourbeau
1 James Duong
1 Jayjeet Chakraborty
1 Jeroen van Straten
1 Jie Zhang
1 Jin Chengcheng
1 Kai Fricke
1 Kevin Gurney
1 Kun Liu
1 Leo Gertsenshteyn
1 Liang-Chi Hsieh
1 Michael Chirico
1 Michał Pogoda
1 Mitch
1 Muthunagappan Muthuraman
1 Otegami
1 Quang Hoang
1 Quanlong Huang
1 Raphael Taylor-Davies
1 Rasmus Johansen
1 Sanjiban Sengupta
1 Theodore Tsirpanis
1 Wilhelm Ågren
1 William Hyun
1 Xianyang Liu
1 ZMZ91
1 andreoss
1 dependabot[bot]
1 emkornfield
1 fatemehp
1 lafiona
1 mgiessing
1 michalursa
1 mopcup
1 patrick
1 serge-sans-paille
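The summary counts above can be checked against the shortlog output by summing the per-author commit counts and counting the author lines. The following is a minimal sketch, not part of the Arrow release tooling; `tally_shortlog` and `sample` are hypothetical names introduced for illustration, and the sample contains only the first three entries of the list.

```python
import re


def tally_shortlog(text):
    """Return (total_commits, author_count) for `git shortlog -sn` output."""
    total = 0
    authors = 0
    for line in text.splitlines():
        # Each author line is "<commit count> <author name>".
        match = re.match(r"\s*(\d+)\s+(\S.*)$", line)
        if match is None:
            continue  # skip blank lines and the "$ git shortlog ..." command line
        total += int(match.group(1))
        authors += 1
    return total, authors


# Hypothetical sample: the first three entries from the list above.
sample = """\
$ git shortlog -sn apache-arrow-9.0.0..apache-arrow-10.0.0
68 Sutou Kouhei
52 Matt Topol
32 David Li
"""

print(tally_shortlog(sample))  # prints (152, 3)
```

Feeding the complete list to the same function should reproduce the commit and contributor totals stated in the summary line.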
Patch Committers
The following Apache committers merged contributed patches to the repository.
$ git shortlog -sn --group=trailer:signed-off-by apache-arrow-9.0.0..apache-arrow-10.0.0
126 Sutou Kouhei
97 Antoine Pitrou
68 David Li
59 Matt Topol
41 Neal Richardson
26 Joris Van den Bossche
25 Weston Pace
21 Nic Crane
13 Dewey Dunnington
11 Yibo Cai
10 Alenka Frim
9 Krisztián Szűcs
4 Jonathan Keane
4 Rok
2 Eric Erhardt
2 Philipp Moritz
2 Wes McKinney
1 Alessandro Molina
1 Andrew Lamb
1 Benjamin Kietzman
1 Dominik Moritz
1 Ian Cook
1 Rok Mihevc
Changelog
Apache Arrow 10.0.0 (2022-10-26)
New Features and Improvements
- ARROW-3678 - [Go] Implement Union Arrays (#13768)
- ARROW-6772 - [C++] Add operator== for interfaces with an Equals() method (#14038)
- ARROW-6858 - [C++] Simplify transitive build option dependencies (#14224)
- ARROW-7744 - [Java][FlightRPC] JDBC Driver for Arrow Flight SQL (#13800)
- ARROW-8201 - [Python] Add FileFragment.open() method (#14301)
- ARROW-8226 - [Go] Add 64-bit offset Binary Builder and String Builder (#13719)
- ARROW-10600 - [Go] Implement Decimal256 (#13792)
- ARROW-11699 - [R] Implement dplyr::across() for mutate()
- ARROW-11841 - [R][C++] Allow cancelling long-running commands (#13635)
- ARROW-12105 - [R] Replace vars_select, vars_rename with eval_select, eval_rename (#14371)
- ARROW-12590 - [C++][R] Update copies of Homebrew files to reflect recent updates (#13769)
- ARROW-12693 - [R] add unique() methods for ArrowTabular, datasets (#13641)
- ARROW-12778 - [R] Support tidyselect where() selection helper in dplyr verbs
- ARROW-12958 - [CI][Developer] Build + host the docs for PR branches (#13913)
- ARROW-13055 - [Doc] Create canonical extension types document (#14167)
- ARROW-13454 - [C++][Docs] Tables vs Record Batches (#14008)
- ARROW-13766 - [R] Add slice_*() methods (#14361)
- ARROW-14280 - [Doc] R package Architectural Overview (#14294)
- ARROW-14495 - [Python] Fix DictionaryArray.from_buffers, should not crash (#13989)
- ARROW-14500 - [C++] Support casting from storage type to extension type
- ARROW-14958 - [C++][Python][FlightRPC] Implement Flight middleware for OpenTelemetry propagation (#11920)
- ARROW-15011 - [R] Generate documentation for dplyr function bindings (#14014)
- ARROW-15260 - [R] open_dataset - add file_name as column (#12826)
- ARROW-15277 - [C++][Python] Use ChunkedArray::Make for chunked_array (#13950)
- ARROW-15479 - [C++] Cast fixed size list to compatible fixed size list type (other values type, other field name) (#14181)
- ARROW-15481 - [R][CI] Add a crossbow job that mimics CRAN’s old macOS (#13925)
- ARROW-15540 - [C++] Allow the substrait consumer to accept plans with hints and nullable literals (#14402)
- ARROW-15545 - [Python][C++] Support casting to extension type (#14106)
- ARROW-15582 - [C++] Add support for registering standard Substrait functions (#13613)
- ARROW-15584 - [C++] Add support for Substrait’s RelCommon::Emit (#13914)
- ARROW-15678 - [C++] Add support for -DCMAKE_BUILD_TYPE=MinSizeRel (#14342)
- ARROW-15693 - [Dev] Update crossbow templates to use master or main (#13975)
- ARROW-15745 - [Java] Deprecate redundant iterable of ScanTask (#14168)
- ARROW-15838 - [R] Coalesce join keys in full outer join (#14286)
- ARROW-15839 - [C++][Python] Accept validity bitmap in ListArray.from_arrays (#13894)
- ARROW-15927 - [C++][Skyhook] Add skyhook example (#12620)
- ARROW-16000 - [C++][Python] Dataset: Alternative implementation for adding transcoding function option to CSV scanner (#13820)
- ARROW-16190 - [CI][R] Implement CI on Apple M1 for R (#14099)
- ARROW-16226 - [C++] Add better coverage for filesystem tell. (#14064)
- ARROW-16340 - [C++][Python] Move all Python related code into PyArrow (#13311)
- ARROW-16356 - [Python] Expose RandomAccessFile::GetStream (#13793)
- ARROW-16384 - [Docs] Add Flight SQL to status page (#14053)
- ARROW-16424 - [C++] Use Uri to parse substrait ReadRel file path (#14071)
- ARROW-16431 - [C++][Python] Improve AppendRowGroups error when schemas differ (#14029)
- ARROW-16584 - [Java] Java JNI with S3 support (#13157)
- ARROW-16605 - [CI][R] Fix revdep docker job (#13483)
- ARROW-16690 - [R][FlightRPC] Additional max_chunksize parameter in do_put method (#13267)
- ARROW-16695 - [R][Python][C++] Extension types are not supported in joins (#13501)
- ARROW-16719 - [Python] Add path/URI + filesystem handling to parquet.read_metadata (#13629)
- ARROW-16740 - [C++] Remove IR Consumer (#13301)
- ARROW-16855 - [C++] Adding Read Relation ToProto (#13401)
- ARROW-16870 - [C++] Fix link issues with ldd and clang for flight examples (#14077)
- ARROW-16879 - [R][CI] Test R GCS bindings with testbench (#13542)
- ARROW-16894 - [C++] Add Benchmarks for Asof Join Node (#13426)
- ARROW-16949 - [Doc] Add Glossary to the New Contributor’s Guide (#13951)
- ARROW-16981 - [C++] Expose jemalloc statistics for logging (#13516)
- ARROW-16988 - [C++] Introduce Substrait ToProto/FromProto conversion options (#13537)
- ARROW-17004 - [Java] Add utility to bind Arrow data to JDBC parameters (#13589)
- ARROW-17016 - [C++][Python] Move Arrow Python C++ tests into Cython (#14117)
- ARROW-17017 - [C++][Python] Enable automate re-build of Arrow Python
- ARROW-17021 - [C++][R][CI] Enable use of sccache in crossbow (#13556)
- ARROW-17052 - [C++][Python][FlightRPC] expose flight structures serialize (#13986)
- ARROW-17079 - Show HTTP status code for unknown S3 errors (#14019)
- ARROW-17079 - [C++] Raise proper error message instead of error code for S3 errors (#14001)
- ARROW-17079 - [C++] Improve error messages for AWS S3 calls (#13979)
- ARROW-17081 - [Java][Datasets] Move JNI build configuration from cpp/ to java/ (#13911)
- ARROW-17088 - [R] Use .arrow as extension of IPC files of datasets (#13690)
- ARROW-17089 - [Python] Use .arrow as extension for IPC file dataset (#13677)
- ARROW-17092 - [Docs] Add note about “Feather” to the IPC file format document (#13693)
- ARROW-17106 - [Python] Move init code to core and expose only API (#13802)
- ARROW-17113 - [Java] Fail loudly in static initializer blocks (#13678)
- ARROW-17122 - [Python] Cleanup after moving Python related code into pyarrow
- ARROW-17131 - [Python] add StructType().field(): returns a field by name or index (#13652)
- ARROW-17154 - [C++] Change cmake project name from arrow_python to pyarrow_cpp
- ARROW-17160 - [C++] Create a base directory for PyArrow CPP header files (#14275)
- ARROW-17172 - [C++][Python] test_cython_api fails on windows (#14133)
- ARROW-17175 - [CI][macOS] macos-10.15 is deprecated and macos-latest is macos-11 (#13684)
- ARROW-17178 - [R] Support head() in arrow_dplyr_query with user-defined function (#13706)
- ARROW-17181 - [Docs][Python] Scalar UDF Experimental Documentation (#13687)
- ARROW-17205 - [Dev][Release] Merge script should prompt for next version when maintenance branch is created (#13708)
- ARROW-17214 - [C++] Add scalar casts to string types for list based types (#13737)
- ARROW-17219 - [Go][IPC] Endianness Conversion for Non-Native Endianness (#13716)
- ARROW-17222 - [Docs][Archery][Integration] Document the current Integration test cases covered by archery (#13717)
- ARROW-17240 - [CI][Release] Verify wheels in nightly CI (#14319)
- ARROW-17243 - [Website] Add ClickHouse to “powered by”
- ARROW-17247 - [C++][Docs] Include visibility to ExecPlan APIs in Acero Docs (#13741)
- ARROW-17252 - [R] Intermittent valgrind failure (#13773)
- ARROW-17266 - [Doc] Java nightlies file prefix changed (#13755)
- ARROW-17269 - [Java] implemented TransferPair methods in MapVector to get correct valuevector as mapvector instead of listvector (#13776)
- ARROW-17270 - [Docs] Move nightly package instructions to dev docs (#13766)
- ARROW-17273 - [Go][CSV] Add Timestamp, Date32, Date64 format support to csv.Writer (#13772)
- ARROW-17274 - [GO] Remove panic from parquet.file.RowGroupReader.Column(index int) (#13767)
- ARROW-17275 - [Go][Integration] Handle Large offset types in IPC read/write (#13770)
- ARROW-17276 - [Go][Integration] Implement IPC handling for union type (#13806)
- ARROW-17277 - [Go][CSV] Custom csv.Writer formatter for boolean values (#13774)
- ARROW-17280 - [C++] Move vendored flatbuffers to private namespace (#13775)
- ARROW-17282 - [Python] flake8 update fails linter CI (#13778)
- ARROW-17287 - [C++] Create scan node that doesn’t rely on the merged generator (#13782)
- ARROW-17289 - [C++] Add type category membership checks (#13783)
- ARROW-17293 - [Java][CI] Prune java nightly builds (#13839)
- ARROW-17297 - [Java][Doc] Adding documentation to interact between C++ to Java via C Data Interface (#13788)
- ARROW-17299 - [C++][Python] Expose the Scanner kDefaultBatchReadahead and kDefaultFragmentReadahead parameters (#13799)
- ARROW-17303 - [Java][Dataset] Read Arrow IPC files by NativeDatasetFactory (#13760) (#13811)
- ARROW-17304 - [C++][Compute] Print actual values when compare fails in aggregate test (#13814)
- ARROW-17305 - [C++] Avoid spending time in popcount in BitmapAnd benchmark (#13794)
- ARROW-17306 - [C++] Provide an optimized GetFileInfoGenerator specialization for LocalFileSystem (#13796)
- ARROW-17310 - [C++] Expose RBR:Make() from Iterator (#13798)
- ARROW-17317 - [Release][Docs] Normalize previous document version directory (#14457)
- ARROW-17318 - [C++][Dataset] Support async streaming interface for getting fragments in Dataset (#13804)
- ARROW-17320 - [Python] Refine pyarrow.parquet API exposure (#14096)
- ARROW-17321 - [JS] Update dependencies (#13758)
- ARROW-17322 - [Docs] Documenting issue lifecycle for bugs and feature requests (#13781)
- ARROW-17323 - [Go] Cleanup and upgrade dependencies (#13807)
- ARROW-17324 - [Go][CI] Add go1.18 job and -asan flag (#13867)
- ARROW-17326 - [Go][FlightSQL] Add FlightSQL support for Go (#13828)
- ARROW-17340 - [Go] Use T.TempDir to create temporary test directory (#13816)
- ARROW-17348 - [C++] Add support for building bundled LZ4 with Visual C++ 2019 or later (#13817)
- ARROW-17349 - [C++] Allow casting map types (#14198)
- ARROW-17355 - [R] Refactor the handle_* utility functions for a better dev experience (#14030)
- ARROW-17357 - [CI][Conan] Enable JSON (#13823)
- ARROW-17358 - [CI][C++] Add a job for Alpine Linux (#13825)
- ARROW-17359 - [Go][FlightSQL] Create Example with SQLite in-mem and use to test FlightSQL server (#13868)
- ARROW-17362 - [R] Implement dplyr::across() inside summarise() (#14042)
- ARROW-17364 - [R] Implement .names argument inside across()
- ARROW-17366 - [R] Support purrr-style lambda functions in .fns argument to across() (#14327)
- ARROW-17367 - [C++] Fix the LZ4’s CMake target name (#13831)
- ARROW-17368 - [C++] Add support for installing utilities (#13832)
- ARROW-17370 - [C++] Add limit to SplitString() (#13833)
- ARROW-17371 - [R] Remove as.factor to dictionary_encode mapping
- ARROW-17377 - [C++][Docs] Adds tutorial for basic Arrow, file access, compute, and datasets (#13859)
- ARROW-17385 - [Integration] Re-enable Rust integration case (#13852) (#13858)
- ARROW-17385 - [Integration] Revert “Re-enable Rust integration case” (#13856)
- ARROW-17387 - [R] Implement dplyr::across() inside filter() (#14281)
- ARROW-17390 - [Go] Add union scalar types (#13860)
- ARROW-17394 - [C++][Parquet] Fix parquet_static dependencies (#13863)
- ARROW-17395 - [CI][Conan] can’t find grpc-proto/cci.20220627 package (#13864)
- ARROW-17405 - [Doc][Java] C Data Interface library able to compile with mvn command (#13881)
- ARROW-17407 - [Doc][FlightRPC] Flight/gRPC best practices (#13873)
- ARROW-17409 - [Packaging][RPM][GLib] *-glib-libs should have .typelib and *-glib-devel should have .gir (#13876)
- ARROW-17412 - [C++] AsofJoin multiple keys and types (#13880)
- ARROW-17418 - [Doc][Java] Dataset library able to compile with mvn command (#13889)
- ARROW-17420 - [C++][FlightRPC] Fix schema validation in Flight SQL integration test (#13897)
- ARROW-17427 - [Java] Add Windows build script that produces DLLs (#14203)
- ARROW-17430 - [Java] ListBinder to bind Arrow List type to DB column (#13906)
- ARROW-17431 - [Java] MapBinder to bind Arrow Map type to DB column (#13941)
- ARROW-17434 - [Java][CI] Add build Windows support for Java (#13918)
- ARROW-17435 - [CI][Python][CUDA] Install Numba for CUDA interop tests (#13899)
- ARROW-17436 - [C++] Use -O2 instead of -O3 for RELEASE builds (#13661)
- ARROW-17439 - [R] Change behavior of pull to compute instead of collect (#14330)
- ARROW-17449 - [Python] Better repr for Buffer, MemoryPool, NativeFile and Codec (#13921)
- ARROW-17451 - [CI][Java] Use manylinux2014 image for JNI (#13920)
- ARROW-17455 - [Go] Function and Kernel execution architecture (#13964)
- ARROW-17456 - [Go] Mark the compute module as a separate sub-module (#13910)
- ARROW-17460 - [R] Don’t warn if the new UDF I’m registering is the same as the existing one (#14436)
- ARROW-17463 - [R] Avoid unnecessary projections (#13954)
- ARROW-17470 - [CI][GLib] Add more system packages to sync the upstream PKGBUILD (#13917)
- ARROW-17475 - [Go] Function interface and Registry impl (#13924)
- ARROW-17476 - [Release][Packaging] Make binary uploader reusable from datafusion-c (#13923)
- ARROW-17479 - [Go] Add ArraySpan and utilities (#13929)
- ARROW-17480 - [Java] add setNull() to FieldVector interface (#14244)
- ARROW-17482 - [Go] Remove ValueDescr types (#13930)
- ARROW-17483 - [Python] Support Expression filters in non-legacy ParquetDataset/read_table (#14011)
- ARROW-17485 - [R] Allow TRUE/FALSE to the compression option of write_feather (write_ipc_file) (#13935)
- ARROW-17488 - [Python] Add support for RelWithDebInfo
- ARROW-17489 - [R] Nightly builds failing due to test referencing unreleased stringr functions (#13937)
- ARROW-17492 - [C++] Hashing32/64 support for large var-binary types (#13940)
- ARROW-17499 - [Go] Shift MakeArrayOfNull to array Package (#13944)
- ARROW-17500 - [Go] Kernel and KernelContext interfaces (#13946)
- ARROW-17510 - [CI][C++][Windows][MSVC] Use ccache (#13957)
- ARROW-17511 - [C++] Add support for xsimd 9.0.0 (#13958)
- ARROW-17512 - [Doc] Updates to crossbow documentation for clarity (#13993)
- ARROW-17519 - [R] RTools35 job is failing (#14035)
- ARROW-17521 - [Python] Add python bindings for NamedTableProvider for Substrait consumer (#14024)
- ARROW-17523 - [C++] Add support to substrait function is_null, is_not_null and count (#13969)
- ARROW-17525 - [Java] Read ORC files using NativeDatasetFactory (#13973)
- ARROW-17527 - [Go] Implement Cast to Boolean Functions (#13974)
- ARROW-17532 - [Go][Compute] Implement Numeric Cast functions (#13992)
- ARROW-17536 - [Packaging][RPM][Gandiva] Fix build error on CentOS Stream 9 (#13984)
- ARROW-17545 - [C++][CI] Mandate C++17 instead of C++11 (#13991)
- ARROW-17546 - [C++] Remove pre-C++17 compatibility measures
- ARROW-17551 - [Go] Implement Temporal Cast Functions (#14006)
- ARROW-17553 - [Go] Enable flight.Server to register additional grpc services (#13995)
- ARROW-17554 - [Python][Packaging] Stop producing macOS Mavericks wheels (#13996)
- ARROW-17555 - [Dev][CI] “ci/scripts/install_osx_sdk.sh” unused
- ARROW-17560 - [Java][Gandiva] Move JNI build configuration from cpp/ to java/ (#14159)
- ARROW-17561 - [Java][ORC] Move JNI build configuration from cpp/ to java/ (#14162)
- ARROW-17569 - [C++] Bump xsimd version to 9.0.1 (#14005)
- ARROW-17575 - [Docs][C++] Update build document to follow new CMake package (#14097)
- ARROW-17585 - [Java] Update GenerateSampleData.java (#14289)
- ARROW-17586 - [Go] String To Numeric cast functions (#14015)
- ARROW-17587 - [Go] Cast From Extension Types (#14016)
- ARROW-17588 - [Go] Casting to binary-like types (#14027)
- ARROW-17594 - [R][Packaging] Build binaries with devtoolset 8 on CentOS 7 (#14243)
- ARROW-17600 - [Go] Implement Casting for Nested types (#14056)
- ARROW-17603 - [C++][FlightRPC] Be verbose about failures when REQUIRE_TLSCREDENTIALSOPTIONS is on (#14034)
- ARROW-17604 - [Docs][Java] Make it more obvious that --add-opens is required (#14066)
- ARROW-17617 - [Docs] Remove experimental qualifier from Flight (#14055)
- ARROW-17621 - [CI] Audit workflows (#14155)
- ARROW-17628 - [CI][Packaging][Java] Publish latest nightly with SNAPSHOT version (#14135)
- ARROW-17629 - [Java] Bind DB column to Arrow Map type in JdbcToArrowUtils (#14134)
- ARROW-17630 - [Java] Introduce column index in JdbcToArrowTypeConverter as JdbcFieldInfo.column
- ARROW-17631 - [Java] Propagate table/columns comments into Arrow Schema (#14081)
- ARROW-17632 - [Python][C++] Add details of where libarrow is being found during build (#14059)
- ARROW-17638 - [Go] Extend C Data API support for Union arrays and RecordReader interface (#14057)
- ARROW-17646 - [Go][CI] Switch C Data to use cgo.Handle (bumps to Go1.17) (#14067)
- ARROW-17647 - [C++] Using better namespace style when using protobuf with Substrait (#14121)
- ARROW-17649 - [Python] Remove remaining deprecated APIs from <= 1.0.0 (#14401)
- ARROW-17659 - [Java] Populate JDBC schema name metadata when config.shouldIncludeMetadata provided (#14196)
- ARROW-17665 - [R] Document dplyr and compute functionality (#14387)
- ARROW-17666 - [R] Document exceptions to dplyr verb support
- ARROW-17667 - [R] Document exceptions to function binding support
- ARROW-17669 - [Go] Take Function kernels for Record batch, Tables and Chunked Arrays (#14214)
- ARROW-17670 - [Go] Implement Filter function for Primitive and FixedSize types (#14088)
- ARROW-17671 - [Go] Filter kernels for Binary/String (#14098)
- ARROW-17673 - [R] desc in dplyr::arrange should allow dplyr:: prefix (#14090)
- ARROW-17674 - [R] Implement dplyr::across() inside arrange() (#14092)
- ARROW-17677 - [Go] Filter functions for list and extension types (#14141)
- ARROW-17678 - [Go] Filter kernels for Record Batches and Tables (#14156)
- ARROW-17688 - [C++][Java][FlightRPC] Substrait, transaction, cancellation for Flight SQL (#13492)
- ARROW-17689 - [R] Implement dplyr::across() inside group_by() (#14122)
- ARROW-17690 - [R] Implement dplyr::across() inside distinct() (#14154)
- ARROW-17691 - [Go] Implement Take for Primitive Types (#14101)
- ARROW-17693 - [C++] Remove string_view backport (#14177)
- ARROW-17694 - [C++] Remove std::optional backport (#14105)
- ARROW-17695 - [C++] Remove Variant class (#14136)
- ARROW-17698 - [R] Implement use of `where()` inside `across()`
- ARROW-17701 - [C++][Gandiva] Add support for untyped node (#14110)
- ARROW-17704 - [Java][FlightRPC] Update to Junit 5 (#14103)
- ARROW-17716 - [Docs] Remove IR documentation page (#14112)
- ARROW-17724 - [R] Allow package name prefix inside dplyr::across’s .fns argument (#14279)
- ARROW-17730 - [Go] Implement Take kernels for FSB and VarBinary (#14127)
- ARROW-17734 - [Go] Implement Take for Lists and Dense Union (#14130)
- ARROW-17736 - [C++] Added a fallback name resolution mechanism to the Substrait producer. (#14143)
- ARROW-17741 - [Packaging] Include JDBC driver in java-jars artifacts (#14139)
- ARROW-17749 - [Go] Implement Filter and Take for Structs (#14145)
- ARROW-17764 - [CI][C++] “#include " is missing (#14161)
- ARROW-17767 - [Java][ORC] Move JNI build configuration from cpp/ to java/ (#14163)
- ARROW-17778 - [Go][CSV] Simple CSV Reader Schema and type inference (#14171)
- ARROW-17782 - [C++][R] R package not building on macos 10.13 with C++17 std lib (#14178)
- ARROW-17786 - [Java] Read CSV files using org.apache.arrow.dataset.jni.NativeDatasetFactory (#14182)
- ARROW-17788 - [R][Doc] Add example of using Scanner (#14184)
- ARROW-17789 - [Java][Docs] Update Java Dataset documentation with latest changes (#14382)
- ARROW-17792 - [C++] Use lambda capture move construction (#14188)
- ARROW-17794 - [Java] Force delete jni lib file on JVM exit (#14189)
- ARROW-17803 - [C++][nodiscard] (#14193)
- ARROW-17804 - [Go][CSV] Add Date32 and Time32 parsers (#14192)
- ARROW-17810 - [Java] Use jacoco-maven-plugin 0.8.8 for Java 18 support (#14197)
- ARROW-17811 - [Java][Doc] Added high-level documentation for Dictionary Encoding in Java (#14213)
- ARROW-17814 - [C++] Fix style (#14218)
- ARROW-17814 - [C++] Remove make_unique reimplementation (#14204)
- ARROW-17815 - [Python] Warn, not error out, when SetSignalStopSource fails (#14205)
- ARROW-17817 - [C++] Let ORC compile on MSVC if it is activated (#14208)
- ARROW-17823 - [C++] Revert std::make_shared change for CUDA (#14233)
- ARROW-17823 - [C++] Prefer std::make_shared/std::make_unique over constructor with new (#14216)
- ARROW-17824 - [C++][Gandiva] Implement preallocation for variable length output buffer (#14230)
- ARROW-17826 - [Python] Allow scalars when creating expression from compute kernels (#14360)
- ARROW-17834 - [Python] Allow creating ExtensionArray through pa.array(..) constructor (#14253)
- ARROW-17840 - [Java] Disable flaky JaCoCo coverage check (#14231)
- ARROW-17844 - [C++] Remove atomic shared_ptr compatibility functions (#14239)
- ARROW-17845 - [CI][Conan] Re-enable Flight in Conan CI check (#14240)
- ARROW-17846 - [C++] Use if constexpr in CSV subsystem (#14241)
- ARROW-17847 - [C++] Support unquoted decimal in JSON parser (#14242)
- ARROW-17849 - [R][Docs] Document changes due to C++17 for centos-7 users (#14440)
- ARROW-17854 - [CI][Developer] Host preview docs on S3 (#14247)
- ARROW-17856 - [CI][Archery] Add new Archery command to delete old branches and tags on crossbow repo (#14248)
- ARROW-17857 - [C++] Fix segfault in Table::CombineChunksToBatch (#14249)
- ARROW-17860 - [Plasma] Deprecate Plasma
- ARROW-17861 - [C++] Deprecate Plasma (#14305)
- ARROW-17862 - [Plasma][GLib] Deprecate Plasma C GLib bindings (#14259)
- ARROW-17863 - [Python] Deprecate Plasma Python bindings (#14343)
- ARROW-17864 - [Plasma][Ruby] Deprecate Plasma Ruby bindings (#14258)
- ARROW-17865 - [Java] Deprecate Java Plasma JNI bindings (#14262)
- ARROW-17868 - [C++][Python] Restore the ARROW_PYTHON CMake option (#14273)
- ARROW-17872 - [C++][CI] Reduce macOS CI dependencies (#14310)
- ARROW-17875 - [C++] Remove assorted pre-C++17 compatibility measures (#14263)
- ARROW-17878 - [Website] Exclude Ballista docs from being deleted
- ARROW-17880 - [Go] Add support for Decimal128 and Decimal256 to CSV writer (#14278)
- ARROW-17882 - [Java][Doc] Adding building steps for Windows user to produce JNI DLL (#14379)
- ARROW-17883 - [Java] implement immutable table (#14316)
- ARROW-17888 - [Docs] Add reference of the cookbook contrib page to New Contributor’s Guide (#14283)
- ARROW-17889 - [CI] Remove Kartothek integration tests (#14274)
- ARROW-17891 - [Docs][Python] Update and sync Win section of the developers/python page (#14350)
- ARROW-17903 - [JS] Update dependencies (#14285)
- ARROW-17911 - [R] Implement across() within transmute() (#14290)
- ARROW-17924 - [Doc][Format] Clarify immutability assumption in C Data Interface (#14304)
- ARROW-17929 - [C#] Improve the NuGet packages. (#14312)
- ARROW-17934 - [R] Use tempfile instead of working directory for dataset test (#14315)
- ARROW-17936 - [R] ExecPlanReader test aborts with a crash
- ARROW-17939 - [Docs][Python] Update python dev page after PyArrow C++ tests change (#14322)
- ARROW-17940 - [Java][Gandiva] Implement Reserve for JavaBuffer (#14323)
- ARROW-17942 - [Website] Some links can be changed from http to https
- ARROW-17944 - [Python] substrait.run_query accept bytes/Buffer and not segfault (#14331)
- ARROW-17945 - [Website][Release] Use https:// for search.maven.org (#14329)
- ARROW-17950 - [Docs][Python] Add more info about the change in PyArrow C++ API (#14333)
- ARROW-17952 - [Archery][CI] Fix archery error when running ubuntu-cuda-cpp (#14335)
- ARROW-17954 - [R] Update news for 10.0 (#14337)
- ARROW-17955 - [Docs][Java] Tutorial documentation for Table (#14344)
- ARROW-17962 - [Java] Remove unused schema creation from try with resources (#14346)
- ARROW-17965 - [C++] ExecBatch support for ChunkedArray values (#14348)
- ARROW-17969 - [CI][C++] Don’t use LLVM 14 or later on Ubuntu 18.04 (#14356)
- ARROW-17971 - [Format][Docs] Add ADBC (#14079)
- ARROW-17972 - [CI] Update CUDA docker jobs
- ARROW-17976 - [C++] Use generic lambdas in arrow/compare.cc (#14363)
- ARROW-17982 - [C++][Java] Update ORC to 1.8.0 (#14367)
- ARROW-17988 - [C++] Remove index_sequence_for and aligned_union backports (#14372)
- ARROW-17992 - [CI][C++][Conda] Remove needless clangdev/llvmdev pinnings (#14376)
- ARROW-17993 - [CI][Release] Use Node.js 16 LTS for verify-rc-source--conda- (#14377)
- ARROW-17997 - [Ruby] Add support for building Arrow::Tensor from raw nested Ruby array (#14381)
- ARROW-18010 - [Go] Add ARM64 Neon impl for Casting (#14388)
- ARROW-18017 - [Go] Simplify Compute module deps and release (#14391)
- ARROW-18019 - [C++][Gandiva] Improve Projector evaluation performance (#14394)
- ARROW-18026 - [C++][Gandiva] Add div and mod functions for unsigned ints (#14397)
- ARROW-18027 - [Dev][Archery][Crossbow] Reuse GitHub Token (#14398)
- ARROW-18028 - [Dev][Archery][Crossbow] Always use GitHub Action’s run page URL in PR comment (#14399)
- ARROW-18030 - [C++] Bump LZ4 version (#14405)
- ARROW-18044 - [Java] upgrade error-prone library version to 2.16 (#14423)
- ARROW-18047 - [Dev][Archery][Crossbow] Queue.put() should use Job.queue setter (#14410)
- ARROW-18048 - [Dev][Archery][Crossbow] Comment bot waits for a while before generate a report (#14412)
- ARROW-18053 - [Dev] Fix a bug that merge_arrow_pr.py doesn’t detect Co-authored-by: (#14416)
- ARROW-18056 - [Ruby] Add support for building Arrow::Table from {name: Arrow::Tensor} (#14417)
- ARROW-18057 - [R] test for slice functions fail on builds without Datasets capability (#14418)
- ARROW-18058 - [Dev][Archery] Remove removed ARROW_JNI related code (#14419)
- ARROW-18061 - [CI][R] Reduce number of jobs on every commit (#14420)
- ARROW-18069 - [Docs] Suggest using force with lease initially (#14430)
- ARROW-18072 - [C++] Can’t use bundled ORC with CMake 3.10 (#14432)
- ARROW-18074 - [CI] Running ctest for PyArrow C++ not needed anymore (#14435)
- ARROW-18083 - [C++] Bump vendored zlib version (#14446)
- PARQUET-2172 - [C++] Change field return type to const NodePtr& (#13865)
Bug Fixes
- ARROW-12175 - [C++] Fix CMake packages (#13892)
- ARROW-13763 - [Python] Close files in ParquetFile & ParquetDatasetPiece (#13821)
- ARROW-14363 - [C++][Gandiva] LLVM 13 has deprecated CreateGEP and CreateLoad methods without explicit element type
- ARROW-15602 - [R][Docs] Update docs to explain how to read timestamp with timezone columns (#13877)
- ARROW-15733 - array.String offsets int32 overflow
- ARROW-16141 - [R] Update rhub/fedora-clang-devel for upstreamed changes (#12824)
- ARROW-16174 - [Python] Fix FixedSizeListArray.flatten() on sliced input (#14000)
- ARROW-16521 - [C++][Python] Configure curl timeout policy for S3 (#13385)
- ARROW-16651 - [Python] Casting Table to new schema ignores nullability of fields (#14048)
- ARROW-16652 - [Python] Cast compute kernel segfaults when called with a Table (#14044)
- ARROW-16674 - [Java] C data interface: Reading as nioBuffer from imported buffer causes error (#13249)
- ARROW-16754 - [Java] StructVector’s child vectors get unexpectedly reordered after adding duplicated fields (#13321)
- ARROW-16838 - [Python] Improve schema inference for pandas indexes with extension dtypes (#14080)
- ARROW-16897 - [R][C++] Full join on Arrow objects is incorrect
- ARROW-16942 - Error building JNI Libraries on MacOS: Could not find a package configuration file provided by “xsimd”
- ARROW-16993 - [C++] Don’t find Boost components if they aren’t needed (#13846)
- ARROW-17057 - [Python] S3FileSystem has no parameter for retry strategy (#13633)
- ARROW-17069 - [Docs][Python] Describe authentication for GCS public and private (#14392)
- ARROW-17084 - [R] Install the package before linting (#13620)
- ARROW-17099 - [Python] pyarrow build does not support RELWITHDEBINFO build type (#14324)
- ARROW-17104 - [CI][Python] Pyarrow cannot be imported on CI job AMD64 MacOS 10.15 Python 3
- ARROW-17166 - [R][CI] force_tests() cannot return TRUE (#13680)
- ARROW-17169 - [Go][Parquet] Panic in bitmap writer with Nullable List of Struct (#14183)
- ARROW-17193 - [C++] Add support for finding system Abseil (#13731)
- ARROW-17199 - [Java][FlightRPC] Clean up Flight SQL example server (#13710)
- ARROW-17217 - [Docs][Python] Adding pandas as required dependency (#13714)
- ARROW-17223 - [C#] DecimalArray incorrectly appends values greater than Decimal.MaxValue / 2 and less than Decimal.MinValue / 2 (#13732)
- ARROW-17228 - [Python] dataset.write_data should use Scanner.projected_schema when passed a scanner with projected columns (#13756)
- ARROW-17230 - [C++] Fix DeserializePlan, add additional option validation (#13728)
- ARROW-17233 - [Packaging][Linux] Update artifact patterns (#13740)
- ARROW-17248 - [CI][Conan] Enable Zstandard (#13742)
- ARROW-17249 - [CI][Conan] Enable bzip2 (#13743)
- ARROW-17250 - [CI][Conan] Enable utf8proc automatically (#13744)
- ARROW-17251 - [CI][Conan] Enable Flight (#13761)
- ARROW-17253 - [Python] Detect iterator exception instead of crashing (#13764)
- ARROW-17254 - [C++][Go][Java][FlightRPC] Implement and test Flight SQL GetSchema (#13898)
- ARROW-17256 - [Python] Can’t call combine_chunks on empty ChunkedArray (#13757)
- ARROW-17272 - [Dev] Pass --add-opens in integration tests (#13765)
- ARROW-17281 - [C++] Fix cache size reporting on Windows (#13813)
- ARROW-17296 - [Python] Update serialized metadata size in pyarrow.parquet.read_metadata doctest (#13790)
- ARROW-17315 - [Release][Docs] Update versions.json by post version bump (#13805)
- ARROW-17338 - [Java] The maximum request memory of BaseVariableWidthVector should limit to Integer.MAX_VALUE (#13815)
- ARROW-17341 - [C++] Fix cpu_info.cc build error on musl libc (#13819)
- ARROW-17350 - [C++] Create a scheduler for asynchronous work (#13912)
- ARROW-17353 - [Release][R] Validate binaries version (#14396)
- ARROW-17372 - [Go][Parquet] Fix failures for ppc64le (#13840)
- ARROW-17382 - [C++] open_dataset doesn’t ignore BOM in csv file when header’s with quotes (#13838)
- ARROW-17386 - [R] strptime tests not robust across platforms (#13854)
- ARROW-17389 - [Python] Properly exclude tests when PYARROW_INSTALL_TESTS=0 (#13904)
- ARROW-17410 - [JS][Integration] Downgrade zlib for integration (#13885)
- ARROW-17421 - [C++] CUDA on Windows fails to build (#13883)
- ARROW-17422 - [C++][CI] Linux builds are missing dependencies (#13886)
- ARROW-17423 - [CI][C++] Fix building CUDA docker images (#13896)
- ARROW-17426 - [C++] Substrait consumer fails to compile on older Ubuntu (#13888)
- ARROW-17433 - [CI][C++] Use Visual Studio 2019 on AppVeyor (#13903)
- ARROW-17438 - [R] glimpse() errors if there is a UDF
- ARROW-17440 - [C++] Support RISC-V architecture (#13902)
- ARROW-17448 - [R] Fix cloud storage paths in some documentation (#14070)
- ARROW-17450 - [C++][Parquet] Add support for uint8 boolean decode in addition to bool array (#14359)
- ARROW-17450 - [C++][Parquet] Support RLE decode for boolean datatype (#14147)
- ARROW-17453 - [Go][C++][Parquet] Inconsistent Data with Repetition Levels (#13982)
- ARROW-17467 - [Go] Aligned Bitmap Ops mess up the final byte when no t… (#13915)
- ARROW-17478 - [C++][Java] Update ORC to 1.7.6 (#13926)
- ARROW-17494 - [C++] Fix substrait tests linkage on static builds (#13939)
- ARROW-17496 - [Go] Fix Nightly Build (#13943)
- ARROW-17501 - [Python][wheel] Use old AWS SDK C++ (#14157)
- ARROW-17507 - [Dev][CI][R] GHA “autotune” doesn’t work (#14060)
- ARROW-17517 - [C++] Test engine API in public API test (#13965)
- ARROW-17517 - [C++] Remove internal headers from substrait API (#14131)
- ARROW-17518 - [CI][Doc][Python] Update glob to detect arrow development version from git (#13966)
- ARROW-17524 - [C++] Correction for fields included when reading an ORC table (#13962)
- ARROW-17543 - [R] Fix bug for NULL type 0-length vectors in array creation
- ARROW-17550 - [C++][CI][MinGW] Use system Python for GCS testbench (#14272)
- ARROW-17556 - [C++] Unbound scan projection expression leads to all fields being loaded (#14264)
- ARROW-17559 - [R][C++] Regression: big performance hit after removing schema binding
- ARROW-17565 - [C++] Backward compatible ${PACKAGE}_shared CMake target isn’t provided (#14003)
- ARROW-17567 - [C++] Avoid internal compiler error with gcc 7 and c++17 (#14004)
- ARROW-17571 - [Benchmarks] Default build for PyArrow seems to be debug (#14010)
- ARROW-17573 - [Go][Parquet] ByteArray statistics can cause memory leak (#14013)
- ARROW-17577 - [C++][Python] CMake cannot find Arrow/Arrow Python when building PyArrow
- ARROW-17578 - [CI][R] Fix build for Ubuntu 22.04 and GCC 12 on R (#14022)
- ARROW-17579 - [Python] PYARROW_CXXFLAGS ignored? (#14074)
- ARROW-17583 - [C++][Python] Changed datawidth of WrittenFile.size to int64 to match C++ code (#14032)
- ARROW-17598 - [C++] Skip memory_benchmark if SIMD level is NEON (#14036)
- ARROW-17611 - [Rust] Boolean column data saved with V2 from arrow-rs unreadable by pyarrow
- ARROW-17612 - [Benchmarks] Failing benchmarks on macos-arm
- ARROW-17614 - [CI][Python] test test_write_dataset_max_rows_per_file is producing several nightly build failures (#14199)
- ARROW-17616 - [CI][Java] Solving regex to support last Arrow Java versions >= 10.0.0 (#14076)
- ARROW-17620 - [R] as_arrow_array() ignores type for StructArrays (#14047)
- ARROW-17627 - [Go][Parquet] Forward schema metadata to file without StoreSchema (#14087)
- ARROW-17639 - [R] infer_type() fails for lists where the first element is NULL (#14062)
- ARROW-17641 - [Python] Fix ParseOptions deserialization of invalid_row_handler (#14061)
- ARROW-17643 - [R] Latest duckdb release is causing test failure (#14149)
- ARROW-17645 - [CI] Get conda-integration building again (#14069)
- ARROW-17675 - [C++] Modified the FileSource::Equals method to handle the case where buffer_ is null (#14085)
- ARROW-17681 - [CI][Packaging] Update brew dependency glib-utils with glib (#14095)
- ARROW-17682 - [CI][C++] Nightly test-ubuntu-20.04-cpp-thread-sanitizer fails arrow-utility-test around the AsyncTaskScheduler
- ARROW-17684 - [CI][deb] Disable Flight for arm64 (#14300)
- ARROW-17686 - [C++] Add custom ToPrint to AsofJoinBasicTest (#14172)
- ARROW-17687 - ScanningStress test is flaky in CI (#14314)
- ARROW-17696 - [C++] arrow-compute-asof-join-node-test inordinately slow (#14190)
- ARROW-17697 - [Python] Fix Cython warning in types.pxi (#14280)
- ARROW-17699 - [R] Add better error message for if a non-schema passed into open_dataset() (#14108)
- ARROW-17702 - [R][CI] Test failure on CentOS 7
- ARROW-17703 - [C++][Gandiva] Fix Gandiva OpenSSL dependency (#14109)
- ARROW-17717 - [R] Lintr error on CI (#14113)
- ARROW-17725 - [CI][Python] Fix test collection in case of Arrow built without parquet (#14119)
- ARROW-17728 - [C++][Gandiva] Accept LLVM 15.0 (#14125)
- ARROW-17733 - [C++] Take index_width into account when filling nulls in index buffer (#14129)
- ARROW-17737 - [R] Groups before conversion to a Table must not be restored after collect() (#14175)
- ARROW-17738 - [R] dplyr::compute should convert from grouped arrow_dplyr_query to arrow Table (#14160)
- ARROW-17742 - [C++][Gandiva] Fix Gandiva utf8proc dependency in CMake presets (#14140)
- ARROW-17753 - [Python][Docs] Document cleaning for fixing build environment issues (#14260)
- ARROW-17770 - [C++][Gandiva] Fix const correctness of Gandiva projector Evaluate (#14165)
- ARROW-17771 - [Docs][Python] Add the use of CONDA_DLL_SEARCH_MODIFICATION_ENABLE to the docs (#14302)
- ARROW-17773 - [CI][C++] Fix sccache error on Travis-CI Arm64 build (#14201)
- ARROW-17785 - [Java] Suppress flakiness from gRPC in JDBC driver tests (#14210)
- ARROW-17787 - [Java] Fix Javadoc build (#14212)
- ARROW-17790 - [C++][Gandiva] Adapt to LLVM opaque pointer (#14187)
- ARROW-17791 - [Python][CI] Some nightly jobs are failing due to ACCESS_DENIED to S3 bucket
- ARROW-17795 - [C++][R] Add missing PKG_CONFIG_PATH to use system zstd (#14202)
- ARROW-17800 - [C++] Fix failures in jemalloc stats tests (#14194)
- ARROW-17805 - [C++][CI] Use Brew installed clang for MacOS
- ARROW-17813 - [Python] Nested ExtensionArray conversion to/from pandas/numpy (#14238)
- ARROW-17818 - [R] Skip duckdb test that is failing until the issue is resolved (#14209)
- ARROW-17822 - [C++][FlightRPC] Fix crash on invalid transport scheme (#14267)
- ARROW-17829 - [Python] Avoid pandas groupby deprecation warning write_to_dataset (#14306)
- ARROW-17830 - [C++][Gandiva] Temporarily pin LLVM version on AppVeyor (#14228)
- ARROW-17831 - [Python][Docs] PyArrow Architecture page outdated after moving pyarrow C++ code (#14311)
- ARROW-17842 - [C++][CI] Use Brew installed clang for MacOS verify-rc (#14236)
- ARROW-17848 - [R] Skip lubridate::format_ISO8601 tests until next release (#14282)
- ARROW-17850 - [Java] Upgrade netty + grpc + protobuf + jackson BOM versions (#14265)
- ARROW-17853 - [Python][CI] Timeout in test_dataset.py::test_write_dataset_s3_put_only (#14257)
- ARROW-17853 - temporary revert fix for test_write_dataset_max_rows_per_file (#14246)
- ARROW-17885 - [R] Return BLOB data as list of raw instead of a list of integers (#14277)
- ARROW-17915 - [C++] Error when using Substrait ProjectRel (#14295)
- ARROW-17927 - [C++] Changed SleepABitAsync to use a thread pool to reduce the # of running threads (#14339)
- ARROW-17930 - [CI][C++] Valgrind failure in PrintValue<arrow::dataset::ScannerTestParams> (#14317)
- ARROW-17931 - [C++][CI] Thread Sanitizer failure around the dataset “new scanner” on CI
- ARROW-17938 - [Python] Fix compilation error on python_test.cc (#14321)
- ARROW-17973 - [C++] Expression::ToString wrong for nullary function call (#14370)
- ARROW-17977 - [CI][C++] Don’t use LLVM 14 or later on Debian i386 (#14368)
- ARROW-17990 - [C++] Restore -mbmi2 flag (#14375)
- ARROW-17995 - [C++] Fix json decimals not being rescaled based on the explicit schema (#14380)
- ARROW-17999 - [C++] Make Minio server launch more robust (#14383)
- ARROW-18004 - [C++] ExecBatch conversion to RecordBatch may go out of bounds (#14386)
- ARROW-18018 - [C++] Potential segmentation fault in unit tests due to usage of AllComplete instead of AllFinished (#14393)
- ARROW-18031 - [C++][Parquet] Undefined behavior in bool RLE decoder (#14407)
- ARROW-18041 - [Python] Substrait-related test failure in wheel tests (#14408)
- ARROW-18055 - [C++] arrow-dataset-dataset-writer-test still times out occasionally (#14428)
- ARROW-18062 - [R] error in CI jobs for R 3.5 and 3.6 when R package being installed (#14424)
- ARROW-18079 - [R] Improve efficiency of schema creation to prevent performance regressions (#14447)
- ARROW-18088 - [Python][CI] Build with pandas master/nightly failure related to timedelta64 resolution
- ARROW-18103 - [Packaging][deb][RPM] Fix upload artifacts patterns (#14462)