Apache Arrow 11.0.0 (26 January 2023)
This is a major release covering more than 3 months of development.
Download
- Source Artifacts
- Binary Artifacts
- Git tag
Contributors
This release includes 516 commits from 95 distinct contributors.
$ git shortlog -sn apache-arrow-10.0.0..apache-arrow-11.0.0
83 Sutou Kouhei
35 Matt Topol
28 Raúl Cumplido
25 Dewey Dunnington
21 Alenka Frim
21 Antoine Pitrou
20 Jacob Wujciak-Jens
17 David Li
17 Miles Granger
16 Weston Pace
15 Joris Van den Bossche
15 Will Jones
14 Nic Crane
10 Neal Richardson
10 Vibhatha Lakmal Abeykoon
9 rtpsw
8 eitsupi
7 Ben Harkins
7 Jin Shang
6 Alessandro Molina
6 Bryce Mecum
6 Fatemah Panahi
6 Gang Wu
6 Larry White
6 mwish
5 gf2121
4 David Sisson
4 Hirokazu SUZUKI
4 LouisClt
3 0x26res
3 Rok Mihevc
3 h-vetinari
2 Austin Dickey
2 Benson Muite
2 Jonathan Keane
2 Kshiteej K
2 Libor Ryšavý
2 Nikita Eshkeev
2 Percy Camilo Triveño Aucahuasi
2 Sasha Krassovsky
2 Todd Farmer
2 Yibo Cai
2 buaazhwb
2 dependabot[bot]
2 lafiona
1 0xflotus
1 André Kohn
1 Anja Kefala
1 Benjamin Kietzman
1 Daniel Sullivan
1 Danielle Navarro
1 Dean Attali
1 Dhulkifli Hussein
1 Dominik Moritz
1 Dongjoon Hyun
1 Dr. Jan-Philip Gehrcke
1 ElenaHenderson
1 Felipe Oliveira Carvalho
1 Frederick Jansen
1 Hadley Wickham
1 Ian Cook
1 JacekPliszka
1 JiaKe
1 Jianshen Liu
1 Jonas Haag
1 Joost Hoozemans
1 Julien Roncaglia
1 Kae S
1 Kazuaki Ishizaki
1 Kyle Barron
1 Laurent Quérel
1 Lionel Henry
1 Mark Schreiber
1 Matti Picus
1 Noah Treuhaft
1 Paul Taylor
1 Pierre Gramme
1 Quang Hoang
1 Sahaj Gupta
1 Sanjiban Sengupta
1 Sho Nakatani
1 Siddhant Rao
1 Tamas Mate
1 Tao He
1 Thomas Sarlandie
1 Tomek Drabas
1 William Ayd
1 Y
1 Yue
1 emkornfield
1 fdzuJ
1 kambhamvivekshankar
1 lukester1975
1 martin-kokos
1 zagto
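The release totals stated at the start of this section can be recomputed from the shortlog output. Each line gives a commit count followed by an author name, so the number of lines is the contributor count and the sum of the first column is the commit count. The Python sketch below assumes that `count<whitespace>name` line format. The function name `summarize_shortlog` is hypothetical and is not part of any Arrow or git tooling.

```python
from typing import Tuple


def summarize_shortlog(text: str) -> Tuple[int, int]:
    """Return (total_commits, distinct_authors) for `git shortlog -sn` output.

    Assumes each non-empty line looks like "    83\tSutou Kouhei":
    a commit count, whitespace, then the author name (which may contain spaces).
    """
    total_commits = 0
    authors = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        count, _name = line.split(None, 1)  # split once: count vs. full name
        total_commits += int(count)
        authors += 1  # shortlog already groups commits per author
    return total_commits, authors


if __name__ == "__main__":
    sample = "    83\tSutou Kouhei\n    35\tMatt Topol\n     1\tzagto\n"
    print(summarize_shortlog(sample))  # (119, 3)
```

Applied to the full listing above, the counts sum to 516 across 95 lines, which matches the stated figures. Inside an Arrow checkout, the input text could come from `subprocess.run(["git", "shortlog", "-sn", "apache-arrow-10.0.0..apache-arrow-11.0.0"], capture_output=True, text=True, check=True).stdout`. The same function also works on the `--group=trailer:signed-off-by` listing of patch committers.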
Patch Committers
The following Apache committers merged contributed patches to the repository.
$ git shortlog -sn --group=trailer:signed-off-by apache-arrow-10.0.0..apache-arrow-11.0.0
148 Sutou Kouhei
89 Antoine Pitrou
50 Joris Van den Bossche
36 David Li
36 Matt Topol
34 Weston Pace
24 Dewey Dunnington
24 Nic Crane
16 Jacob Wujciak-Jens
13 Will Jones
8 Neal Richardson
6 Raúl Cumplido
6 Yibo Cai
4 Alessandro Molina
4 Rok Mihevc
3 Dominik Moritz
3 Jonathan Keane
2 Alenka Frim
1 Micah Kornfield
1 dependabot[bot]
Changelog
Apache Arrow 11.0.0 (2023-01-25 08:00:00)
New Features and Improvements
- ARROW-4709 - [C++] Optimize for ordered JSON fields (#14100)
- ARROW-11776 - [C++][Java] Support parquet write from ArrowReader to file (#14151)
- ARROW-13938 - [C++] Date and datetime types should autocast from strings
- ARROW-13980 - [Go] Implement Scalar ApproxEquals (#14543)
- ARROW-14161 - [C++][Docs] Improve Parquet C++ docs (#14018)
- ARROW-14832 - [R] Implement bindings for stringr::str_remove and stringr::str_remove_all (#14644)
- ARROW-14999 - [C++] Optional field name equality checks for map and list type (#14847)
- ARROW-15006 - [Python][Doc] Add five more numpydoc checks to CI (#15214)
- ARROW-15006 - [Python][CI][Doc] Enable numpydoc check PR03 (#13983)
- ARROW-15206 - [Ruby] Add support for Arrow::Table.load(uri, schema:) (#15148)
- ARROW-15460 - [R] Add as.data.frame.Dataset method (#14461)
- ARROW-15470 - [R] Set null value in CSV writer (#14679)
- ARROW-15538 - [C++] Expanding coverage of math functions from Substrait to Acero (#14434)
- ARROW-15592 - [C++] Add support for custom output field names in a substrait::PlanRel (#14292)
- ARROW-15691 - [Dev] Update archery to work with either master or main as default branch (#14033)
- ARROW-15732 - [C++] Do not use any CPU threads in execution plan when use_threads is false (#15104)
- ARROW-15812 - [R] Accept col_names in open_dataset for CSV (#14705)
- ARROW-16266 - [R] Add StructArray$create() (#14922)
- ARROW-16337 - [Python] Expose flag to enable/disable storing Arrow schema in Parquet metadata (#13000)
- ARROW-16430 - [Python] Add support for reading record batch custom metadata API (#13041)
- ARROW-16480 - [R] Update read_csv_arrow and open_dataset parse_options, read_options, and convert_options to take lists (#15270)
- ARROW-16616 - [Python] Add lazy Dataset.filter() method (#13409)
- ARROW-16673 - [Java] Integrate C Data into allocator hierarchy (#14506)
- ARROW-16728 - [Python] ParquetDataset to still take legacy code path when old filesystem is passed (#15269)
- ARROW-16728 - [Python] Switch default and deprecate use_legacy_dataset=True in ParquetDataset (#14052)
- ARROW-16782 - [Format] Add REE definitions to FlatBuffers (#14176)
- ARROW-17025 - [Dev] Remove github user name links from merge commit message (#14458)
- ARROW-17144 - [C++][Gandiva] Add sqrt function (#13656)
- ARROW-17187 - [R] Improve lazy ALTREP implementation for String (#14271)
- ARROW-17212 - [Python] Support lazy Dataset.filter
- ARROW-17301 - [C++] Implement compute function “binary_slice” (#14550)
- ARROW-17302 - [R] Configure curl timeout policy for S3 (#15166)
- ARROW-17360 - [Python] Order of columns in pyarrow.feather.read_table (#14528)
- ARROW-17416 - [R] Implement lubridate::with_tz and lubridate::force_tz
- ARROW-17425 - [R] lubridate::as_datetime() in dplyr query should be able to handle time in sub seconds (#13890)
- ARROW-17462 - [R] Cast scalars to type of field in Expression building (#13985)
- ARROW-17509 - [C++] Simplify async scheduler by removing the need to call End (#14524)
- ARROW-17520 - [C++] Implement SubStrait SetRel (UnionAll) (#14186)
- ARROW-17610 - [C++] Support additional source types in SourceNode (#14207)
- ARROW-17613 - [C++] Add function execution API for a preconfigured kernel (#14043)
- ARROW-17640 - [C++] Add File Handling Test cases for GlobFile handling in Substrait Read (#14132)
- ARROW-17662 - [R] Facilitate offline installation from binaries (#14086)
- ARROW-17726 - [CI] Enable sccache on more builds
- ARROW-17731 - [Website] Add blog post about Flight SQL JDBC driver
- ARROW-17732 - [Docs][Java] Add minimal JDBC driver docs (#14137)
- ARROW-17751 - [Go][Benchmarking] Add Go Benchmark Script (#14148)
- ARROW-17777 - [Dev] Update the pull request merge script to work with master or main
- ARROW-17798 - [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer (#14191)
- ARROW-17812 - [Gandiva][Docs] Add C++ Gandiva User Guide (#14200)
- ARROW-17825 - [C++] Allow the possibility to write several tables in ORCFileWriter (#14219)
- ARROW-17832 - [Python] Construct MapArray from sequence of dicts (instead of list of tuples) (#14547)
- ARROW-17836 - [C++] Allow specifying alignment of buffers (#14225)
- ARROW-17837 - [C++][Acero] Create ExecPlan-owned QueryContext that will store a plan’s shared data structures (#14227)
- ARROW-17838 - [Python] Unify CMakeLists.txt in python/ (#14925)
- ARROW-17859 - [C++] Use self-pipe in signal-receiving StopSource (#14250)
- ARROW-17867 - [C++][FlightRPC] Expose bulk parameter binding in Flight SQL (#14266)
- ARROW-17870 - [Go] Add Scalar Binary Arithmetic
- ARROW-17871 - [Go] initial binary arithmetic implementation (#14255)
- ARROW-17887 - [R][Doc] Improve readability of the Get Started and README pages (#14514)
- ARROW-17892 - [CI] Use Python 3.10 in AppVeyor build (#14307)
- ARROW-17899 - [Go][CSV] Add Decimal support to CSV reader (#14504)
- ARROW-17932 - [C++] Implement streaming RecordBatchReader for JSON (#14355)
- ARROW-17949 - [C++][Docs] Remove the use of clcache from Windows dev docs (#14529)
- ARROW-17953 - [Archery] Add archery docker info command (#14345)
- ARROW-17960 - [C++][Python] Implement list_slice kernel (#14395)
- ARROW-17966 - [C++] Adjust to new format for Substrait optional arguments (#14415)
- ARROW-17972 - [CI] Update CUDA docker jobs (#14362)
- ARROW-17975 - [C++] Create at-fork facility (#14594)
- ARROW-17980 - [C++] As-of-Join Substrait extension (#14485)
- ARROW-17989 - [C++][Python] Enable struct_field kernel to accept string field names (#14495)
- ARROW-18008 - [Python][C++] Add use_threads to run_substrait_query
- ARROW-18012 - [R] Make map_batches .lazy = TRUE by default (#14521)
- ARROW-18014 - [Java] Implement copy functions for vectors and Table (#14389)
- ARROW-18016 - [CI] Add sccache to r jobs (#14570)
- ARROW-18033 - [CI] Use $GITHUB_OUTPUT instead of set-output (#14409)
- ARROW-18042 - [Java] Distribute Apple M1 compatible JNI libraries via mavencentral (#14472)
- ARROW-18043 - [R] Properly instantiate empty arrays of extension types in Table__from_schema (#14519)
- ARROW-18051 - [C++] Enable tests skipped by ARROW-16392 (#14425)
- ARROW-18075 - [Website] Update install page for 9.0.0
- ARROW-18081 - [Go] Add Scalar Boolean functions (#14442)
- ARROW-18095 - [CI][C++][MinGW] All tests exited with 0xc0000139
- ARROW-18108 - [Go] More scalar binary arithmetic (Multiply and Divide) (#14544)
- ARROW-18109 - [Go] Initial Unary Arithmetic (#14605)
- ARROW-18110 - [Go] Scalar Comparisons (#14669)
- ARROW-18111 - [Go] Remaining scalar binary arithmetic (shifts, power, bitwise) (#14703)
- ARROW-18112 - [Go] Remaining Scalar Arithmetic (#14777)
- ARROW-18113 - [C++] Add RandomAccessFile::ReadManyAsync (#14723)
- ARROW-18120 - [Release][Dev] Automate running binaries/wheels verifications (#14469)
- ARROW-18121 - [Release][CI] Use Ubuntu 22.04 for verifying binaries (#14470)
- ARROW-18122 - [Release][Dev] Update expected vote e-mail (#14548)
- ARROW-18122 - [Release][Dev] Add verification PR URL to vote email (#14471)
- ARROW-18135 - [C++] Avoid warnings that ExecBatch::length may be uninitialized (#14480)
- ARROW-18137 - [Python][Docs] adding info about TableGroupBy.aggregation with empty list (#14482)
- ARROW-18144 - [C++] Improve JSONTypeError error message in testing (#14486)
- ARROW-18147 - [Go] Add Scalar Add/Sub for Decimal types (#14489)
- ARROW-18151 - [CI] Avoid unnecessary redirect for some conda URLs (#14494)
- ARROW-18152 - [Python] DataFrame Interchange Protocol for pyarrow Table
- ARROW-18169 - [Website] Don’t run dev docs update on fork repositories
- ARROW-18173 - [Python] Drop older versions of Pandas (<1.0) (#14631)
- ARROW-18174 - [R] Fix compile of altrep.cpp on some builds (#14530)
- ARROW-18177 - [Go] Add Add/Sub for Temporal types (#14532)
- ARROW-18178 - [Java] ArrowVectorIterator incorrectly closes Vectors (#14534)
- ARROW-18184 - [C++] Improve JSON parser benchmarks (#14552)
- ARROW-18203 - [R] Refactor to remove unnecessary uses of build_expr (#14553)
- ARROW-18206 - [C++][CI] Add a nightly build for C++20 compilation (#14571)
- ARROW-18220 - [Dev] Remove a magic number for the default parallel level in downloader (#14563)
- ARROW-18221 - [Release][Dev] Add support for customizing arrow-site dir (#14564)
- ARROW-18222 - [Release][MSYS2] Detect reverse dependencies automatically (#14565)
- ARROW-18223 - [Release][Homebrew] Detect reverse dependencies automatically (#14566)
- ARROW-18224 - [Release][jar] Use temporary directory for download (#14567)
- ARROW-18230 - [Python] Pass Cmake args to Python CPP
- ARROW-18233 - [Release][JS] don’t install yarn to system (#14577)
- ARROW-18235 - [C++][Gandiva] Fix the like function implementation for escape chars (#14579)
- ARROW-18237 - [Java] Extend Table code (#14573)
- ARROW-18238 - [Docs][Python] Improve docs for S3FileSystem (#14599)
- ARROW-18240 - [R] head() is crashing on some nightly builds (#14582)
- ARROW-18243 - [R] Sanitizer nightly failure pointing to mixup between TimestampType and DurationType
- ARROW-18248 - [CI][Release] Use GitHub token to avoid API rate limit (#14588)
- ARROW-18249 - [C++] Update vcpkg port to arrow 10.0.0
- ARROW-18253 - [C++][Parquet] Add additional bounds safety checks (#14592)
- ARROW-18259 - [C++][CMake] Add support for system Thrift CMake package (#14597)
- ARROW-18264 - [Python] Add missing value accessor to temporal types (#14746)
- ARROW-18264 - [Python] Expose time32/time64 scalar values (#14637)
- ARROW-18270 - [Python] Remove gcc 4.9 compatibility code (#14602)
- ARROW-18278 - [Java] Adjust path in Maven generate-libs-jni-macos-linux (#14623)
- ARROW-18280 - [C++][Python] Support slicing to end in list_slice kernel (#14749)
- ARROW-18282 - [C++][Python] Support step >= 1 in list_slice kernel (#14696)
- ARROW-18287 - [C++][CMake] Add support for Brotli/utf8proc provided by vcpkg (#14609)
- ARROW-18289 - [Release][vcpkg] Add a script to update vcpkg’s arrow port (#14610)
- ARROW-18291 - [Release][Docs] Update how to release (#14612)
- ARROW-18292 - [Release][Python] Upload .wheel/.tar.gz for release not RC (#14708)
- ARROW-18303 - [Go] Allow easy compute module importing (#14690)
- ARROW-18306 - [R] Failing test after compute function updates (#14620)
- ARROW-18318 - [Python] Expose Scalar.validate() (#15149)
- ARROW-18321 - [R] Add tests for binary_slice kernel (#14647)
- ARROW-18323 - Enabling issue templates in GitHub issues (#14675)
- ARROW-18332 - [Go] Cast Dictionary types to value type (#14650)
- ARROW-18333 - [Go][Docs] Update compute function docs (#14815)
- ARROW-18336 - [Release][Docs] Don’t update versions not in major release (#14653)
- ARROW-18337 - [R] Possible undesirable handling of POSIXlt objects (#15277)
- ARROW-18340 - [Python] PyArrow C++ header files no longer always included in installed pyarrow (#14656)
- ARROW-18341 - [Doc][Python] Update note about bundling Arrow C++ on Windows (#14660)
- ARROW-18342 - [C++] AsofJoinNode support for Boolean data field (#14658)
- ARROW-18345 - [R] Create a CRAN-specific packaging checklist that lives in the R package directory (#14678)
- ARROW-18348 - [CI][Release][Yum] redhat-rpm-config is needed on AlmaLinux 9 (#14661)
- ARROW-18350 - [C++] Use std::to_chars instead of std::to_string (#14666)
- ARROW-18358 - [R] Implement new function open_dataset_csv with signature more closely matching read_csv_arrow
- ARROW-18361 - [CI][Conan] Merge upstream changes (#14671)
- ARROW-18363 - [Docs] Include warning when viewing old docs (redirecting to stable/dev docs) (#14839)
- ARROW-18366 - [Packaging][RPM][Gandiva] Fix link error on AlmaLinux 9 (#14680)
- ARROW-18367 - [C++] Enable the creation of named table relations (#14681)
- ARROW-18373 - Fix component drop-down, add license text (#14688)
- ARROW-18377 - MIGRATION: Automate component labels from issue form content (#15245)
- ARROW-18380 - [Dev] Update dev_pr GitHub workflows to accept both GitHub issues and JIRA (#14731)
- ARROW-18384 - [Release][MSYS2] Show pull request title (#14709)
- ARROW-18391 - [R] Fix the version selector dropdown in the dev docs (#14800)
- ARROW-18395 - [C++] Move select-k implementation into separate module
- ARROW-18399 - [Python] Reduce warnings during tests (#14729)
- ARROW-18401 - [R] Failing test on test-r-rhub-ubuntu-gcc-release-latest (#14894)
- ARROW-18402 - [C++] Expose DeclarationInfo (#14765)
- ARROW-18406 - [C++] Can’t build Arrow with Substrait on Ubuntu 20.04 (#14735)
- ARROW-18407 - [Release][Website] Use UTC for release date (#14737)
- ARROW-18409 - [GLib][Plasma] Suppress deprecated warning in building plasma-glib (#14739)
- ARROW-18410 - [Packaging][Ubuntu] Add support for Ubuntu 22.10 (#14740)
- ARROW-18413 - [C++][Parquet] Expose page index info from ColumnChunkMetaData (#14742)
- ARROW-18418 - [Website] do not delete /datafusion-python
- ARROW-18419 - [C++] Update vendored fast_float (#14817)
- ARROW-18420 - [C++][Parquet] Introduce ColumnIndex & OffsetIndex (#14803)
- ARROW-18421 - [C++][ORC] Add accessor for stripe information in reader (#14806)
- ARROW-18423 - [Python] Expose reading a schema from an IPC message (#14831)
- ARROW-18426 - Update committers and PMC members on website
- ARROW-18427 - [C++] Support negative tolerance in AsofJoinNode (#14934)
- ARROW-18428 - [Website] Enable github issues on arrow-site repo
- ARROW-18435 - [C++][Java] Update ORC to 1.8.1 (#14942)
- GH-14474 - Opportunistically delete R references to shared pointers where possible (#15278)
- GH-14720 - [Dev] Update merge_arrow_pr script to accept GitHub issues (#14750)
- GH-14755 - [Python] Expose QuotingStyle to Python (#14722)
- GH-14761 - [Dev] Update labels on PR labeler to use new Component ones (#14762)
- GH-14778 - [Python] Add (Chunked)Array sort() method (#14781)
- GH-14784 - [Dev] Add possibility to autoassign on GitHub issue comment (#14785)
- GH-14786 - [Java][Doc] Replace in-folder documentation (#14789)
- GH-14787 - [Java][Doc] Update table.rst (#14794)
- GH-14809 - [Dev] Add created GitHub issues to issues@arrow.apache.org (#14811)
- GH-14816 - [Release] Make dev/release/06-java-upload.sh reusable from other project (#14830)
- GH-14824 - [CI] r-binary-packages should only upload artifacts if all tests succeed (#14841)
- GH-14844 - [Java] Short circuit null checks when comparing non null field types (#15106)
- GH-14846 - [Dev] Support GitHub Releases in download_rc_binaries.py (#14848)
- GH-14854 - Make changes to .md pages (#14852)
- GH-14869 - [C++] Add Cflags.private defining _STATIC to .pc.in. (#14900)
- GH-14873 - [Java] DictionaryEncoder can decode without building a DictionaryHashTable (#14874)
- GH-14885 - [Docs] Make changes to the New Contrib Guide (Jira -> GitHub) (#14889)
- GH-14901 - [Java] ListSubfieldEncoder and StructSubfieldEncoder can decode without DictionaryHashTable (#14902)
- GH-14918 - [Docs] Make changes to developers section of the docs (Jira -> GitHub) (#14919)
- GH-14920 - [C++][CMake] Add missing -latomic to Arrow CMake package (#15251)
- GH-14937 - [C++] Add rank kernel benchmarks (#14938)
- GH-14951 - [C++][Parquet] Add benchmarks for DELTA_BINARY_PACKED encoding (#15140)
- GH-14961 - [Ruby] Use newer extpp for C++17 (#14962)
- GH-14975 - [Python] Dataset.sort_by (#14976)
- GH-14976 - [Python] Avoid dependency on exec plan in Table.sort_by to fix minimal tests (#15268)
- GH-14977 - [Dev][CI] Add notify-token-expiration to archery (#14978)
- GH-14981 - [R] Forward compatibility with dplyr::join_by() (#33664)
- GH-14986 - [Release] Don’t detect previous version on maint-X.Y.Z branch (#14987)
- GH-14992 - [Packaging] Make dev/release/binary-task.rb reusable from other project (#14994)
- GH-14997 - [Release] Ensure archery release tasks work with both new-style GitHub issues and old-style JIRA issues (#33615)
- GH-14999 - [Release][Archery] Update archery release changelog to support GitHub issues
- GH-15002 - [Release][Archery] Update archery release cherry-pick to support GitHub issues
- GH-15005 - [Go] Add scalar.Append to append scalars to builder (#15006)
- GH-15009 - [R] stringr 1.5.0 with the str_like function is already released (#15010)
- GH-15012 - [Packaging][deb] Use system Protobuf for Debian GNU/Linux bookworm (#15013)
- GH-15035 - [CI] Remove unsupported turbodbc jobs and scripts from CI (#15036)
- GH-15050 - [Java][Docs] Update and consolidate Memory documentation (#15051)
- GH-15072 - [C++] Move the round functionality into a separate module (#15073)
- GH-15074 - [Parquet][C++] change 16-bit page_ordinal to 32-bit (#15182)
- GH-15081 - [Release] Add support for using custom artifacts directory in dev/release/05-binary-upload.sh (#15082)
- GH-15084 - [Ruby] Use common keys when keys.nil? in Table#join (#15088)
- GH-15085 - [Ruby] Add ColumnContainable#column_names (#15089)
- GH-15087 - [Release] Slow down downloading RC binaries from GitHub (#15090)
- GH-15096 - [C++] Substrait ProjectRel Emit Optimization (#15097)
- GH-15100 - [C++][Parquet] Add benchmark for reading strings from Parquet (#15101)
- GH-15119 - [Release][Docs][R] Update version information in patch release (#15120)
- GH-15134 - [Ruby] Specify -mmacosx-version-min=10.14 explicitly for old Xcode (#15135)
- GH-15146 - [GLib] Add GADatasetFinishOptions (#15147)
- GH-15151 - [C++] Adding RecordBatchReaderSource to solve an issue in R API (#15183)
- GH-15168 - [GLib] Add support for half float (#15169)
- GH-15174 - [Go][FlightRPC] Expose Flight Server Desc and RegisterFlightService (#15177)
- GH-15185 - [C++][Parquet] Improve documentation for Parquet Reader column_indices (#15184)
- GH-15199 - [C++][Substrait] Allow AGGREGATION_INVOCATION_UNSPECIFIED as valid invocation (#15198)
- GH-15200 - [C++] Created benchmarks for round kernels. (#15201)
- GH-15205 - [R] Fix a parquet-fixture finding in R tests (#15207)
- GH-15216 - [C++][Parquet] Parquet writer accepts RecordBatch (#15240)
- GH-15218 - [Python] Remove auto generated pyarrow_api.h and pyarrow_lib.h (#15219)
- GH-15226 - [C++] Add DurationType to hash kernels (#33685)
- GH-15237 - [C++] Add ::arrow::Unreachable() using std::string_view (#15238)
- GH-15239 - [C++][Parquet] Parquet writer writes decimal as int32/64 (#15244)
- GH-15249 - [Documentation] Add PR template (#15250)
- GH-15257 - [GLib][Dataset] Add GADatasetHivePartitioning (#15272)
- GH-15265 - [Java] Publish SBOM artifacts (#15267)
- GH-15289 - [Ruby] Return self when saving Table to csv (#33653)
- GH-15290 - [C++][Compute] Optimize IfElse kernel AAS/ASA case when the scalar is null (#15291)
- GH-33607 - [C++] Support optional additional arguments for inline visit functions (#33608)
- GH-33610 - [Dev] Do not allow ARROW prefixed tickets to be merged nor used on PR titles (#33611)
- GH-33619 - [Documentation] Update PR template (#33620)
- GH-33657 - [C++] arrow-dataset.pc doesn’t depend on parquet.pc without ARROW_PARQUET=ON (#33665)
- GH-33670 - [GLib] Add GArrowProjectNodeOptions (#33677)
- GH-33671 - [GLib] Add garrow_chunked_array_new_empty() (#33675)
- PARQUET-2179 - [C++][Parquet] Add a test for skipping repeated fields (#14366)
- PARQUET-2188 - [parquet-cpp] Add SkipRecords API to RecordReader (#14142)
- PARQUET-2204 - [parquet-cpp] TypedColumnReaderImpl::Skip should reuse scratch space (#14509)
- PARQUET-2206 - [parquet-cpp] Microbenchmark for ColumnReader ReadBatch and Skip (#14523)
- PARQUET-2209 - [parquet-cpp] Optimize skip for the case that number of values to skip equals page size (#14545)
- PARQUET-2210 - [C++][Parquet] Skip pages based on header metadata using a callback (#14603)
- PARQUET-2211 - [C++] Print ColumnMetaData.encoding_stats field (#14556)
Bug Fixes
- ARROW-11631 - [R] Implement RPrimitiveConverter for Decimal type
- ARROW-15026 - [Python] Error if datetime.timedelta to pyarrow.duration conversion overflows (#13718)
- ARROW-15328 - [C++][Docs] Streaming CSV reader missing from documentation (#14452)
- ARROW-15822 - [C++] Cast duration to string (thus CSV writing) not supported (#14450)
- ARROW-16464 - [C++][CI][GPU] Add CUDA CI (#14497)
- ARROW-16471 - [Go] RecordBuilder UnmarshalJSON handle complex values (#14560)
- ARROW-16547 - [Python] to_pandas fails with FixedOffset timezones when timestamp_as_object is used (#14448)
- ARROW-16795 - [C#][Flight] Nightly verify-rc-source-csharp-macos-arm64 fails (#15235)
- ARROW-16817 - [C++] Test ORC writer errors with invalid types (#14638)
- ARROW-17054 - [R] Creating an Array from an object bigger than 2^31 results in an Array of length 0 (#14929)
- ARROW-17192 - [Python] Pass **kwargs in read_feather to to_pandas() (#14492)
- ARROW-17332 - [R] error parsing folder path with accent (‘c:/Público’) in read_csv_arrow (#14930)
- ARROW-17361 - [R] dplyr::summarize fails with division when divisor is a variable (#14933)
- ARROW-17374 - [C++] Snappy package may be built without CMAKE_BUILD_TYPE (#14818)
- ARROW-17458 - [C++] Cast between decimal and string (#14232)
- ARROW-17538 - [C++] Import schema when importing array stream (#15037)
- ARROW-17637 - [R][us][s] (#14935)
- ARROW-17692 - [R] Add support for building with system AWS SDK C++ (#14235)
- ARROW-17772 - [Doc] Sphinx / reST markup error
- ARROW-17774 - [Python] Add python test for decimals to csv (#14525)
- ARROW-17858 - [C++] Compilation warning in arrow/csv/parser.h (#14445)
- ARROW-17893 - [Python] Test that reading of timedelta is stable (read_feather/to_pandas) (#14531)
- ARROW-17985 - [C++][Python] Improve s3fs error message when wrong region (#14601)
- ARROW-17991 - [Python][C++] Adding support for IpcWriteOptions to the dataset ipc file writer (#14414)
- ARROW-18052 - [Python] Support passing create_dir thru pq.write_to_dataset (#14459)
- ARROW-18068 - [Dev][Archery][Crossbow] Comment bot only waits for task if link is not available (#14429)
- ARROW-18070 - [C++] Invoke google::protobuf::ShutdownProtobufLibrary for substrait tests (#14508)
- ARROW-18086 - [Ruby] Add support for HalfFloat (#15204)
- ARROW-18087 - [C++] RecordBatch::Equals should not ignore field names (#14451)
- ARROW-18088 - [CI][Python] Fix pandas master/nightly build failure related to timedelta (#14460)
- ARROW-18101 - [R] RecordBatchReaderHead from ExecPlan with UDF cannot be read (#14518)
- ARROW-18106 - [C++] JSON reader ignores explicit schema with default unexpected_field_behavior="infer" (#14741)
- ARROW-18117 - [C++] Fix static bundle build (#14465)
- ARROW-18118 - [Release][Dev] Fix problems in 02-source.sh/03-binary-submit.sh for 10.0.0-rc0 (#14468)
- ARROW-18123 - [Python] Fix writing files with multi-byte characters in file name (#14764)
- ARROW-18125 - [Python] Handle pytest 8 deprecations about pytest.warns(None)
- ARROW-18126 - [Python] Remove ARROW_BUILD_DIR in building pyarrow C++ (#14498)
- ARROW-18128 - [Java][CI] Update timestamp of Java Nightlies X.Y.Z-SNAPSHOT folder (#14496)
- ARROW-18149 - [C++] fix build failure of join_example (#14490)
- ARROW-18157 - [Dev][Archery] “archery docker run” sets env var to None when inherited (#14501)
- ARROW-18158 - [CI] Use default Python version when installing conda cpp environment to fix conda builds (#14500)
- ARROW-18159 - [Go][Release] Add go install to verify-release script (#14503)
- ARROW-18161 - [Ruby] Refer source input in sub objects (#15217)
- ARROW-18164 - [Python] Honor default memory pool in Dataset scanning (#14516)
- ARROW-18167 - [Go][Release] update go.work with release (#14522)
- ARROW-18172 - [CI][Release] Source Release and Merge Script jobs fail on master
- ARROW-18183 - [C++] cpp-micro benchmarks are failing on mac arm machine (#14562)
- ARROW-18188 - [CI] CUDA nightly docker upload fails due to wrong tag (#14538)
- ARROW-18195 - [C++] Fix case_when produces bad data when condition has nulls (#15131)
- ARROW-18202 - [C++] Reallow regexp replace on empty string (#15132)
- ARROW-18205 - [C++] Substrait consumer is not converting right side references correctly on joins (#14558)
- ARROW-18207 - [Ruby] RubyGems for 10.0.0 aren’t updated yet
- ARROW-18209 - [Java] Make ComplexCopier agnostic of specific implementation of MapWriter (UnionMapWriter) (#14557)
- ARROW-18212 - [C++] NumericBuilder::Reset() doesn’t reset all members (#14559)
- ARROW-18225 - [Python] Fully support filesystem in parquet.write_metadata (#14574)
- ARROW-18227 - [CI][Packaging] Do not fail conda-clean if conda search raises PackagesNotFound (#14569)
- ARROW-18229 - [Python] Check schema argument type in RecordBatchReader.from_batches (#14583)
- ARROW-18231 - [C++][CMake] Add support for overriding optimization level (#15022)
- ARROW-18246 - [Python][Docs] PyArrow table join docstring typos for left and right suffix arguments (#14591)
- ARROW-18247 - [JS] fix: RangeError crash in Vector.toArray() (#14587)
- ARROW-18256 - [C++][Windows] Use IMPORTED_IMPLIB for external shared Thrift (#14595)
- ARROW-18257 - [Python] pass back time types with correct type class (#14633)
- ARROW-18269 - [C++] Handle slash character in Hive-style partition values (#14646)
- ARROW-18272 - [Python] Support filesystem parameter in ParquetFile (#14717)
- ARROW-18284 - [Python][Docs] Add missing CMAKE_PREFIX_PATH to allow setup.py CMake invocations to find Arrow CMake package (#14586)
- ARROW-18290 - [C++] Escape all special chars in URI-encoding (#14645)
- ARROW-18309 - [Go] Fix delta bit packing decode panic (#14649)
- ARROW-18320 - [C++][FlightRPC] Fix improper Status/Result conversion in Flight client (#14859)
- ARROW-18334 - [C++] Handle potential non-commutativity by rebinding (#14659)
- ARROW-18339 - [Python][CI] Add DYLD_LIBRARY_PATH to avoid requiring PYARROW_BUNDLE_ARROW_CPP on macOS job (#14643)
- ARROW-18343 - [C++] Remove AllocateBitmap() with out parameter (#14657)
- ARROW-18351 - [C++][FlightRPC] Fix crash in DoExchange with UCX (#15031)
- ARROW-18353 - [C++][FlightRPC] Prevent concurrent Finish in UCX (#15034)
- ARROW-18360 - [Python] Don’t crash when schema=None in FlightClient.do_put (#14698)
- ARROW-18374 - [Go][CI][Benchmarking] Fix Go benchmark github info (#14691)
- ARROW-18374 - [Go][CI][Benchmarking] Fix Go Bench Script after Conbench change (#14689)
- ARROW-18379 - [Python] Change warnings to _warnings in _plasma_store_entry_point (#14695)
- ARROW-18382 - [C++] Set ADDRESS_SANITIZER in fuzzing builds (#14702)
- ARROW-18383 - [C++] Avoid global variables for thread pools and at-fork handlers (#14704)
- ARROW-18389 - [CI][Python] Update nightly test-conda-python-3.7-pandas-0.24 to pandas >= 1.0 (#14714)
- ARROW-18390 - [CI][Python] Update spark test modules to match spark master (#14715)
- ARROW-18392 - [Python] Fix test_s3fs_wrong_region; set anonymous=True (#14716)
- ARROW-18394 - [Python][CI] Fix nightly job using pandas dev (temporarily skip tests) (#15048)
- ARROW-18397 - [C++] Clear S3 region resolver client at S3 shutdown (#14718)
- ARROW-18400 - [Python] Quadratic memory usage of Table.to_pandas with nested data
- ARROW-18405 - [Ruby] Avoid rebuilding chunked arrays in Arrow::Table.new (#14738)
- ARROW-18412 - [C++][R] Windows build fails because of missing ChunkResolver symbols (#14774)
- ARROW-18424 - [C++] Fix Doxygen error on ARROW_ENGINE_EXPORT (#14845)
- ARROW-18429 - [R] Bump dev version following 10.0.1 patch release (#14887)
- ARROW-18436 - [C++] Ensure correct (un)escaping of special characters in URI paths (#14974)
- ARROW-18437 - [C++][Parquet] Fix encoder for DELTA_BINARY_PACKED when flushing more than once (#14959)
- GH-14745 - [R] {rlang} dependency must be at least version 1.0.0 because of check_dots_empty (#14744)
- GH-14775 - [Go] Fix UnionBuilder.Len implementations (#14776)
- GH-14780 - [Go] Fix issues with IPC writing of sliced map/list arrays (#14793)
- GH-14791 - [JS] Fix BitmapBufferBuilder size truncation (#14881)
- GH-14805 - [Format] C Data Interface: clarify nullability of buffer pointers (#14808)
- GH-14819 - [CI][RPM] Add workaround for build failure on CentOS 9 Stream (#14820)
- GH-14828 - [CI][Conda] Sync with conda-forge, fix nightly jobs (#14832)
- GH-14842 - [C++] Propagate some errors in JSON chunker (#14843)
- GH-14849 - [CI] R install-local builds sometimes fail because sccache times out (#14850)
- GH-14855 - [C++] Support importing zero-case unions (#14857)
- GH-14856 - [CI] Azure builds fail with docker permission error (#14858)
- GH-14865 - [Go][Parquet] Address several memory leaks of buffers in pqarrow (#14878)
- GH-14872 - [R] arrow returns wrong variable content when multiple group_by/summarise statements are used (#14905)
- GH-14875 - [C++] C Data Interface: check imported buffer for non-null (#14814)
- GH-14876 - [Go] Handling Crashes in C Data interface (#14877)
- GH-14883 - [Go] Fix IPC encoding empty maps (#14904)
- GH-14883 - [Go] ipc.Writer leaks memory when compressing body (#14892)
- GH-14884 - [CI] R install resource may get 404 (#14893)
- GH-14890 - [Java] Fix memory leak of DictionaryEncoder when exception thrown (#14891)
- GH-14907 - [R] right_join() function does not produce the expected outcome (#15077)
- GH-14909 - [Java] Prevent potential memory leak of ListSubfieldEncoder and StructSubfieldEncoder (#14910)
- GH-14916 - [C++] Remove the API declaration about “ConcatenateBuffers” (#14915)
- GH-14927 - [Dev] Crossbow submit does not work with fine grained PATs (#14928)
- GH-14940 - [Go][Parquet] Fix Encryption Column writing (#14954)
- GH-14943 - [Python] Fix pyarrow.get_libraries() order (#14944)
- GH-14945 - [Ruby] Add support for macOS 12 / Xcode 14 (#14960)
- GH-14947 - [R] Compatibility with dplyr 1.1.0 (#14948)
- GH-14949 - [CI][Release] Output script’s stdout on failure (#14957)
- GH-14967 - [R] Minimal nightly builds are failing (#14972)
- GH-14968 - [Python] Fix segfault for dataset ORC write (#15049)
- GH-14990 - [C++][Skyhook] Follow FileFormat API change (#15086)
- GH-14993 - [CI][Conda] Fix missing RECIPE_ROOT variable now expected by conda build (#15014)
- GH-14995 - [Go][FlightSQL] Fix Supported Unions Constant (#15003)
- GH-15001 - [R] Fix Parquet datatype test failure (#15197)
- GH-15007 - [CI][RPM] Ignore import failed key (#15008)
- GH-15023 - [CI][Packaging][Java] Force to use libz3.a with Homebrew (#15024)
- GH-15025 - [CI][C++][Homebrew] Ensure removing Python related commands (#15026)
- GH-15028 - [R][Docs] NOT_CRAN should be "true" instead of TRUE in R (#15029)
- GH-15040 - [C++] Improve pkg-config support for ARROW_BUILD_SHARED=OFF (#15075)
- GH-15042 - [C++][Parquet] Update stats on subsequent batches of dictionaries (#15179)
- GH-15043 - [Python][Docs] Update docstring for pyarrow.decompress (#15061)
- GH-15052 - [C++][Parquet] Fix DELTA_BINARY_PACKED decoder when reading only one value (#15124)
- GH-15062 - [C++] Simplify EnumParser behavior (#15063)
- GH-15064 - [Python][CI] Dask nightly tests are failing due to fsspec bug (#15065)
- GH-15069 - [C++][Python][FlightRPC] Make DoAction truly streaming (#15118)
- GH-15080 - [CI][R] Re-enable binary package job for R 4.1 on Windows (#25359)
- GH-15092 - [CI][C++][Homebrew] Ensure removing Python related commands (again) (#15093)
- GH-15094 - [CI][Release][Ruby] Install Bundler by APT (#15095)
- GH-15110 - [R][CI] Windows build fails in packaging job (#15111)
- GH-15114 - [R][C++][CI] Homebrew can’t install Python 3.11 on GHA runners (#15116)
- GH-15115 - [R][CI] pyarrow tests fail on macos 10.13 due to missing pyarrow wheel (#15117)
- GH-15122 - [Benchmarking][Python] Set ARROW_INSTALL_NAME_RPATH=ON for benchmark builds (#15123)
- GH-15126 - [R] purrr::rerun was deprecated in purrr 1.0.0 (#15127)
- GH-15136 - [Python][macOS] Use @rpath for libarrow_python.dylib (#15143)
- GH-15141 - [C++] fix for unstable test due to unstable sort (#15142)
- GH-15150 - [C++][FlightRPC] Wait for side effects in DoAction (#15152)
- GH-15156 - [JS] Fix can’t find variable: BigInt64Array (#15157)
- GH-15172 - [Python] Docstring test failure (#15186)
- GH-15176 - Fix various issues introduced in the asof-join benchmark by ARROW-17980 and ARROW-15732 (#15190)
- GH-15189 - [R] Skip S3 tests on MacOS 10.13 (#33613)
- GH-15243 - [C++] fix for potential deadlock in the group-by node (#33700)
- GH-15254 - [GLib] garrow_execute_plan_wait() checks the finished status (#15255)
- GH-15259 - [CI] component assignment fails due to typo (#15260)
- GH-15264 - [C++] Add scanner tests for disabling readahead and fix relevant bugs (#29185)
- GH-15274 - [Java][FlightRPC] handle null keystore password (#15276)
- GH-15282 - [CI][C++] add CLANG_TOOLS variable in .travis.yaml (#32972)
- GH-15292 - [C++] Typeclass alias is missing in ExtensionArray (#15293)
- GH-25633 - [CI][Java][macOS] Ensure using bundled RE2 (#33711)
- GH-26209 - [Ruby] Add support for Ruby 2.5 (#33602)
- GH-26394 - [Python] Don’t use target_include_directories() for imported target (#33606)
- GH-33626 - [Packaging][RPM] Don’t remove metadata for non-target arch (#33672)
- GH-33638 - [C++] Removing ExecPlan::Make deprecation warning (#33658)
- GH-33643 - [C++] Remove implicit = capture of this which is not valid in c++20 (#33644)
- GH-33666 - [R] Remove extraneous argument to semi_join (#33693)
- GH-33667 - [C++][CI] Use Ubuntu 22.04 for ASAN (#33669)
- GH-33687 - [Dev] Fix commit message generation in merge script (#33691)
- GH-33705 - [R] Fix link on README (#33706)