This document explains which C++ features are enabled in different Arrow R package build configurations and records the decisions behind our default feature set. It is internal developer documentation, not a guide for installing the Arrow R package; for that, see the installation guide.
Overview
When the Arrow R package is installed, it needs a copy of the Arrow C++ library (libarrow). This can come from:
- Prebuilt binaries we host (for releases and nightlies)
- Source builds when binaries aren’t available or users opt out
The features available in libarrow depend on how it was built. This document covers the feature configuration for both scenarios.
Prebuilt libarrow binary configuration
We produce prebuilt libarrow binaries for macOS, Windows, and Linux. These binaries include more features than the default source build to provide users with a fully-featured experience out of the box.
Current binary feature set
| Platform | S3 | GCS | Configured in |
|---|---|---|---|
| macOS (ARM64, x86_64) | ON | ON | dev/tasks/r/github.packages.yml |
| Windows | ON | ON | ci/scripts/PKGBUILD |
| Linux (x86_64) | ON | ON | compose.yaml (ubuntu-cpp-static) |
Exceptions to our build defaults
Even though GCS defaults to OFF for source builds, we explicitly enable it in our prebuilt binaries because:
- Binary users expect features to “just work”: they shouldn’t need to rebuild from source to access cloud storage
- Build time is not a concern: we build binaries once in CI, not on user machines
- Parity across platforms: users get the same features regardless of OS
Feature configuration in source builds of libarrow
Source builds are controlled by r/inst/build_arrow_static.sh. The key environment variable is LIBARROW_MINIMAL:
- LIBARROW_MINIMAL unset: Default feature set (Parquet, Dataset, JSON, common compression ON; S3/GCS/jemalloc OFF)
- LIBARROW_MINIMAL=false: Full feature set (adds S3, jemalloc, additional compression)
- LIBARROW_MINIMAL=true: Truly minimal (disables Parquet, Dataset, JSON, most compression, SIMD)
Features always enabled
These features are always built regardless of LIBARROW_MINIMAL:
| Feature | CMake Flag | Notes |
|---|---|---|
| Compute | ARROW_COMPUTE=ON | Core compute functions |
| CSV | ARROW_CSV=ON | CSV reading/writing |
| Filesystem | ARROW_FILESYSTEM=ON | Local filesystem support |
| JSON | ARROW_JSON=ON | JSON reading |
| Parquet | ARROW_PARQUET=ON | Parquet file format |
| Dataset | ARROW_DATASET=ON | Multi-file datasets |
| Acero | ARROW_ACERO=ON | Query execution engine |
| Mimalloc | ARROW_MIMALLOC=ON | Memory allocator |
| LZ4 | ARROW_WITH_LZ4=ON | LZ4 compression |
| Snappy | ARROW_WITH_SNAPPY=ON | Snappy compression |
| RE2 | ARROW_WITH_RE2=ON | Regular expressions |
| UTF8Proc | ARROW_WITH_UTF8PROC=ON | Unicode support |
Features controlled by LIBARROW_MINIMAL
When LIBARROW_MINIMAL=false, the following additional features are enabled (via $ARROW_DEFAULT_PARAM=ON):
| Feature | CMake Flag | Default |
|---|---|---|
| S3 | ARROW_S3 | $ARROW_DEFAULT_PARAM |
| Jemalloc | ARROW_JEMALLOC | $ARROW_DEFAULT_PARAM |
| Brotli | ARROW_WITH_BROTLI | $ARROW_DEFAULT_PARAM |
| BZ2 | ARROW_WITH_BZ2 | $ARROW_DEFAULT_PARAM |
| Zlib | ARROW_WITH_ZLIB | $ARROW_DEFAULT_PARAM |
| Zstd | ARROW_WITH_ZSTD | $ARROW_DEFAULT_PARAM |
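The fallback pattern in the table above can be sketched in shell. This is a minimal illustration, not a copy of build_arrow_static.sh: the helper names default_param and resolve_flag are hypothetical, and the rule that only LIBARROW_MINIMAL=false turns the optional features on is an assumption taken from the description in this section.

```shell
# Hypothetical sketch of the default resolution described above.
# Assumption: only LIBARROW_MINIMAL=false switches the optional features ON.
default_param() {
  if [ "$1" = "false" ]; then echo "ON"; else echo "OFF"; fi
}

# An explicit value ($1) wins; an empty one falls back to the default ($2).
resolve_flag() {
  echo "${1:-$2}"
}

ARROW_DEFAULT_PARAM="$(default_param "${LIBARROW_MINIMAL:-}")"
for flag in ARROW_S3 ARROW_JEMALLOC ARROW_WITH_ZSTD; do
  eval "explicit=\${$flag:-}"   # read the variable named by $flag, if set
  echo "-D${flag}=$(resolve_flag "$explicit" "$ARROW_DEFAULT_PARAM")"
done
```

In this sketch, an explicitly exported ARROW_S3=ON would enable S3 even when LIBARROW_MINIMAL is unset, because the explicit value is consulted before the default.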
Features that require explicit opt-in
GCS (Google Cloud Storage) is always off by default, even when LIBARROW_MINIMAL=false:
| Feature | CMake Flag | Default | Reason |
|---|---|---|---|
| GCS | ARROW_GCS | OFF | Build complexity, dependency size |
To enable GCS in a source build, you must explicitly set ARROW_GCS=ON.
Why is GCS off by default?
GCS was turned off by default in #48343 (December 2025) because:
- Building google-cloud-cpp is fragile and adds significant build time
- The dependency on abseil (ABSL) has caused compatibility issues
- Users who need GCS can still enable it explicitly
Configuration file locations
libarrow source build configuration
The main build script that controls source builds:
- r/inst/build_arrow_static.sh: CMake flags and defaults
The environment variables to look for are LIBARROW_MINIMAL, ARROW_*, and ARROW_DEFAULT_PARAM.
libarrow binary build configuration
Each platform has its own configuration file:
| Platform | Config file | Key settings |
|---|---|---|
| macOS | dev/tasks/r/github.packages.yml | LIBARROW_MINIMAL=false, ARROW_GCS=ON |
| Windows | ci/scripts/PKGBUILD | ARROW_GCS=ON, ARROW_S3=ON |
| Linux | compose.yaml (ubuntu-cpp-static) | LIBARROW_MINIMAL=false, ARROW_GCS=ON |
R-universe builds
R-universe builds the Arrow R package for users who want newer versions than CRAN. R-universe behavior varies by platform and architecture:
| Platform | Architecture | Build method | Features |
|---|---|---|---|
| macOS | ARM64 | Downloads prebuilt binary | Full (S3 + GCS) |
| macOS | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
| Windows | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
| Windows | ARM64 | Not supported | NA |
| Linux | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
| Linux | ARM64 | Builds from source | S3 only (no GCS) |
Why Linux ARM64 builds from source
We only publish prebuilt Linux binaries for the x86_64 architecture. The binary selection logic in r/tools/nixlibs.R (line 263) explicitly checks for this.
When R-universe builds on Linux ARM64 runners, no binary is
available, so it falls back to building from source using
build_arrow_static.sh. Since GCS defaults to OFF in that
script, Linux ARM64 users don’t get GCS support.
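The fallback described above can be sketched as a small decision function. This is an illustration only: the real logic is R code in r/tools/nixlibs.R, and the function name libarrow_origin and its return strings are hypothetical. The mapping itself follows the R-universe table in this document.

```shell
# Hypothetical sketch of the binary-or-source decision, following the
# R-universe table above. Not the actual nixlibs.R implementation.
# Usage: libarrow_origin OS ARCH   (ARCH as reported by e.g. `uname -m`)
libarrow_origin() {
  os="$1"
  arch="$2"
  case "$os/$arch" in
    linux/x86_64)                echo "binary" ;;       # prebuilt Linux binary exists
    linux/aarch64 | linux/arm64) echo "source" ;;       # no binary: build_arrow_static.sh
    macos/arm64 | macos/x86_64)  echo "binary" ;;
    windows/x86_64)              echo "binary" ;;
    windows/arm64)               echo "unsupported" ;;
    *)                           echo "unknown" ;;      # not covered by the table
  esac
}

libarrow_origin linux aarch64   # prints "source"
```

Because the source path goes through build_arrow_static.sh, whatever defaults that script applies (including GCS being OFF) determine the features of the resulting build.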
Enabling GCS for Linux ARM64
To provide full feature parity for Linux ARM64, we would need to:
- Add an ARM64 Linux build job to dev/tasks/r/github.packages.yml
- Update select_binary() in nixlibs.R to recognize linux-aarch64
- Add the artifact pattern to dev/tasks/tasks.yml
- Update the nightly upload workflow
See GH-36193 for tracking this work.
Alternatively, changing the GCS default in
build_arrow_static.sh from OFF to
$ARROW_DEFAULT_PARAM would enable GCS for all source
builds, including Linux ARM64 on R-universe.
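The effect of that alternative can be illustrated with a small comparison. This is a sketch under stated assumptions: both helper functions are hypothetical, the first argument stands in for a user-supplied ARROW_GCS value, and the second stands in for ARROW_DEFAULT_PARAM.

```shell
# Hypothetical helpers comparing the two defaults for ARROW_GCS.
# $1: explicit ARROW_GCS value (may be empty); $2: ARROW_DEFAULT_PARAM.
gcs_current()  { echo "${1:-OFF}"; }   # today: OFF unless set explicitly
gcs_proposed() { echo "${1:-$2}"; }    # alternative: follow ARROW_DEFAULT_PARAM

# A full-featured source build (ARROW_DEFAULT_PARAM=ON), no explicit ARROW_GCS:
echo "current:  ARROW_GCS=$(gcs_current "" ON)"
echo "proposed: ARROW_GCS=$(gcs_proposed "" ON)"
```

In this sketch, an explicit ARROW_GCS=OFF still disables GCS under the proposed default, and builds where ARROW_DEFAULT_PARAM is OFF are unaffected.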
Checking installed features
Users can check which features are enabled in their installation:
# Show all capabilities
arrow::arrow_info()
# Check specific features
arrow::arrow_with_s3()
arrow::arrow_with_gcs()
Related documentation
- Installation guide - User-facing installation docs
- Installation details - How the build system works
- Developer setup - Building Arrow for development