In most cases, install.packages("arrow") should just work. There are things you can do to make the installation faster, documented in this article. If for some reason installation does not work, set the environment variable ARROW_R_DEV=true, retry, and share the logs with us.

## Background

The Apache Arrow project is implemented in multiple languages, and the R package depends on the Arrow C++ library (referred to from here on as libarrow). This means that when you install arrow, you need both the R and C++ versions. If you install arrow from CRAN on a machine running Windows or macOS, when you call install.packages("arrow"), a precompiled binary containing both the R package and libarrow will be downloaded. However, CRAN does not host R package binaries for Linux, and so you must choose from one of the alternative approaches.

This article outlines the recommended approaches to installing arrow on Linux, starting from the simplest and least customizable and moving to the most complex, which offers more flexibility to customize your installation.

The primary audience for this document is arrow R package users on Linux, and not Arrow developers. Additional resources for developers are listed at the end of this article.

## System dependencies

The arrow package is designed to work with very minimal system requirements, but there are a few things to note.

### Compilers

As of version 10.0.0, arrow requires a C++17 compiler to build. For gcc, this generally means version 7 or newer. Most contemporary Linux distributions have a new enough compiler; however, CentOS 7 is a notable exception, as it ships with gcc 4.8.
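To check whether the default compiler meets this requirement before installing, a small shell check like the following may help. It is only a sketch: the helper name `gcc_supports_cxx17` is hypothetical, and it relies on the "gcc 7 or newer" rule of thumb stated above rather than probing actual C++17 support.

```shell
# Hypothetical helper: succeeds if the given gcc version string (e.g. "4.8.5")
# has a major version of at least 7, the rule of thumb for C++17 support.
gcc_supports_cxx17() {
  major="${1%%.*}"   # keep everything before the first dot
  [ "$major" -ge 7 ] 2>/dev/null
}

# Report on the compiler found on PATH, if any.
if command -v gcc >/dev/null 2>&1; then
  version="$(gcc -dumpversion)"
  if gcc_supports_cxx17 "$version"; then
    echo "gcc $version should be new enough to build arrow"
  else
    echo "gcc $version is too old; a newer toolchain is needed"
  fi
else
  echo "gcc not found on PATH"
fi
```

On a stock CentOS 7 system this would be expected to report the compiler as too old, which is the case the next paragraph addresses.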

If you are on CentOS 7, to build arrow you will need to install a newer devtoolset, and you’ll need to update R’s Makevars to define the CXX17 variables. This script installs devtoolset-8 and configures R to be able to use C++17:

#!/usr/bin/env bash

yum install -y centos-release-scl
yum install -y devtoolset-8
# Optional: also install cloud storage dependencies, as described below
yum install -y libcurl-devel openssl-devel

source /opt/rh/devtoolset-8/enable

# Only add the settings if R does not already have a C++17 compiler configured
if [ -z "$(R CMD config CXX17)" ]; then
  mkdir -p ~/.R
  echo "CC = $(which gcc) -fPIC" >> ~/.R/Makevars
  echo "CXX17 = $(which g++) -fPIC" >> ~/.R/Makevars
  echo "CXX17STD = -std=c++17" >> ~/.R/Makevars
fi

## Using install_arrow()

The previous instructions are useful for a fresh arrow installation, but arrow also provides the function install_arrow() for other situations. There are three common use cases for this function:

• You have arrow installed and want to upgrade to a different version
• You want to try to reinstall and fix issues with Linux C++ binaries
• You want to install a development build

Examples of using install_arrow() are shown below:

install_arrow()               # latest release
install_arrow(nightly = TRUE) # install development version
install_arrow(verbose = TRUE) # verbose output to debug install errors

Although this function is part of the arrow package, it is also available as a standalone script, so you can access it without first installing the package:

source("https://raw.githubusercontent.com/apache/arrow/main/r/R/install-arrow.R")

Notes:

• install_arrow() does not require environment variables to be set in order to satisfy C++ dependencies.
• Unlike packages such as tensorflow, blogdown, and others that require external dependencies, you do not need to run install_arrow() after a successful arrow installation.

## Offline installation

The install-arrow.R file mentioned in the previous section includes a function called create_package_with_all_dependencies(). Normally, when installing on a computer with internet access, the build process will download third-party dependencies as needed. This function provides a way to download them in advance, which can be useful when installing Arrow on a computer without internet access. The process is as follows:

Step 1. On a computer with internet access, download the dependencies:

• Install the arrow package or source the script directly using the following command:

source("https://raw.githubusercontent.com/apache/arrow/main/r/R/install-arrow.R")

• Use the create_package_with_all_dependencies() function to create the installation bundle:

create_package_with_all_dependencies("my_arrow_pkg.tar.gz")

• Copy the newly created my_arrow_pkg.tar.gz file to the computer without internet access

Step 2. On the computer without internet access, install the prepared package:

• Install the arrow package from the copied file:

install.packages("my_arrow_pkg.tar.gz")

• This installation will build from source, so cmake must be available

• Run arrow_info() to check installed capabilities
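Because the install on the offline computer builds from source, it can be worth confirming there that the bundle was copied and that cmake is on the PATH before starting. A minimal shell sketch, in which the helper name `check_offline_bundle` is hypothetical and `my_arrow_pkg.tar.gz` is the file name used above:

```shell
# Hypothetical pre-flight check for the offline computer.
# Succeeds only if the bundle file exists and cmake can be found.
check_offline_bundle() {
  bundle="$1"
  if [ ! -f "$bundle" ]; then
    echo "bundle not found: $bundle" >&2
    return 1
  fi
  if ! command -v cmake >/dev/null 2>&1; then
    echo "cmake not found on PATH; it is needed to build from source" >&2
    return 1
  fi
  echo "ready to install $bundle"
}

# Example: run the check in the directory that holds the copied bundle.
check_offline_bundle "my_arrow_pkg.tar.gz" || echo "fix the problems above before installing"
```

If the check passes, install the bundle from within R as described above.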

Notes:

• arrow can be installed on a computer without internet access without using this function, but many useful features will be disabled, as they depend on third-party components. More precisely, arrow::arrow_info()$capabilities will be FALSE for every capability.

• If you are using binary packages you shouldn’t need to use this function. You can download the appropriate binary from your package repository, transfer that to the offline computer, and install that.

• If you’re using RStudio Package Manager on Linux (RSPM), and you want to make a source bundle with this function, make sure to set the first repository in options("repos") to be a mirror that contains source packages. That is, the repository needs to be something other than the RSPM binary mirror URLs.

## Offline installation (alternative)

A second method for offline installation is a little more hands-on. Follow these steps if you wish to try it:

• Download the dependency files (cpp/thirdparty/download_dependencies.sh may be helpful)
• Copy the directory of dependencies to the offline computer
• Create the environment variable ARROW_THIRDPARTY_DEPENDENCY_DIR on the offline computer, pointing to the copied directory.
• Install the arrow package as usual.
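The steps above can be sketched in shell roughly as follows. This is an outline under assumptions, not a tested recipe: the directory name `arrow-thirdparty` and the helper `use_thirdparty_dir` are hypothetical, and the download and copy steps are left as comments because they depend on where your Arrow checkout lives and how you move files to the offline machine.

```shell
# On the online computer (shown as comments; adjust to your setup):
#   run cpp/thirdparty/download_dependencies.sh from an Arrow source checkout
#   to fetch the dependency archives into a directory, then copy that
#   directory to the offline computer.

# On the offline computer: point the build at the copied directory.
# Hypothetical helper that refuses to export a missing or empty directory.
use_thirdparty_dir() {
  dir="$1"
  if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir")" ]; then
    echo "no dependency files found in: $dir" >&2
    return 1
  fi
  export ARROW_THIRDPARTY_DEPENDENCY_DIR="$dir"
}

# Example call with an assumed location for the copied directory.
use_thirdparty_dir "$HOME/arrow-thirdparty" || echo "copy the dependencies first"
```

With the variable exported, an R session started from the same shell inherits it, so install.packages("arrow") can be run as usual.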

For offline installation using libarrow binaries, see Method 1b above.

## Troubleshooting

The intent is that install.packages("arrow") will just work and handle all C++ dependencies, but depending on your system, you may have better results if you tune one of several parameters. Here are some known complications and ways to address them.

### Package failed to build C++ dependencies

If you see a message like

------------------------- NOTE ---------------------------
There was an issue preparing the Arrow C++ libraries.
See https://arrow.apache.org/docs/r/articles/install.html
---------------------------------------------------------

in the output when the package fails to install, that means that installation failed to retrieve or build the libarrow version compatible with the current version of the R package.

Please check the “Known installation issues” below to see if any apply, and if none apply, set the environment variable ARROW_R_DEV=TRUE for more verbose output and try installing again. Then, please report an issue and include the full installation output.
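One possible way to capture the full output for a report is to set the variable in a shell and redirect everything R prints into a file. The sketch below makes a few assumptions: the helper name `reinstall_arrow_with_log`, the default log file name, and the CRAN mirror URL are all illustrative choices rather than anything the package prescribes.

```shell
# Ask the arrow install scripts for more verbose output.
export ARROW_R_DEV=TRUE

# Hypothetical helper: reinstall arrow and keep the complete output
# (stdout and stderr) in a log file that can be attached to an issue.
reinstall_arrow_with_log() {
  log_file="${1:-arrow-install.log}"
  Rscript -e 'install.packages("arrow", repos = "https://cloud.r-project.org")' \
    > "$log_file" 2>&1
  echo "full installation output written to $log_file"
}
```

Calling `reinstall_arrow_with_log` with no arguments writes to arrow-install.log in the current directory; pass a path to write somewhere else.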

### Using system libraries

If a system library or other installed Arrow is found but it doesn’t match the R package version (for example, you have libarrow 1.0.0 on your system and are installing R package 2.0.0), it is likely that the R bindings will fail to compile. Because the Apache Arrow project is under active development, it is essential that the versions of libarrow and the R package match. When install.packages("arrow") has to download libarrow, the install script ensures that you fetch the libarrow version that corresponds to your R package version. However, if you are using a version of libarrow already on your system, a version match isn’t guaranteed.

To fix a version mismatch, you can either update your libarrow system packages to match the R package version, or set the environment variable ARROW_USE_PKG_CONFIG=FALSE to tell the configure script not to look for a system version of libarrow. (The latter is the default of install_arrow().) System libarrow versions are available corresponding to all CRAN releases but not for nightly or dev versions, so depending on the R package version you’re installing, a system libarrow version may not be an option.

Note also that once you have a working R package installation based on system (shared) libraries, if you update your system libarrow installation, you’ll need to reinstall the R package to match its version. Similarly, if you’re using libarrow system libraries, running update.packages() after a new release of the arrow package will likely fail unless you first update the libarrow system packages.
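One way to spot a mismatch before installing is to compare the version that pkg-config reports for the system libarrow with the R package version you intend to install. A rough shell sketch follows. The helper names are hypothetical, the pkg-config module name `arrow` is an assumption about how your system packages register libarrow, and comparing only the first three version components is a simplification that will not suit nightly or dev builds.

```shell
# Keep only major.minor.patch, e.g. "14.0.0.2" -> "14.0.0".
first_three() {
  printf '%s\n' "$1" | cut -d. -f1-3
}

# Succeeds if the two version strings agree on major.minor.patch.
versions_match() {
  [ "$(first_three "$1")" = "$(first_three "$2")" ]
}

# Version of the R package you plan to install (example value from the text).
r_pkg_version="2.0.0"

# Ask pkg-config for the system libarrow version, if there is one.
system_version="$(pkg-config --modversion arrow 2>/dev/null || true)"

if [ -n "$system_version" ] && ! versions_match "$system_version" "$r_pkg_version"; then
  echo "system libarrow $system_version does not match R package $r_pkg_version"
  # Tell the configure script not to look for a system libarrow.
  export ARROW_USE_PKG_CONFIG=FALSE
fi
```

The alternative fix, updating the system libarrow packages, depends on your distribution's package manager and is not shown here.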

### Using prebuilt binaries

If the R package finds and downloads a prebuilt binary of libarrow, but then the arrow package can’t be loaded, perhaps with “undefined symbols” errors, please report an issue. This is likely a compiler mismatch and may be resolvable by setting some environment variables to instruct R to compile the packages to match libarrow.

A workaround would be to set the environment variable LIBARROW_BINARY=FALSE and retry installation: this value instructs the package to build libarrow from source instead of downloading the prebuilt binary. That should guarantee that the compiler settings match.

If a prebuilt libarrow binary wasn’t found for your operating system but you think it should have been, please report an issue and share the console output. You may also set the environment variable ARROW_R_DEV=TRUE for additional debug messages.
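The LIBARROW_BINARY=FALSE workaround can be applied to a single installation without changing the rest of your shell session by prefixing the command with the variable. In this sketch the helper name `reinstall_arrow_from_source` and the CRAN mirror URL are assumptions made for illustration.

```shell
# Hypothetical helper: reinstall arrow with LIBARROW_BINARY=FALSE set only for
# this one R invocation, so libarrow is built from source and the variable
# does not linger in the shell afterwards.
reinstall_arrow_from_source() {
  LIBARROW_BINARY=FALSE \
    Rscript -e 'install.packages("arrow", repos = "https://cloud.r-project.org")'
}
```

Run `reinstall_arrow_from_source` when you are ready to retry; a source build needs a C++17 compiler and cmake, and it can take considerably longer than using a prebuilt binary.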

### Building libarrow from source

If building libarrow from source fails, check the error message. (If you don’t see an error message, only the ----- NOTE -----, set the environment variable ARROW_R_DEV=TRUE to increase verbosity and retry installation.) The install script should work everywhere, so if libarrow fails to compile, please report an issue so that we can improve the script.

### Known installation issues

• On CentOS, building the package requires a more modern devtoolset than the default system compilers. See “System dependencies” above.

• If you have multiple versions of zstd installed on your system, installation by building libarrow from source may fail with an “undefined symbols” error. Workarounds include (1) setting LIBARROW_BINARY=true to use a prebuilt C++ binary; (2) setting ARROW_WITH_ZSTD=OFF to build without zstd; or (3) uninstalling the conflicting zstd. See discussion here.

## Contributing

We are constantly working to make the installation process as painless as possible. If you find ways to improve the process, please report an issue so that we can document it. Similarly, if you find that your Linux distribution or version is not supported, we would welcome the contribution of Docker images (hosted on Docker Hub) that we can use in our continuous integration and hopefully improve our coverage. If you do contribute a Docker image, it should be as minimal as possible, containing only R and the dependencies it requires. For reference, see the images that R-hub uses.

You can test the arrow R package installation using the docker-compose setup included in the apache/arrow git repository. For example,

R_ORG=rhub R_IMAGE=ubuntu-gcc-release R_TAG=latest docker-compose build r
R_ORG=rhub R_IMAGE=ubuntu-gcc-release R_TAG=latest docker-compose run r

installs the arrow R package, including libarrow, on the rhub/ubuntu-gcc-release image.