Common developer workflow tasks

The arrow/r directory contains a Makefile to help with some common tasks from the command line (e.g. make test, make doc, make clean).

Loading arrow

You can load the R package via devtools::load_all().

Rebuilding the documentation

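The R documentation uses the @examplesIf tag, which was introduced in roxygen2 version 7.1.2. If your installed version is older, you can install the development version from GitHub: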

remotes::install_github("r-lib/roxygen2")
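Before reinstalling, it can help to check whether the installed roxygen2 is already recent enough. The following is a minimal sketch, assuming the 7.1.2 threshold stated above; the helper name roxygen2_needs_update is hypothetical and is not part of arrow or roxygen2.

```r
# Hypothetical helper: TRUE if roxygen2 is missing or older than `min_version`.
roxygen2_needs_update <- function(min_version = "7.1.2") {
  if (!requireNamespace("roxygen2", quietly = TRUE)) {
    return(TRUE)
  }
  utils::packageVersion("roxygen2") < min_version
}

if (roxygen2_needs_update()) {
  message("roxygen2 is missing or too old; consider installing a newer version.")
}
```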

You can use devtools::document() and pkgdown::build_site() to rebuild the documentation and preview the results.

# Update roxygen documentation
devtools::document()

# To preview the documentation website
pkgdown::build_site(preview=TRUE)

Styling and linting

R code

The R code in the package follows the tidyverse style. On PR submission (and on pushes) our CI will run linting and will flag possible errors on the pull request with annotations.

To run lintr locally, install the lintr package and then run the command below. Note that we currently use a fork that includes fixes not yet accepted upstream; see how lintr is installed in ci/docker/linux-apt-lint.dockerfile for the current status.

lintr::lint_package("arrow/r")
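When a lint run reports many issues, a per-linter tally can show where to start. This is only a sketch: count_lints_by_linter is a hypothetical helper, and it assumes the installed lintr (or the fork mentioned above) supports as.data.frame() on the result with a linter column.

```r
# Hypothetical helper: count lints per linter for the package at `path`.
count_lints_by_linter <- function(path = "arrow/r") {
  lints <- lintr::lint_package(path)
  lint_df <- as.data.frame(lints)
  if (nrow(lint_df) == 0) {
    # Nothing to report: the package is lint-free.
    return(integer(0))
  }
  # Most frequent linters first.
  sort(table(lint_df$linter), decreasing = TRUE)
}
```

Calling count_lints_by_linter() from the directory above arrow/ mirrors the path used in the command above; pass a different path if your working directory differs.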

You can automatically change the formatting of the code in the package using the styler package. There are two ways to do this:

  1. Use the comment bot: comment @github-actions autotune on a PR, and it will style the code and commit the result back to the branch.

  2. Run styler locally, either via the Makefile or in R:

# note the two excluded files which should not be styled
styler::style_pkg(exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"))

The styler package will fix many styling errors, though not all lintr errors are automatically fixable with styler. The list of files we intentionally do not style is in r/.styler_excludes.R.
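To see which files styler would touch before letting it rewrite anything, a dry run can be useful. The sketch below assumes a styler version that supports the dry argument of style_pkg(); preview_styling is a hypothetical helper name, and the excluded files are the two named in the styling command above.

```r
# Hypothetical helper: list files styler would change, without modifying them.
preview_styling <- function(pkg = ".") {
  result <- styler::style_pkg(
    pkg,
    exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"),
    dry = "on" # report changes only; leave files untouched
  )
  # `changed` may be NA for files that could not be parsed; which() drops those.
  result$file[which(result$changed)]
}
```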

C++ code

The arrow package uses some customized tools on top of cpp11 to prepare its C++ code in src/. This is because some features are only enabled and built conditionally at build time. If you change C++ code in the R package, you will need to set the ARROW_R_DEV environment variable to true (optionally, add it to your ~/.Renviron file to persist across sessions) so that the data-raw/codegen.R file is used for code generation. The Makefile commands also handle this automatically.
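For a single R session, the variable can also be set from within R before rebuilding. A minimal sketch:

```r
# Enable developer mode for the current R session only.
Sys.setenv(ARROW_R_DEV = "true")

# Confirm the value that subsequent build steps in this session will see.
Sys.getenv("ARROW_R_DEV")
```

This setting is lost when the session ends; to keep it, add the line ARROW_R_DEV=true to ~/.Renviron as described above.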

We use Google C++ style in our C++ code. The easiest way to accomplish this is to use an editor or IDE that formats your code for you. Many popular editors and IDEs support running clang-format on C++ files when you save them. Installing and enabling the appropriate plugin may save you much frustration.

Check for style errors with

Fix any style issues before committing with

The lint script requires Python 3 and clang-format. If the command isn’t found, you can explicitly provide the path to it like:

You can see what version of clang-format is required by the following command:

The lint script also requires the following Python dependencies (note that cmake_format is pinned to a specific version):

  • autopep8
  • flake8
  • cmake_format==0.5.2

Running tests

Tests can be run either using devtools::test() or the Makefile alternative.

Some tests are conditionally enabled based on the availability of certain features in the package build (S3 support, compression libraries, etc.). Others are generally skipped by default but can be enabled with environment variables or other settings:

  • All tests are skipped on Linux if the package builds without the C++ libarrow. To make the build fail if libarrow is not available (as in, to test that the C++ build was successful), set TEST_R_WITH_ARROW=true

  • Some tests are disabled unless ARROW_R_DEV=true

  • Tests that require allocating >2GB of memory to test Large types are disabled unless ARROW_LARGE_MEMORY_TESTS=true

  • Integration tests against a real S3 bucket are disabled unless credentials are set in AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; these are available on request

  • S3 tests using MinIO locally are enabled if the minio server process is found running. If you’re running MinIO with custom settings, you can set MINIO_ACCESS_KEY, MINIO_SECRET_KEY, and MINIO_PORT to override the defaults.
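To enable some of these optional tests for a single run without changing your session permanently, the variables can be set temporarily. The helper below is a hypothetical sketch (with_test_env is not part of arrow); it sets the given variables, evaluates the code, and then restores the previous state.

```r
# Hypothetical helper: evaluate `code` with extra environment variables set,
# then restore whatever values (or absence of values) existed before.
with_test_env <- function(vars, code) {
  old <- Sys.getenv(names(vars), unset = NA, names = TRUE)
  on.exit({
    to_unset <- names(old)[is.na(old)]
    if (length(to_unset) > 0) Sys.unsetenv(to_unset)
    to_restore <- old[!is.na(old)]
    if (length(to_restore) > 0) do.call(Sys.setenv, as.list(to_restore))
  })
  do.call(Sys.setenv, as.list(vars))
  force(code) # `code` is evaluated lazily, so it sees the new variables
}

# Example: the variable is visible only inside the call.
with_test_env(
  c(ARROW_LARGE_MEMORY_TESTS = "true"),
  Sys.getenv("ARROW_LARGE_MEMORY_TESTS")
)
```

In practice the second argument would be devtools::test(). The withr package provides with_envvar() for the same purpose if you prefer not to define a helper.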

Running checks

You can run package checks by using devtools::check() and check test coverage with covr::package_coverage().

# All package checks
devtools::check()

# See test coverage statistics
covr::package_coverage()

# Or browse the coverage results as an interactive HTML report
covr::report()
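If you want the coverage numbers in a form you can inspect programmatically, the coverage object can be summarized further. This is a sketch under the assumption that covr is installed and the working directory is the package root; summarize_coverage is a hypothetical helper name.

```r
# Hypothetical helper: overall coverage percentage plus the lines
# that no test exercises.
summarize_coverage <- function(path = ".") {
  cov <- covr::package_coverage(path)
  list(
    percent = covr::percent_coverage(cov),
    uncovered = covr::zero_coverage(cov)
  )
}
```

Computing coverage reinstalls the package and runs the full test suite, so expect this to take a while.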

For full package validation, you can run the following commands from a terminal.

R CMD build .
R CMD check arrow_*.tar.gz --as-cran

Running additional CI checks

On a pull request, there are some actions you can trigger by commenting on the PR. We have additional CI checks that run nightly and can be requested on demand using an internal tool called crossbow. A few important GitHub comment commands are shown below.

Run all extended R CI tasks
@github-actions crossbow submit -g r

This runs each of the R-related CI tasks.

Run a specific task
@github-actions crossbow submit {task-name}

See the r: group definition near the beginning of the crossbow configuration for a list of glob expression patterns that match names of items in the tasks: list below it.

Run linting and documentation building tasks
@github-actions autotune

This will find and fix C++ linting errors, rebuild the R documentation (among other cleanup tasks), run styler on any changed R code, and commit the resulting updates to the branch.