The Arrow R package uses several additional development tools:
- lintr for code analysis
- styler for code styling
- pkgdown for building the website
- roxygen2 for documenting the package
  - the R documentation uses the @examplesIf tag introduced in roxygen2 version 7.1.2
You can install all these additional dependencies by running:
install.packages(c("lintr", "styler", "pkgdown", "roxygen2"))
The arrow/r directory contains a Makefile to help with some common tasks from the command line (e.g. make test, make doc, make clean, etc.).
Rebuilding the documentation
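The R documentation uses the @examplesIf tag introduced in roxygen2 version 7.1.2. If your installed roxygen2 is older than that, you can install the development version from GitHub: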
remotes::install_github("r-lib/roxygen2")
You can use devtools::document() and pkgdown::build_site() to rebuild the documentation and preview the results.
# Update roxygen documentation
devtools::document()
# To preview the documentation website
pkgdown::build_site(preview=TRUE)
Styling and linting
R code
The R code in the package follows the tidyverse style. On PR submission (and on pushes) our CI will run linting and will flag possible errors on the pull request with annotations.
To run the linter locally, install the lintr package (note: we currently use a fork that includes fixes not yet accepted upstream; see how lintr is installed in ci/docker/linux-apt-lint.dockerfile for the current status) and then run:
lintr::lint_package("arrow/r")
You can automatically change the formatting of the code in the package using the styler package. There are two ways to do this:
- Use the comment bot to do this automatically with the command @github-actions autotune on a PR, and commit it back to the branch.
- Run the styler locally either via Makefile commands:
make style # (for only the files changed)
make style-all # (for all files)
or in R:
# exclude the file that should not be styled
styler::style_pkg(exclude_files = c("data-raw/codegen.R"))
The styler package will fix many styling errors, though not all lintr errors are automatically fixable with styler. The list of files we intentionally do not style is in r/.styler_excludes.R.
C++ code
The arrow package uses some customized tools on top of cpp11 to prepare its C++ code in src/. This is because some features are enabled and built only conditionally at build time. If you change C++ code in the R package, you will need to set the ARROW_R_DEV environment variable to true (optionally, add it to your ~/.Renviron file to persist across sessions) so that the data-raw/codegen.R file is used for code generation. The Makefile commands also handle this automatically.
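For a single shell session, export ARROW_R_DEV=true is enough. To persist the setting, a minimal sketch like the one below could add it to ~/.Renviron without appending a duplicate entry. The function name enable_arrow_r_dev is hypothetical and not part of the Arrow repository:

```shell
# Hypothetical helper (not part of the Arrow repository): make sure an
# .Renviron file sets ARROW_R_DEV=true, without appending a duplicate entry.
# Usage: enable_arrow_r_dev [path]   (path defaults to ~/.Renviron)
enable_arrow_r_dev() {
  renviron="${1:-$HOME/.Renviron}"
  touch "$renviron" || return 1
  if grep -q '^ARROW_R_DEV=' "$renviron"; then
    # An entry already exists: rewrite its value to true.
    sed 's/^ARROW_R_DEV=.*/ARROW_R_DEV=true/' "$renviron" > "$renviron.tmp" &&
      mv "$renviron.tmp" "$renviron"
  else
    # Start a new line first if the file does not end with a newline.
    if [ -s "$renviron" ] && [ -n "$(tail -c 1 "$renviron")" ]; then
      echo >> "$renviron"
    fi
    echo 'ARROW_R_DEV=true' >> "$renviron"
  fi
}
```

R reads ~/.Renviron at startup, so restart your R session after running it.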
We use Google C++ style in our C++ code. The easiest way to follow it is to use an editor or IDE that formats your code for you. Many popular editors and IDEs can run clang-format on C++ files when you save them; installing or enabling the appropriate plugin may save you much frustration.
Check for style errors with
./lint.sh
Fix any style issues before committing with
./lint.sh --fix
The lint script requires Python 3 and clang-format. If the clang-format command isn’t found, you can provide its path explicitly, for example:
CLANG_FORMAT=/opt/llvm/bin/clang-format ./lint.sh
You can see which version of clang-format is required with the following command:
(. ../.env && echo ${CLANG_TOOLS})
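To compare that against the clang-format you actually have, a sketch like the following could help. It assumes that CLANG_TOOLS in ../.env holds a bare major version number and that clang-format --version prints a line containing "clang-format version N..."; both function names are hypothetical:

```shell
# Hypothetical helper: extract the major version number from the output of
# `clang-format --version`, e.g.
#   "Ubuntu clang-format version 14.0.0-1ubuntu1" -> 14
# Prints nothing if the text does not look like clang-format version output.
clang_format_major() {
  printf '%s\n' "$1" | sed -n 's/.*clang-format version \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper, run from the arrow/r directory: warn and return
# non-zero if the installed clang-format major version differs from
# CLANG_TOOLS in ../.env (assumed to be a bare major version).
# Honors CLANG_FORMAT if it is set to an explicit path.
check_clang_format_version() {
  required=$(. ../.env && echo "${CLANG_TOOLS}")
  installed=$(clang_format_major "$("${CLANG_FORMAT:-clang-format}" --version)")
  if [ "$installed" != "$required" ]; then
    echo "clang-format ${installed:-not found} is installed, ${required} is required" >&2
    return 1
  fi
}
```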
In addition to Python 3, the lint script requires the following Python dependencies (note that cmake_format is pinned to a specific version):
- autopep8
- flake8
- cmake_format==0.5.2
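Before running ./lint.sh, a small check like the hypothetical require_tools function below can report which command-line prerequisites are missing. It only looks for executables, so the Python packages listed above still need to be checked separately:

```shell
# Hypothetical helper: print each named command that `command -v` cannot
# resolve, and return non-zero if any of them are missing.
require_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Example, from the arrow/r directory (honors CLANG_FORMAT if set):
#   require_tools python3 "${CLANG_FORMAT:-clang-format}" && ./lint.sh
```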
Running tests
Tests can be run either using devtools::test() or the Makefile alternative.
# Run the test suite, optionally filtering file names
devtools::test(filter="^regexp$")
# or the Makefile alternative from the arrow/r directory in a shell:
make test file=regexp
Some tests are conditionally enabled based on the availability of certain features in the package build (S3 support, compression libraries, etc.). Others are generally skipped by default but can be enabled with environment variables or other settings:
- All tests are skipped on Linux if the package builds without the C++ libarrow. To make the build fail if libarrow is not available (that is, to test that the C++ build was successful), set TEST_R_WITH_ARROW=true
- Some tests are disabled unless ARROW_R_DEV=true
- Tests that require allocating >2GB of memory to test Large types are disabled unless ARROW_LARGE_MEMORY_TESTS=true
- Integration tests against a real S3 bucket are disabled unless credentials are set in AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; these are available on request
- S3 tests using MinIO locally are enabled if the minio server process is found running. If you’re running MinIO with custom settings, you can set MINIO_ACCESS_KEY, MINIO_SECRET_KEY, and MINIO_PORT to override the defaults.
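Several of these variables can be combined for one run. As a minimal sketch (with_arrow_test_env is a hypothetical name), the wrapper below sets TEST_R_WITH_ARROW, ARROW_R_DEV, and ARROW_LARGE_MEMORY_TESTS only for the command it runs, leaving your shell unchanged. The S3 and MinIO settings are left out because they depend on your credentials and local setup, and keep in mind that the large-memory tests need more than 2GB of memory:

```shell
# Hypothetical helper: run one command with the optional arrow R test
# suites enabled. `env` passes the variables to that command only.
# Usage (from arrow/r): with_arrow_test_env make test file=regexp
with_arrow_test_env() {
  env TEST_R_WITH_ARROW=true \
    ARROW_R_DEV=true \
    ARROW_LARGE_MEMORY_TESTS=true \
    "$@"
}
```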
Running checks
You can run package checks by using devtools::check() and check test coverage with covr::package_coverage().
# All package checks
devtools::check()
# Compute test coverage and print summary statistics
cov <- covr::package_coverage()
cov
# Open an HTML report of the same coverage results
covr::report(cov)
For full package validation, you can run the following commands from a terminal.
R CMD build .
R CMD check arrow_*.tar.gz --as-cran
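If earlier builds have left several arrow_*.tar.gz files in the directory, the glob above matches all of them and each one gets checked. A small hypothetical helper like this one picks only the most recently modified tarball:

```shell
# Hypothetical helper: print the most recently modified arrow_*.tar.gz in
# the given directory (default: current directory); return non-zero if none.
latest_arrow_tarball() {
  dir="${1:-.}"
  newest=""
  for f in "$dir"/arrow_*.tar.gz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

For example: R CMD build . && R CMD check "$(latest_arrow_tarball)" --as-cran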
Running extended CI checks
On a pull request, there are some actions you can trigger by commenting on the PR. These extended CI checks are run nightly and can also be requested on demand using an internal tool called crossbow. A few important GitHub comment commands are shown below.
Run all extended R CI tasks
@github-actions crossbow submit -g r
This runs each of the R-related CI tasks.
Run a specific task
@github-actions crossbow submit {task-name}
See the r: group definition near the beginning of the crossbow configuration for a list of glob expression patterns that match names of items in the tasks: list below it.