The Arrow R package uses several additional development tools:

- `lintr` for code analysis (we currently use the fork `jonkeane/lintr@arrow-branch`)
- `styler` for code styling
- `pkgdown` for building the website
- `roxygen2` for documenting the package

You can install all these additional dependencies by running:
remotes::install_github("jonkeane/lintr@arrow-branch")
install.packages(c("styler", "pkgdown", "roxygen2"))
The `arrow/r` directory contains a `Makefile` to help with some common tasks from the command line (e.g. `make test`, `make doc`, `make clean`, etc.).
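The R documentation uses the `@examplesIf` tag introduced in roxygen2 version 7.1.2. If your installed roxygen2 is older than that, you can install the development version from GitHub: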
remotes::install_github("r-lib/roxygen2")
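To see which of these tools are missing or older than a required version (for example, roxygen2 7.1.2 for `@examplesIf`), a small base R check like the one below may help. `dev_tools_to_install()` is a hypothetical helper, not part of arrow, and the `"0.0.0"` minimums are placeholders meaning "any installed version".

```r
# Hypothetical helper (not part of arrow): return the names of packages that
# are not installed or are older than the given minimum version.
dev_tools_to_install <- function(min_versions) {
  pkgs <- names(min_versions)
  needs_install <- vapply(pkgs, function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      return(TRUE)
    }
    utils::packageVersion(pkg) < min_versions[[pkg]]
  }, logical(1))
  pkgs[needs_install]
}

# roxygen2 >= 7.1.2 is needed for @examplesIf; "0.0.0" accepts any version
to_install <- dev_tools_to_install(
  c(styler = "0.0.0", pkgdown = "0.0.0", roxygen2 = "7.1.2")
)
to_install
```

If `roxygen2` shows up in the result, the `remotes::install_github()` call above upgrades it; the other packages can be passed to `install.packages()`.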
You can use `devtools::document()` and `pkgdown::build_site()` to rebuild the documentation and preview the results.
# Update roxygen documentation
devtools::document()
# To preview the documentation website
pkgdown::build_site(preview = TRUE)
The R code in the package follows the tidyverse style. On PR submission (and on pushes) our CI will run linting and will flag possible errors on the pull request with annotations.
To run the linter locally, install the lintr package (note that we currently use a fork that includes fixes not yet accepted upstream; see how lintr is installed in `ci/docker/linux-apt-lint.dockerfile` for the current status) and then run:
lintr::lint_package("arrow/r")
You can automatically change the formatting of the code in the package using the styler package. There are two ways to do this:

1. Use the comment bot to do this automatically with the command `@github-actions autotune` on a PR, and commit it back to the branch.
2. Run the styler locally either via Makefile commands:
or in R:
# Exclude the two files that should not be styled
styler::style_pkg(exclude_files = c("tests/testthat/latin1.R", "data-raw/codegen.R"))
The styler package will fix many styling errors, though not all lintr errors are automatically fixable with styler. The list of files we intentionally do not style is in `r/.styler_excludes.R`.
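Rather than hard-coding the two excluded paths, a local styling run could read them from `r/.styler_excludes.R`. The sketch below assumes that file evaluates to a character vector of paths relative to the `r/` directory; `read_style_excludes()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: evaluate the exclusions file in a fresh environment and
# return its value. Assumes the file's last expression is a character vector.
read_style_excludes <- function(path = ".styler_excludes.R") {
  excludes <- source(path, local = new.env())$value
  if (!is.character(excludes)) {
    stop("expected ", path, " to evaluate to a character vector")
  }
  excludes
}

# Usage, from the r/ directory with styler installed:
# styler::style_pkg(exclude_files = read_style_excludes())
```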
The arrow package uses some customized tools on top of cpp11 to prepare its C++ code in `src/`. This is because some features are only enabled and built conditionally at build time. If you change C++ code in the R package, you will need to set the `ARROW_R_DEV` environment variable to `true` (optionally, add it to your `~/.Renviron` file to persist across sessions) so that the `data-raw/codegen.R` file is used for code generation. The `Makefile` commands also handle this automatically.
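One way to set `ARROW_R_DEV` for the current session and also persist it in `~/.Renviron` is sketched below. `set_renviron_var()` is a hypothetical helper, not part of arrow or base R; it rewrites the file and replaces any earlier `ARROW_R_DEV=` line, so review your `.Renviron` before running it.

```r
# Hypothetical helper: set an environment variable for this session and record
# it in an .Renviron file, replacing any existing "NAME=" line.
set_renviron_var <- function(name, value, path = "~/.Renviron") {
  path <- path.expand(path)
  lines <- if (file.exists(path)) readLines(path, warn = FALSE) else character()
  # Drop any previous definition so the file holds a single, current value
  lines <- lines[!startsWith(lines, paste0(name, "="))]
  writeLines(c(lines, paste0(name, "=", value)), path)

  # Apply it to the running session too (.Renviron is read at startup)
  args <- list(value)
  names(args) <- name
  do.call(Sys.setenv, args)
  invisible(path)
}

# Usage:
# set_renviron_var("ARROW_R_DEV", "true")
```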
We use Google C++ style in our C++ code. The easiest way to accomplish this is to use an editor/IDE that formats your code for you. Many popular editors/IDEs have support for running `clang-format` on C++ files when you save them. Installing/enabling the appropriate plugin may save you much frustration.
Check for style errors with
Fix any style issues before committing with
The lint script requires Python 3 and `clang-format`. If the command isn't found, you can explicitly provide the path to it like:
You can see what version of `clang-format` is required by the following command:
The lint script also requires some Python dependencies (note that `cmake_format` is pinned to a specific version):
Tests can be run either using `devtools::test()` or the Makefile alternative.
# Run the test suite, optionally filtering file names
devtools::test(filter = "^regexp$")
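The `filter` regular expression is matched against test file names rather than test descriptions. The base R sketch below previews which files a pattern would select; `preview_test_filter()` is hypothetical, and it assumes testthat's usual convention of stripping the `test-` prefix and `.R` extension before matching.

```r
# Hypothetical helper: list the test files whose names match `filter`,
# assuming testthat strips the "test-" prefix and ".R" extension first.
preview_test_filter <- function(filter, test_dir = "tests/testthat") {
  files <- list.files(test_dir, pattern = "^test.*\\.[rR]$")
  stems <- sub("\\.[rR]$", "", sub("^test-?", "", files))
  files[grepl(filter, stems)]
}

# Usage, from the r/ directory:
# preview_test_filter("^dataset")
```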
Some tests are conditionally enabled based on the availability of certain features in the package build (S3 support, compression libraries, etc.). Others are generally skipped by default but can be enabled with environment variables or other settings:
- All tests are skipped on Linux if the package builds without the C++ libarrow. To make the build fail if libarrow is not available (that is, to test that the C++ build was successful), set `TEST_R_WITH_ARROW=true`.
- Some tests are disabled unless `ARROW_R_DEV=true`.
- Tests that require allocating more than 2GB of memory to test Large types are disabled unless `ARROW_LARGE_MEMORY_TESTS=true`.
- Integration tests against a real S3 bucket are disabled unless credentials are set in `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; these are available on request.
- S3 tests using MinIO locally are enabled if the `minio server` process is found running. If you're running MinIO with custom settings, you can set `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, and `MINIO_PORT` to override the defaults.
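To opt into these tests for a single run without leaving the variables set for the rest of your session, you could wrap the test call in a helper that restores the environment afterwards. `with_env_flags()` below is a hypothetical base R sketch; the withr package offers `withr::with_envvar()` for the same purpose.

```r
# Hypothetical helper: set environment variables while `code` is evaluated,
# then restore their previous values (or unset them if they were unset).
with_env_flags <- function(flags, code) {
  old <- Sys.getenv(names(flags), unset = NA, names = TRUE)
  do.call(Sys.setenv, as.list(flags))
  on.exit({
    was_unset <- is.na(old)
    if (any(was_unset)) Sys.unsetenv(names(old)[was_unset])
    if (any(!was_unset)) do.call(Sys.setenv, as.list(old[!was_unset]))
  }, add = TRUE)
  force(code) # `code` is evaluated lazily, after the variables are set
}

# Usage, from the r/ directory:
# with_env_flags(
#   c(ARROW_R_DEV = "true", ARROW_LARGE_MEMORY_TESTS = "true"),
#   devtools::test()
# )
```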
You can run package checks by using `devtools::check()` and check test coverage with `covr::package_coverage()`.
# All package checks
devtools::check()
# See test coverage statistics
covr::report()
covr::package_coverage()
For full package validation, you can run the following commands from a terminal.
On a pull request, there are some actions you can trigger by commenting on the PR. These extended CI checks are run nightly and can also be requested on-demand using an internal tool called crossbow. A few important GitHub comment commands are shown below.
@github-actions crossbow submit -g r
This runs each of the R-related CI tasks.
@github-actions crossbow submit {task-name}
This runs the named task. See the `r:` group definition near the beginning of the crossbow configuration for a list of glob expression patterns that match names of items in the `tasks:` list below it.
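To check which task names a glob pattern in the `r:` group would select, the sketch below converts globs to regular expressions with `utils::glob2rx()` and matches them against a list of names. `matching_tasks()` is hypothetical, the task names in the usage call are made up for illustration, and `glob2rx()` only approximates the glob semantics crossbow itself applies.

```r
# Hypothetical helper: return the task names matched by any of the globs.
matching_tasks <- function(patterns, task_names) {
  regexes <- vapply(patterns, utils::glob2rx, character(1))
  # Start from "no match" so an empty pattern list selects nothing
  hits <- Reduce(`|`, lapply(regexes, grepl, x = task_names),
                 logical(length(task_names)))
  task_names[hits]
}

# Usage with made-up task names:
matching_tasks(
  c("test-r-*", "r-binary-*"),
  c("test-r-linux", "test-r-macos", "test-python-3", "r-binary-packages")
)
```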