The arrow package contains methods for 37 dplyr table functions, many of which are "verbs" that transform one or more tables. The package also maps 207 R functions to the corresponding functions in the Arrow compute library. These mappings let you call R functions inside dplyr methods, including many from packages like stringr and lubridate; the calls are translated to Arrow and run on the Arrow query engine (Acero). This document lists all of the mapped functions.

dplyr verbs

Most verb functions return an arrow_dplyr_query object, similar in spirit to a dbplyr::tbl_lazy. This means that the verbs do not eagerly evaluate the query on the data. To run the query, call either compute(), which returns an arrow Table, or collect(), which pulls the resulting Table into an R data.frame.
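As a minimal sketch of this lazy behavior, the example below builds a query on an in-memory Table and then materializes it both ways. The mtcars data set and the variable names are chosen only for illustration.

```r
library(arrow)
library(dplyr)

# Wrap a built-in data frame in an Arrow Table (illustrative data only).
cars_tbl <- arrow_table(mtcars)

# Verbs build up a query; nothing is evaluated yet.
query <- cars_tbl %>%
  filter(cyl == 4) %>%
  select(mpg, cyl)

# compute() runs the query and keeps the result in Arrow memory.
result_tbl <- compute(query)

# collect() runs the query and pulls the result into R.
result_df <- collect(query)
```

Here query should be an arrow_dplyr_query, result_tbl an arrow Table, and result_df an R data frame.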

Function mappings

In the list below, any differences in behavior or support between Acero and the R function are noted after the function name. If no notes follow the function name, you can assume that the function works in Acero just as it does in R.

Functions can be called either as pkg::fun() or just fun(); for example, both str_sub() and stringr::str_sub() work.
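The two call styles can be compared with a short sketch like the one below. The sample strings are made up, and the example assumes the translation happens inside arrow, so stringr is not attached here.

```r
library(arrow)
library(dplyr)

# Illustrative data; the column name x is arbitrary.
fruit <- arrow_table(
  data.frame(x = c("apple", "banana"), stringsAsFactors = FALSE)
)

# Bare function name.
bare <- fruit %>%
  mutate(prefix = str_sub(x, 1, 3)) %>%
  collect()

# Namespace-qualified name.
qualified <- fruit %>%
  mutate(prefix = stringr::str_sub(x, 1, 3)) %>%
  collect()

# Both forms should give the same result.
same <- identical(bare$prefix, qualified$prefix)
```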

In addition to these functions, you can call any of Arrow's 243 compute functions directly. Many Arrow functions do not map to an existing R function. Where an R mapping does exist, you can still call the Arrow function directly if you do not want the adaptations the mapping applies to make Acero behave like R. These functions are listed in the C++ documentation. In the function registry in R, they are named with an arrow_ prefix, such as arrow_ascii_is_decimal.
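As a hedged sketch of a direct call, the example below uses the arrow_ascii_is_decimal function named above inside mutate(). It also uses list_compute_functions(), which returns the compute function names without the arrow_ prefix. The input strings are invented for illustration.

```r
library(arrow)
library(dplyr)

# Illustrative data: one all-digit string and one mixed string.
codes <- arrow_table(
  data.frame(x = c("123", "12a"), stringsAsFactors = FALSE)
)

# Check that the underlying compute function is registered.
registered <- "ascii_is_decimal" %in% list_compute_functions()

# Call the Arrow compute function directly via its arrow_ prefixed name.
checked <- codes %>%
  mutate(is_decimal = arrow_ascii_is_decimal(x)) %>%
  collect()
```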

base

lubridate

methods

stats

  • median(): approximate median (t-digest) is computed

  • quantile(): probs must be length 1; approximate quantile (t-digest) is computed

  • sd()

  • var()
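The notes on the stats functions can be illustrated with a small grouped summary. This is a sketch on made-up data. Because median() and quantile() are approximate in Acero, arrow may emit a warning, and no exact values are shown for them.

```r
library(arrow)
library(dplyr)

# Illustrative data with two groups.
scores <- arrow_table(
  data.frame(
    g = c("a", "a", "b", "b", "b"),
    v = c(1, 3, 2, 4, 6),
    stringsAsFactors = FALSE
  )
)

stats_df <- scores %>%
  group_by(g) %>%
  summarize(
    med = median(v),                 # approximate (t-digest)
    q90 = quantile(v, probs = 0.9),  # probs must be length 1
    s   = sd(v),
    vr  = var(v)
  ) %>%
  arrange(g) %>%
  collect()
```

Since probs must be length 1, one way to get several quantiles is to call quantile() once per probability, each in its own summarize() column.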

stringi

stringr

Pattern modifiers coll() and boundary() are not supported in any functions.
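The text names only the two unsupported modifiers. The sketch below assumes that fixed() is among the supported ones and uses it to match a literal "." rather than a regex wildcard. The data is invented for illustration, and stringr is not attached because the example assumes arrow resolves the modifier during translation.

```r
library(arrow)
library(dplyr)

# Illustrative data: only the first string contains the literal "a.b".
paths <- arrow_table(
  data.frame(x = c("a.b", "axb"), stringsAsFactors = FALSE)
)

# fixed() asks for a literal match, so "." is not treated as a wildcard.
matched <- paths %>%
  mutate(has_literal = str_detect(x, fixed("a.b"))) %>%
  collect()
```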

tibble