These functions support calling R code from query engine execution
(i.e., a dplyr::mutate() or dplyr::filter() on a Table or Dataset).
Use register_scalar_function() to attach Arrow input and output types to an
R function and make it available for use in the dplyr interface and/or
call_function(). Scalar functions are currently the only type of
user-defined function supported. In Arrow, scalar functions must be
stateless and return output with the same shape (i.e., the same number
of rows) as the input.
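As a minimal sketch of the call_function() route mentioned above, the snippet below registers a small function and calls it directly on an Array, without going through dplyr. The name "times_32" and the multiplier are illustrative choices, not part of the package, and the sketch assumes an arrow build in which user-defined functions are available.

```r
library(arrow, warn.conflicts = FALSE)

# Register a hypothetical function named "times_32" that maps int32 to float64.
# With auto_convert = TRUE, `x` arrives as a plain R integer vector.
register_scalar_function(
  "times_32",
  function(context, x) x * 32,
  in_type = int32(),
  out_type = float64(),
  auto_convert = TRUE
)

# Call the registered function by name on an Arrow Array.
result <- call_function("times_32", Array$create(1:3, type = int32()))

# The output has the same number of elements as the input.
length(result) == 3L
```

Because the function is scalar, each output element should depend only on the corresponding input element, and the result must have the same length as the input.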
Arguments
- name
The function name to be used in the dplyr bindings
- fun
An R function or rlang-style lambda expression. The function will be called with a first argument context, which is a list() with elements batch_size (the expected length of the output) and output_type (the required DataType of the output) that may be used to ensure that the output has the correct type and length. Subsequent arguments are passed by position as specified by in_type. If auto_convert is TRUE, subsequent arguments are converted to R vectors before being passed to fun and the output is automatically constructed with the expected output type via as_arrow_array().
- in_type
A DataType of the input type or a schema() for functions with more than one argument. This signature will be used to determine if this function is appropriate for a given set of arguments. If this function is appropriate for more than one signature, pass a list() of the above.
- out_type
A DataType of the output type or a function accepting a single argument (types), which is a list() of DataTypes. If a function, it must return a DataType.
- auto_convert
Use TRUE to convert inputs before passing to fun and construct an Array of the correct type from the output. Use this option to write functions of R objects as opposed to functions of Arrow R6 objects.
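The following sketch illustrates the other side of the auto_convert choice: with auto_convert = FALSE, fun receives Arrow R6 objects and is responsible for returning an Arrow object of the required type. The name "example_to_int64" is hypothetical, and the use of the cast() method with context$output_type is just one way to satisfy the declared out_type.

```r
library(arrow, warn.conflicts = FALSE)

# Hypothetical function: widen int32 input to int64.
# auto_convert = FALSE, so `x` is an Arrow object rather than an R vector,
# and the return value must already have the declared output type.
register_scalar_function(
  "example_to_int64",
  function(context, x) {
    # context$output_type is the DataType declared via out_type
    x$cast(context$output_type)
  },
  in_type = int32(),
  out_type = int64(),
  auto_convert = FALSE
)

widened <- call_function("example_to_int64", Array$create(1:3, type = int32()))

# Inspect the type of the result
widened$type$ToString()
```

Working on Arrow objects directly avoids a round trip through R vectors, at the cost of having to manage the output type by hand.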
Examples
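if (FALSE) { # arrow_with_dataset() && identical(Sys.getenv("NOT_CRAN"), "true")
  library(arrow, warn.conflicts = FALSE)
  library(dplyr, warn.conflicts = FALSE)

  # Fit an ordinary R model whose predictions will be computed inside the query
  some_model <- lm(mpg ~ disp + cyl, data = mtcars)

  # Register a wrapper around predict() as a scalar function of two float64 columns
  register_scalar_function(
    "mtcars_predict_mpg",
    function(context, disp, cyl) {
      predict(some_model, newdata = data.frame(disp, cyl))
    },
    in_type = schema(disp = float64(), cyl = float64()),
    out_type = float64(),
    auto_convert = TRUE
  )

  # Call the registered function from a dplyr pipeline on an Arrow Table
  as_arrow_table(mtcars) |>
    transmute(mpg, mpg_predicted = mtcars_predict_mpg(disp, cyl)) |>
    collect() |>
    head()
}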