These functions support calling R code from query engine execution
(i.e., a dplyr::mutate()
or dplyr::filter()
on a Table or Dataset).
Use register_scalar_function()
attach Arrow input and output types to an
R function and make it available for use in the dplyr interface and/or
call_function()
. Scalar functions are currently the only type of
user-defined function supported. In Arrow, scalar functions must be
stateless and return output with the same shape (i.e., the same number
of rows) as the input.
Arguments
- name
The function name to be used in the dplyr bindings
- fun
An R function or rlang-style lambda expression. The function will be called with a first argument
context
which is alist()
with elementsbatch_size
(the expected length of the output) andoutput_type
(the required DataType of the output) that may be used to ensure that the output has the correct type and length. Subsequent arguments are passed by position as specified byin_types
. Ifauto_convert
isTRUE
, subsequent arguments are converted to R vectors before being passed tofun
and the output is automatically constructed with the expected output type viaas_arrow_array()
.- in_type
A DataType of the input type or a
schema()
for functions with more than one argument. This signature will be used to determine if this function is appropriate for a given set of arguments. If this function is appropriate for more than one signature, pass alist()
of the above.- out_type
A DataType of the output type or a function accepting a single argument (
types
), which is alist()
of DataTypes. If a function it must return a DataType.- auto_convert
Use
TRUE
to convert inputs before passing tofun
and construct an Array of the correct type from the output. Use this option to write functions of R objects as opposed to functions of Arrow R6 objects.
Examples
if (FALSE) { # arrow_with_dataset() && identical(Sys.getenv("NOT_CRAN"), "true")
library(dplyr, warn.conflicts = FALSE)
some_model <- lm(mpg ~ disp + cyl, data = mtcars)
register_scalar_function(
"mtcars_predict_mpg",
function(context, disp, cyl) {
predict(some_model, newdata = data.frame(disp, cyl))
},
in_type = schema(disp = float64(), cyl = float64()),
out_type = float64(),
auto_convert = TRUE
)
as_arrow_table(mtcars) %>%
transmute(mpg, mpg_predicted = mtcars_predict_mpg(disp, cyl)) %>%
collect() %>%
head()
}