These functions support calling R code from query engine execution
(i.e., a dplyr::mutate() or dplyr::filter() on a Table or Dataset).
Use register_scalar_function() to attach Arrow input and output types
to an R function and make it available for use in the dplyr interface
and/or call_function(). Scalar functions are currently the only type of
user-defined function supported. In Arrow, scalar functions must be
stateless and return output with the same shape (i.e., the same number
of rows) as the input.
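As a minimal sketch of the registration flow described above, the following registers a small function and invokes it directly through call_function(). The function name "example_times_two" is hypothetical and chosen only for illustration; the sketch assumes the arrow package is installed and built with support for running R code from the query engine.

```r
library(arrow)

# Register a stateless function: one output value per input value.
# "example_times_two" is a hypothetical name used only for this sketch.
register_scalar_function(
  "example_times_two",
  function(context, x) x * 2,
  in_type = float64(),
  out_type = float64(),
  auto_convert = TRUE
)

# Call the registered function on an Arrow Array of doubles.
input <- Array$create(c(1, 2, 3))
result <- call_function("example_times_two", input)

# The output has the same length as the input.
as.vector(result)
```

Because the function is also registered as a dplyr binding, the same name should be usable inside mutate() or filter() on a Table or Dataset.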
register_scalar_function(name, fun, in_type, out_type, auto_convert = FALSE)
name: The function name to be used in the dplyr bindings.
fun: An R function or rlang-style lambda expression. The function
will be called with a first argument context, which is a list() with
elements batch_size (the expected length of the output) and output_type
(the required DataType of the output) that may be used to ensure that
the output has the correct type and length. Subsequent arguments are
passed by position as specified by in_type. If auto_convert is TRUE,
subsequent arguments are converted to R vectors before being passed to
fun and the output is automatically constructed with the expected
output type via as_arrow_array().
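To illustrate how the context argument might be used when auto_convert is left at FALSE, here is a hedged sketch in which the function receives an Arrow object and builds the output itself. The name "example_add_one" is hypothetical, and the use of rep_len() to match batch_size is one possible way to guarantee the output length, not a requirement stated by the documentation.

```r
library(arrow)

# With auto_convert = FALSE (the default), `x` arrives as an Arrow object
# rather than an R vector, so the function converts it explicitly.
# "example_add_one" is a hypothetical name used only for this sketch.
register_scalar_function(
  "example_add_one",
  function(context, x) {
    values <- as.vector(x) + 1
    # Use the context to make sure the output has the expected length
    # and the required Arrow type.
    as_arrow_array(
      rep_len(values, context$batch_size),
      type = context$output_type
    )
  },
  in_type = float64(),
  out_type = float64()
)

add_one_result <- call_function("example_add_one", Array$create(c(1, 2, 3)))
as.vector(add_one_result)
```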
in_type: A DataType of the input type or a schema() for functions with
more than one argument. This signature will be used to determine if
this function is appropriate for a given set of arguments. If this
function is appropriate for more than one signature, pass a list() of
the above.
out_type: A DataType of the output type or a function accepting a
single argument (types), which is a list() of DataTypes. If a function,
it must return a DataType.
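The following sketch shows one way the input and output type arguments could be combined: a list() of two input signatures together with an output type given as a function of the input types. The name "example_negate" is hypothetical, and the sketch assumes that a single output type function is applied to each of the listed input signatures.

```r
library(arrow)

# Accept either int32 or float64 input and return the same type as the
# input. "example_negate" is a hypothetical name used for illustration.
register_scalar_function(
  "example_negate",
  function(context, x) -x,
  in_type = list(int32(), float64()),
  # `types` is a list() of DataTypes; return the type of the first input.
  out_type = function(types) types[[1]],
  auto_convert = TRUE
)

int_result <- call_function("example_negate", Array$create(1:3))
dbl_result <- call_function("example_negate", Array$create(c(1.5, 2.5)))

# Inspect the resolved output types.
int_result$type
dbl_result$type
```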
auto_convert: Use TRUE to convert inputs before passing to fun and
construct an Array of the correct type from the output. Use this option
to write functions of R objects as opposed to functions of Arrow R6
objects.
Returns NULL, invisibly.
library(arrow)
library(dplyr, warn.conflicts = FALSE)

# Fit an ordinary R model, then expose its predictions as an Arrow function.
some_model <- lm(mpg ~ disp + cyl, data = mtcars)
register_scalar_function(
  "mtcars_predict_mpg",
  function(context, disp, cyl) {
    predict(some_model, newdata = data.frame(disp, cyl))
  },
  in_type = schema(disp = float64(), cyl = float64()),
  out_type = float64(),
  auto_convert = TRUE
)

# Use the registered function inside a dplyr pipeline on an Arrow Table.
as_arrow_table(mtcars) %>%
  transmute(mpg, mpg_predicted = mtcars_predict_mpg(disp, cyl)) %>%
  collect() %>%
  head()
#>    mpg mpg_predicted
#> 1 21.0      21.84395
#> 2 21.0      21.84395
#> 3 22.8      26.08886
#> 4 21.4      19.82676
#> 5 18.7      14.55267
#> 6 18.1      20.50602