As an alternative to calling collect() on a Dataset query, you can use this function to access the stream of RecordBatches in the Dataset.
This lets you perform more complex operations in R on chunks of data without having to hold the entire Dataset in memory at once. You can include map_batches() in a dplyr pipeline and apply additional dplyr methods to the resulting stream of data in Arrow after it.
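The pipeline usage described above can be sketched as follows. This is a minimal illustration, not a canonical recipe: it assumes a recent arrow release in which map_batches() returns a RecordBatchReader that supports dplyr verbs, it uses an in-memory Dataset built from mtcars as a stand-in for a larger on-disk Dataset, and the kpl column is a hypothetical name introduced only for this example.

```r
library(arrow)
library(dplyr)

# Stand-in for a real (larger-than-memory) Dataset; assumption for illustration.
ds <- InMemoryDataset$create(mtcars)

result <- ds %>%
  # This filter runs in Arrow before any batch reaches R.
  filter(cyl > 4) %>%
  # Each batch is converted to a data frame, modified in R,
  # and returned as a RecordBatch.
  map_batches(function(batch) {
    df <- as.data.frame(batch)
    df$kpl <- df$mpg * 0.425144 # miles per gallon to km per litre
    as_record_batch(df)
  }) %>%
  # Further dplyr verbs are applied to the stream produced by map_batches().
  filter(kpl > 8) %>%
  collect()
```

Only the batch currently being processed is materialized as an R data frame; the final collect() brings the (already filtered) result into memory.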
Arguments
- X
A Dataset or arrow_dplyr_query object, as returned by the dplyr methods on Dataset.
- FUN
A function or purrr-style lambda expression to apply to each batch. It must return a RecordBatch or something coercible to one via as_record_batch().
- ...
Additional arguments passed to FUN
- .schema
An optional schema(). If NULL, the schema will be inferred from the first batch.
- .lazy
Use TRUE to evaluate FUN lazily as batches are read from the result; use FALSE to evaluate FUN on all batches before returning the reader.
- .data.frame
Deprecated argument, ignored
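As a rough sketch of how these arguments fit together, the example below passes an extra argument to FUN through ..., supplies an explicit .schema so that it does not have to be inferred from the first batch, and sets .lazy = FALSE so that all batches are processed before the reader is returned. The helper scale_x and its multiplier argument are hypothetical names used only here, and the in-memory Dataset is a stand-in for real data; the sketch assumes a recent arrow release in which map_batches() returns a RecordBatchReader.

```r
library(arrow)

# Small stand-in Dataset; assumption for illustration.
ds <- InMemoryDataset$create(data.frame(x = c(1, 2, 3, 4)))

# Hypothetical helper: multiplies column x of one batch by `multiplier`.
# It returns a data frame, which can be coerced by as_record_batch().
scale_x <- function(batch, multiplier) {
  df <- as.data.frame(batch)
  df$x <- df$x * multiplier
  df
}

reader <- map_batches(
  ds,
  scale_x,
  multiplier = 10,                  # forwarded to FUN through ...
  .schema = schema(x = float64()),  # explicit output schema
  .lazy = FALSE                     # evaluate FUN on all batches up front
)

# Read the remaining stream into a Table, then convert to a data frame.
out <- as.data.frame(reader$read_table())
```

With .lazy = TRUE, FUN would instead run only as each batch is pulled from the reader, which keeps memory use lower for large inputs.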