As an alternative to calling collect() on a Dataset query, you can
use this function to access the stream of RecordBatches in the Dataset.
This lets you do more complex operations in R that operate on chunks of data
without having to hold the entire Dataset in memory at once. You can include
map_batches() in a dplyr pipeline and apply further dplyr methods to the
resulting stream of data in Arrow.
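The following is a minimal sketch of that pattern, not a definitive recipe. It assumes the arrow and dplyr packages are installed, uses the built-in mtcars data as a stand-in for a real Dataset, and adds a hypothetical kpl column purely for illustration.

```r
library(arrow)
library(dplyr)

# Small in-memory Dataset built from a built-in data frame
# (a stand-in for a larger, on-disk Dataset)
ds <- InMemoryDataset$create(mtcars)

# Hypothetical per-batch function: converts one RecordBatch to a
# data frame, adds a column, and returns the data frame, which is
# coercible to a RecordBatch
add_kpl <- function(batch) {
  df <- as.data.frame(batch)
  df$kpl <- df$mpg * 0.425144  # miles per US gallon to km per litre
  df
}

# Filter in Arrow, apply the R function batch by batch, then
# materialise the resulting stream as a data frame
result <- ds %>%
  filter(cyl > 4) %>%
  map_batches(add_kpl) %>%
  collect()
```

Because the function runs on one batch at a time, only the current batch needs to be held as an R data frame while the stream is being processed.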
Arguments
- X
A Dataset or arrow_dplyr_query object, as returned by the dplyr methods on Dataset.
- FUN
A function or purrr-style lambda expression to apply to each batch. It must return a RecordBatch or something coercible to one via as_record_batch().
- ...
Additional arguments passed to FUN
- .schema
An optional schema(). If NULL, the schema will be inferred from the first batch.
- .lazy
Use TRUE to evaluate FUN lazily as batches are read from the result; use FALSE to evaluate FUN on all batches before returning the reader.
- .data.frame
Deprecated argument, ignored