As an alternative to calling collect() on a Dataset query, you can
use this function to access the stream of RecordBatches in the Dataset.
This lets you run more complex R operations on one chunk of data at a time,
without having to hold the entire Dataset in memory at once. You can include
map_batches() in a dplyr pipeline and apply further dplyr verbs, evaluated in
Arrow, to the stream of data it produces.
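The following is a minimal sketch of chunk-wise processing. It assumes arrow 10.0.0 or later, where map_batches() returns a RecordBatchReader, and it uses the built-in mtcars data in place of a real on-disk Dataset. The per-batch summary (row count and total hp) is illustrative only. It returns a data.frame, which map_batches() converts with as_record_batch().

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# An arrow_dplyr_query: nothing is evaluated until batches are pulled.
query <- arrow_table(mtcars) %>%
  filter(mpg > 20)

# FUN receives one RecordBatch at a time and returns a small data.frame,
# which map_batches() coerces to a RecordBatch.
reader <- query %>%
  map_batches(function(batch) {
    data.frame(
      rows = nrow(batch),
      total_hp = sum(as.vector(batch$hp))
    )
  })

# With the default .lazy = TRUE, FUN runs as batches are read from the reader;
# read_table() drains the reader into a single Table.
per_batch <- as.data.frame(reader$read_table())
sum(per_batch$rows)
```

The result has one row per input batch, so the per-query totals come from summing over those rows. How many batches arrive depends on how Arrow chunks the data.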
Arguments
- X: A Dataset or arrow_dplyr_query object, as returned by the dplyr methods on Dataset.
- FUN: A function or purrr-style lambda expression to apply to each batch. It must return a RecordBatch or something coercible to one via `as_record_batch()`.
- ...: Additional arguments passed to FUN.
- .schema: An optional schema(). If NULL, the schema will be inferred from the first batch.
- .lazy: Use TRUE to evaluate FUN lazily as batches are read from the result; use FALSE to evaluate FUN on all batches before returning the reader.
- .data.frame: Deprecated argument; ignored.
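A second sketch exercises the optional arguments together: an extra argument forwarded to FUN through `...`, an explicit .schema so that nothing is inferred from the first batch, and .lazy = FALSE. The helper count_over() and its threshold parameter are hypothetical names chosen for this example, and the sketch again assumes arrow 10.0.0 or later.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: counts the rows in one batch whose hp exceeds `threshold`.
count_over <- function(batch, threshold) {
  hp <- as.vector(batch$hp)
  data.frame(n_over = sum(hp > threshold))  # sum() of a logical is an R integer
}

# Declaring the output schema up front; an R integer maps to Arrow int32.
out_schema <- schema(n_over = int32())

reader <- arrow_table(mtcars) %>%
  select(hp) %>%
  map_batches(
    count_over,
    threshold = 150,       # forwarded to count_over() through `...`
    .schema = out_schema,
    .lazy = FALSE          # apply FUN to every batch before returning the reader
  )

counts <- as.data.frame(reader$read_table())
sum(counts$n_over)
```

With .lazy = FALSE, the outputs of FUN for every batch are computed before map_batches() returns. This suits small per-batch summaries like this one, but it gives up the on-demand evaluation of the default.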