As an alternative to calling collect() on a Dataset query, you can
use this function to access the stream of RecordBatches in the Dataset.
This lets you aggregate on each chunk and pull the intermediate results into
a data.frame for further aggregation, even if you couldn't fit the whole
Dataset result in memory.
map_batches(X, FUN, ..., .data.frame = TRUE)
X: A Dataset or arrow_dplyr_query object, as returned by the dplyr methods on Dataset.
FUN: A function or purrr-style lambda expression to apply to each batch.
...: Additional arguments passed to FUN.
.data.frame: logical: collect the resulting chunks into a single data.frame? Default TRUE.
This function is experimental and not recommended for production use.
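The chunk-wise aggregation described above can be sketched as follows. This is a minimal illustration, not an official example. It assumes the arrow package is installed (along with dplyr, which may be needed to bind the per-batch results), and it uses mtcars written to a temporary directory as a stand-in dataset. The names path, ds, partial and mean_mpg are made up for the illustration. The result is passed through as.data.frame() so that the sketch does not depend on exactly what map_batches() returns in a given arrow version.

```r
library(arrow)

# Write a small stand-in dataset to a temporary directory (illustrative only).
# Partitioning by cyl produces several files, so the scan yields several batches.
path <- tempfile()
write_dataset(mtcars, path, partitioning = "cyl")
ds <- open_dataset(path)

# Aggregate each RecordBatch on its own: keep only a partial sum and a row count.
partial <- map_batches(ds, function(batch) {
  df <- as.data.frame(batch)
  data.frame(sum_mpg = sum(df$mpg), n = nrow(df))
})

# Pull the intermediate results into a data.frame (a no-op if already one).
partial <- as.data.frame(partial)

# Aggregate the per-batch results further to get the overall mean.
mean_mpg <- sum(partial$sum_mpg) / sum(partial$n)
```

Only the small per-batch summaries are held in memory at the end, which is what makes this pattern usable when the full query result would not fit.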