As an alternative to calling collect() on a Dataset query, you can use this function to access the stream of RecordBatches in the Dataset. This lets you aggregate on each chunk and pull the intermediate results into a data.frame for further aggregation, even if you couldn't fit the whole Dataset result in memory.

map_batches(X, FUN, ..., .data.frame = TRUE)



A Dataset or arrow_dplyr_query object, as returned by the dplyr methods on Dataset.


A function or purrr-style lambda expression to apply to each batch


Additional arguments passed to FUN


logical: collect the resulting chunks into a single data.frame? Default TRUE


This is experimental and not recommended for production use.