As an alternative to calling collect() on a Dataset query, you can use this function to access the stream of RecordBatches in the Dataset. This lets you aggregate on each chunk and pull the intermediate results into a data.frame for further aggregation, even if you couldn't fit the whole Dataset result in memory.
map_batches(X, FUN, ..., .data.frame = TRUE)
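FUN: A function or purrr-style lambda expression to apply to each batch
...: Additional arguments passed to FUN
.data.frame: logical: collect the resulting chunks into a single data.frame? Default TRUE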
This is experimental and not recommended for production use.
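
The chunk-wise aggregation described above can be sketched as follows. This is a minimal illustration, not a canonical recipe: it assumes the arrow package (and dplyr, which the row-binding step may need) is installed, and the dataset built from mtcars, the names tmp, ds, partial and mean_mpg, and the choice of a mean as the aggregate are all hypothetical. Because the function is experimental and its return type may differ between arrow releases, the result is passed through as.data.frame() as a precaution.

```r
library(arrow)

# Illustrative data: write mtcars as a small partitioned dataset
# in a temporary directory (hypothetical location).
tmp <- tempfile()
dir.create(tmp)
write_dataset(mtcars, tmp, partitioning = "cyl")
ds <- open_dataset(tmp)

# For each RecordBatch, compute partial results only (row count and sum),
# so that just one small row per batch is kept in memory.
partial <- map_batches(ds, function(batch) {
  df <- as.data.frame(batch)
  data.frame(n = nrow(df), total_mpg = sum(df$mpg))
})

# Precaution (assumption): coerce whatever map_batches() returned
# to a plain data.frame before the final aggregation.
partial <- as.data.frame(partial)

# Second-stage aggregation over the per-batch intermediate results.
mean_mpg <- sum(partial$total_mpg) / sum(partial$n)
mean_mpg
```

Carrying both the count and the sum per batch, rather than a per-batch mean, keeps the final result exact even when batches have different numbers of rows.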