As an alternative to calling collect() on a Dataset query, you can use this function to access the stream of RecordBatches in the Dataset.
This lets you aggregate on each chunk and pull the intermediate results into a data.frame for further aggregation, even if you couldn't fit the whole Dataset result in memory.
map_batches(X, FUN, ..., .data.frame = TRUE)
Argument | Description |
---|---|
X | A Dataset or Dataset query |
FUN | A function or |
... | Additional arguments passed to FUN |
.data.frame | logical: collect the resulting chunks into a single data.frame? Defaults to TRUE. |
This is experimental and not recommended for production use.
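
The two-stage aggregation described above could look roughly like the sketch below. It is an illustration, not an official example: the example data, the temporary path, and the column names g, x, n, and total are all made up for this sketch. It assumes an arrow build with dataset and Parquet support. Because the return type of map_batches() may differ between arrow versions, the result is passed through as.data.frame() before the second stage.

```r
library(arrow)

# Hypothetical example data, written to a temporary directory as a dataset.
path <- tempfile("map_batches_demo")
write_dataset(data.frame(g = rep(c("a", "b"), each = 5), x = 1:10), path)
ds <- open_dataset(path)

# Stage 1: summarise each RecordBatch on its own. Only one batch is
# materialised in R at a time, and the returned data.frame is tiny.
per_batch <- map_batches(ds, function(batch) {
  df <- as.data.frame(batch)
  data.frame(n = nrow(df), total = sum(df$x))
})

# Depending on the arrow version, the result may already be a data.frame
# or may be a stream of batches; as.data.frame() is used to cover both.
per_batch <- as.data.frame(per_batch)

# Stage 2: combine the intermediate results into the final answer.
overall_mean <- sum(per_batch$total) / sum(per_batch$n)
```

Returning counts and sums rather than per-batch means keeps the second stage exact, since a mean of means would be wrong when batches differ in size.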