pyarrow.acero.AggregateNodeOptions

class pyarrow.acero.AggregateNodeOptions(aggregates, keys=None)

Bases: _AggregateNodeOptions

Make a node which aggregates input batches, optionally grouped by keys.

This is the option class for the “aggregate” node factory.

Acero supports two types of aggregates: “scalar” aggregates, and “hash” aggregates. Scalar aggregates reduce an array or scalar input to a single scalar output (e.g. computing the mean of a column). Hash aggregates act like GROUP BY in SQL and first partition data based on one or more key columns, then reduce the data in each partition. The aggregate node supports both types of computation, and can compute any number of aggregations at once.

Parameters:
aggregates : list of tuples

Aggregations to apply to the targeted fields, specified as a list of tuples. Each tuple is one aggregation specification and consists of, in order: the aggregation target column(s), the function name, the aggregation function options object, and the output field name. The target column(s) specification can be a single field reference, an empty list, or a list of fields, for unary, nullary, and n-ary aggregation functions respectively. Each field reference can be a string column name or an expression.

keys : list of field references, optional

Keys by which aggregations will be grouped. Each key can reference a field using a string name or expression.

__init__(self, aggregates, keys=None)

Methods

__init__(self, aggregates[, keys])