pyarrow.TableGroupBy

class pyarrow.TableGroupBy(table, keys)

Bases: object

A grouping of columns in a table on which to perform aggregations.

Parameters:
table : pyarrow.Table

Input table to execute the aggregation on.

keys : str or list[str]

Name, or list of names, of the columns to group by.

Examples

>>> import pyarrow as pa
>>> t = pa.table([
...       pa.array(["a", "a", "b", "b", "c"]),
...       pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])

Grouping of columns:

>>> pa.TableGroupBy(t, "keys")
<pyarrow.lib.TableGroupBy object at ...>

Perform aggregations:

>>> pa.TableGroupBy(t, "keys").aggregate([("values", "sum")])
pyarrow.Table
values_sum: int64
keys: string
----
values_sum: [[3,7,5]]
keys: [["a","b","c"]]

__init__(self, table, keys)

Methods

__init__(self, table, keys)

aggregate(self, aggregations)

Perform an aggregation over the grouped columns of the table.

Parameters:
aggregations : list[tuple(str, str)] or list[tuple(str, str, FunctionOptions)]

List of tuples, each made of an aggregation column name followed by a function name and, optionally, the aggregation function options. Pass an empty list to get a single row for each group.

Returns:
Table

Results of the aggregation functions.

Examples

>>> import pyarrow as pa
>>> t = pa.table([
...       pa.array(["a", "a", "b", "b", "c"]),
...       pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])
>>> t.group_by("keys").aggregate([("values", "sum")])
pyarrow.Table
values_sum: int64
keys: string
----
values_sum: [[3,7,5]]
keys: [["a","b","c"]]
>>> t.group_by("keys").aggregate([])
pyarrow.Table
keys: string
----
keys: [["a","b","c"]]