pyarrow.TableGroupBy

class pyarrow.TableGroupBy(table, keys)

Bases: object

A grouping of columns in a table on which to perform aggregations.

Parameters:

table (pyarrow.Table): Input table to execute the aggregation on.
keys (str or list[str]): Name, or list of names, of the columns to group by.

Examples

>>> import pyarrow as pa
>>> t = pa.table([
...       pa.array(["a", "a", "b", "b", "c"]),
...       pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])

Grouping of columns:

>>> pa.TableGroupBy(t, "keys")
<pyarrow.lib.TableGroupBy object at ...>

Perform aggregations:

>>> pa.TableGroupBy(t, "keys").aggregate([("values", "sum")])
pyarrow.Table
values_sum: int64
keys: string
----
values_sum: [[3,7,5]]
keys: [["a","b","c"]]

__init__(self, table, keys)

Methods

`__init__`(self, table, keys)
`aggregate`(self, aggregations)	Perform an aggregation over the grouped columns of the table.

aggregate(self, aggregations)

Perform an aggregation over the grouped columns of the table.

Parameters:

aggregations (list[tuple(str, str)] or list[tuple(str, str, FunctionOptions)]): List of tuples, each made of the name of the column to aggregate, the name of the aggregation function, and optionally the options for that function. Pass an empty list to get a single row for each group.

Returns:

Table: Results of the aggregation functions.

Examples

>>> import pyarrow as pa
>>> t = pa.table([
...       pa.array(["a", "a", "b", "b", "c"]),
...       pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])
>>> t.group_by("keys").aggregate([("values", "sum")])
pyarrow.Table
values_sum: int64
keys: string
----
values_sum: [[3,7,5]]
keys: [["a","b","c"]]
>>> t.group_by("keys").aggregate([])
pyarrow.Table
keys: string
----
keys: [["a","b","c"]]
