pyarrow.parquet.SortingColumn
- class pyarrow.parquet.SortingColumn(int column_index, bool descending=False, bool nulls_first=False)
Bases: object
Sorting specification for a single column.
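Returned by RowGroupMetaData.sorting_columns() and used in ParquetWriter to specify the sort order of the data.
Parameters:
- column_index : int
  Index of the column that the data is sorted by.
- descending : bool, default False
  Whether the column is sorted in descending order.
- nulls_first : bool, default False
  Whether null values appear before valid values.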
See also
Notes
Column indices are zero-based, refer only to leaf fields, and are in depth-first order. As a result, the column indices for a nested schema may differ from what you expect. In most cases, it is easier to specify the sort order by column name and convert it to column indices with the from_ordering method.
Examples
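To see why leaf-based indices can be surprising for nested schemas, the following pure-Python sketch numbers the leaves of a small nested schema in depth-first order. It does not use pyarrow: the nested-tuple schema representation and the leaf_paths helper are hypothetical and exist only for illustration, and the sketch covers struct-like nesting only.

```python
# Hypothetical schema representation: each field is (name, children),
# where children is None for a leaf and a list of fields for a struct.
schema = [
    ("id", None),
    ("location", [("lat", None), ("lon", None)]),
    ("timestamp", None),
]


def leaf_paths(fields, prefix=""):
    """Return the dotted paths of all leaf fields in depth-first order."""
    paths = []
    for name, children in fields:
        path = prefix + name
        if children is None:
            paths.append(path)
        else:
            # A struct is not a column itself; only its leaves are numbered.
            paths.extend(leaf_paths(children, prefix=path + "."))
    return paths


leaves = leaf_paths(schema)
index_of = {path: i for i, path in enumerate(leaves)}
print(index_of)
# {'id': 0, 'location.lat': 1, 'location.lon': 2, 'timestamp': 3}
```

Here timestamp is the third top-level field, yet its leaf index is 3 because the two leaves of location are counted before it.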
In other APIs, sort order is specified by names, such as:
>>> sort_order = [('id', 'ascending'), ('timestamp', 'descending')]
For Parquet, the column index must be used instead:
>>> import pyarrow.parquet as pq
>>> [pq.SortingColumn(0), pq.SortingColumn(1, descending=True)]
[SortingColumn(column_index=0, descending=False, nulls_first=False), SortingColumn(column_index=1, descending=True, nulls_first=False)]
Convert the sort_order into a tuple of sorting columns with from_ordering (note that the schema must be provided as well):
>>> import pyarrow as pa
>>> schema = pa.schema([('id', pa.int64()), ('timestamp', pa.timestamp('ms'))])
>>> sorting_columns = pq.SortingColumn.from_ordering(schema, sort_order)
>>> sorting_columns
(SortingColumn(column_index=0, descending=False, nulls_first=False), SortingColumn(column_index=1, descending=True, nulls_first=False))
Convert back to the sort order with to_ordering:
>>> pq.SortingColumn.to_ordering(schema, sorting_columns)
((('id', 'ascending'), ('timestamp', 'descending')), 'at_end')
- __init__(*args, **kwargs)
Methods
- __init__(*args, **kwargs)
- from_ordering(cls, Schema schema, sort_keys, null_placement='at_end'): Create a tuple of SortingColumn objects from the same arguments as pyarrow.compute.SortOptions.
- to_dict(self): Get dictionary representation of the SortingColumn.
- to_ordering(Schema schema, sorting_columns): Convert a tuple of SortingColumn objects to the same format as pyarrow.compute.SortOptions.
Attributes
- column_index: Index of the column that the data is sorted by (int).
- descending: Whether the column is sorted in descending order (bool).
- nulls_first: Whether null values appear before valid values (bool).
- column_index
Index of the column that the data is sorted by (int).
- descending
Whether the column is sorted in descending order (bool).
- classmethod from_ordering(cls, Schema schema, sort_keys, null_placement='at_end')
Create a tuple of SortingColumn objects from the same arguments as pyarrow.compute.SortOptions.
Parameters:
- schema : Schema
  Schema of the input data.
- sort_keys : Sequence of (name, order) tuples
  Names of field/column keys (str) to sort the input on, along with the order each field/column is sorted in. Accepted values for order are "ascending" and "descending".
- null_placement : {'at_start', 'at_end'}, default 'at_end'
  Where null values should appear in the sort order.
Returns:
- sorting_columns : tuple of SortingColumn
- nulls_first
Whether null values appear before valid values (bool).
- to_dict(self)
Get dictionary representation of the SortingColumn.
Returns:
- dict
  Dictionary with a key for each attribute of this class.
- static to_ordering(Schema schema, sorting_columns)
Convert a tuple of SortingColumn objects to the same format as pyarrow.compute.SortOptions.
Parameters:
- schema : Schema
  Schema of the input data.
- sorting_columns : tuple of SortingColumn
  Columns to sort the input on.
Returns:
- sort_keys : tuple of (name, order) tuples
- null_placement : {'at_start', 'at_end'}