pyarrow.parquet.Statistics

class pyarrow.parquet.Statistics

Bases: _Weakrefable

Statistics for a single column in a single row group.

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

equals(self, Statistics other)

Return whether the two column statistics objects are equal.

to_dict(self)

Get dictionary representation of statistics.

Attributes

converted_type

Legacy converted type (str or None).

distinct_count

Number of distinct values in chunk (int).

has_distinct_count

Whether distinct count is present (bool).

has_min_max

Whether min and max are present (bool).

has_null_count

Whether null count is present (bool).

logical_type

Logical type of column (ParquetLogicalType).

max

Max value as logical type.

max_raw

Max value as physical type (bool, int, float, or bytes).

min

Min value as logical type.

min_raw

Min value as physical type (bool, int, float, or bytes).

null_count

Number of null values in chunk (int).

num_values

Number of non-null values (int).

physical_type

Physical type of column (str).

converted_type

Legacy converted type (str or None).

distinct_count

Number of distinct values in chunk (int).

Returns 0 if the distinct count is not set.

equals(self, Statistics other)

Return whether the two column statistics objects are equal.

Parameters:
other : Statistics

Statistics to compare against.

Returns:
are_equal : bool

has_distinct_count

Whether distinct count is present (bool).

has_min_max

Whether min and max are present (bool).

has_null_count

Whether null count is present (bool).

logical_type

Logical type of column (ParquetLogicalType).

max

Max value as logical type.

Returned as the Python equivalent of the logical type, such as datetime.date for dates and decimal.Decimal for decimals.

max_raw

Max value as physical type (bool, int, float, or bytes).

min

Min value as logical type.

Returned as the Python equivalent of the logical type, such as datetime.date for dates and decimal.Decimal for decimals.

min_raw

Min value as physical type (bool, int, float, or bytes).

null_count

Number of null values in chunk (int).

num_values

Number of non-null values (int).

physical_type

Physical type of column (str).

to_dict(self)

Get dictionary representation of statistics.

Returns:
dict

Dictionary with a key for each attribute of this class.