parquet::file::statistics

Enum Statistics

pub enum Statistics {
    Boolean(ValueStatistics<bool>),
    Int32(ValueStatistics<i32>),
    Int64(ValueStatistics<i64>),
    Int96(ValueStatistics<Int96>),
    Float(ValueStatistics<f32>),
    Double(ValueStatistics<f64>),
    ByteArray(ValueStatistics<ByteArray>),
    FixedLenByteArray(ValueStatistics<FixedLenByteArray>),
}

Strongly typed statistics for a column chunk within a row group.

This structure is a natively typed, in-memory representation of the Statistics structure in a Parquet file footer. Query engines can use the statistics stored in this structure to skip decoding pages while reading Parquet data.

Page-level statistics are stored separately, in NativeIndex.
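
As a rough illustration of how a query engine might use such statistics, the sketch below prunes column chunks for an equality predicate. ChunkRange and may_contain are hypothetical stand-ins written for this example, not part of the parquet crate; a real reader would take min and max from the Statistics variant for the column.

```rust
/// Hypothetical stand-in for the min/max of one column chunk.
/// This is not the parquet crate's type; it only models the idea.
#[derive(Debug, Clone, Copy)]
struct ChunkRange {
    min: Option<i64>,
    max: Option<i64>,
}

/// Returns true if a chunk might contain a value equal to `target`.
/// Missing statistics are treated conservatively: the chunk is kept.
fn may_contain(range: ChunkRange, target: i64) -> bool {
    match (range.min, range.max) {
        (Some(min), Some(max)) => min <= target && target <= max,
        _ => true,
    }
}

fn main() {
    let chunks = [
        ChunkRange { min: Some(0), max: Some(99) },
        ChunkRange { min: Some(100), max: Some(199) },
        ChunkRange { min: None, max: None },
    ];
    // Keep only the indices of chunks that could satisfy `x = 150`.
    let kept: Vec<usize> = chunks
        .iter()
        .enumerate()
        .filter(|(_, c)| may_contain(**c, 150))
        .map(|(i, _)| i)
        .collect();
    println!("chunks to read: {:?}", kept);
}
```

The chunk without statistics is kept on purpose: absent min/max values say nothing about the data, so skipping it could drop matching rows.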

Variants

Boolean(ValueStatistics<bool>)

Statistics for a Boolean column.

Int32(ValueStatistics<i32>)

Statistics for an Int32 column.

Int64(ValueStatistics<i64>)

Statistics for an Int64 column.

Int96(ValueStatistics<Int96>)

Statistics for an Int96 column.

Float(ValueStatistics<f32>)

Statistics for a Float column.

Double(ValueStatistics<f64>)

Statistics for a Double column.

ByteArray(ValueStatistics<ByteArray>)

Statistics for a ByteArray column.

FixedLenByteArray(ValueStatistics<FixedLenByteArray>)

Statistics for a FixedLenByteArray column.

Implementations

impl Statistics

pub fn new<T: ParquetValueType>(min: Option<T>, max: Option<T>, distinct_count: Option<u64>, null_count: Option<u64>, is_deprecated: bool) -> Self

Creates new statistics for a column type.

pub fn boolean(min: Option<bool>, max: Option<bool>, distinct: Option<u64>, nulls: Option<u64>, is_deprecated: bool) -> Self

Creates new statistics for Boolean column type.

pub fn int32(min: Option<i32>, max: Option<i32>, distinct: Option<u64>, nulls: Option<u64>, is_deprecated: bool) -> Self

Creates new statistics for Int32 column type.

pub fn int64(min: Option<i64>, max: Option<i64>, distinct: Option<u64>, nulls: Option<u64>, is_deprecated: bool) -> Self

Creates new statistics for Int64 column type.

pub fn int96(min: Option<Int96>, max: Option<Int96>, distinct: Option<u64>, nulls: Option<u64>, is_deprecated: bool) -> Self

Creates new statistics for Int96 column type.

pub fn float(min: Option<f32>, max: Option<f32>, distinct: Option<u64>, nulls: Option<u64>, is_deprecated: bool) -> Self

Creates new statistics for Float column type.

pub fn double(min: Option<f64>, max: Option<f64>, distinct: Option<u64>, nulls: Option<u64>, is_deprecated: bool) -> Self

Creates new statistics for Double column type.

pub fn byte_array(min: Option<ByteArray>, max: Option<ByteArray>, distinct: Option<u64>, nulls: Option<u64>, is_deprecated: bool) -> Self

Creates new statistics for ByteArray column type.

pub fn fixed_len_byte_array(min: Option<FixedLenByteArray>, max: Option<FixedLenByteArray>, distinct: Option<u64>, nulls: Option<u64>, is_deprecated: bool) -> Self

Creates new statistics for FixedLenByteArray column type.

pub fn is_min_max_deprecated(&self) -> bool

Returns true if the statistics have the old min and max fields set. In that case the column order is likely to be undefined, which for old files could mean that values were compared using a signed sort order.

Refer to ColumnOrder and SortOrder for more information.

pub fn is_min_max_backwards_compatible(&self) -> bool

Old versions of Parquet stored statistics in the min and max fields, ordered using signed comparison. This resulted in an undefined ordering for unsigned quantities, such as booleans and unsigned integers.

These fields were therefore deprecated in favour of min_value and max_value, which have a type-defined sort order.

However, not all readers have been updated. For backwards compatibility, this method returns true if these statistics have a signed sort order and are therefore compatible with being stored in the deprecated min and max fields.
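
To see why signed comparison gives a misleading order for unsigned data, the standalone sketch below compares the same two bytes both ways. It uses only the standard library; cmp_signed and cmp_unsigned are illustrative helper names, not parquet APIs.

```rust
use std::cmp::Ordering;

/// Compares two bytes after reinterpreting them as signed integers.
fn cmp_signed(a: u8, b: u8) -> Ordering {
    (a as i8).cmp(&(b as i8))
}

/// Compares the same two bytes as unsigned integers.
fn cmp_unsigned(a: u8, b: u8) -> Ordering {
    a.cmp(&b)
}

fn main() {
    // 0x80 is 128 when unsigned but -128 when reinterpreted as signed,
    // so the two comparisons disagree about which value is larger.
    let (a, b) = (0x01u8, 0x80u8);
    println!("signed:   {:?}", cmp_signed(a, b));
    println!("unsigned: {:?}", cmp_unsigned(a, b));
}
```

A min or max computed with the wrong comparison can exclude values that are actually present, which is why the ordering has to be known before the statistics are trusted for pruning.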

pub fn distinct_count(&self) -> Option<u64>

Deprecated since 53.0.0: Use distinct_count_opt method instead

Returns the number of distinct values, if known. When it is None, the count is unavailable and should be ignored.

pub fn distinct_count_opt(&self) -> Option<u64>

Returns the number of distinct values, if known. When it is None, the count is unavailable and should be ignored.

pub fn null_count(&self) -> u64

Deprecated since 53.0.0: Use null_count_opt method instead

Returns the number of null values for the column. Note that this includes all nulls when the column is part of a complex type.

Note that this API returns 0 if the null count is not available.

pub fn has_nulls(&self) -> bool

Deprecated since 53.0.0: Use null_count_opt method instead

Returns true if the statistics recorded any null values, false otherwise.

pub fn null_count_opt(&self) -> Option<u64>

Returns the number of null values for the column, if known. Note that this includes all nulls when the column is part of a complex type.

Note that this API returns Some(0) even if the null count was not present in the statistics. See https://github.com/apache/arrow-rs/pull/6216/files
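
Because Some(0) may mean either "no nulls" or "count not recorded", callers may want to draw conclusions only from non-zero counts. A minimal sketch, assuming the caller also knows the total number of values in the chunk: num_values is an assumed input, and all_null is a hypothetical helper rather than a parquet API.

```rust
/// Hypothetical helper: reports whether every value in a chunk is null.
/// `null_count` is the kind of value returned by `null_count_opt`;
/// `num_values` is an assumed input supplied by the caller.
fn all_null(null_count: Option<u64>, num_values: u64) -> Option<bool> {
    // None propagates: without a null count nothing can be concluded.
    null_count.map(|nulls| nulls == num_values)
}

fn main() {
    println!("{:?}", all_null(Some(10), 10));
    println!("{:?}", all_null(Some(0), 10));
    println!("{:?}", all_null(None, 10));
}
```

Skipping a chunk only when the result is Some(true) stays safe even when a zero count is really a placeholder for a missing one. The reverse shortcut, skipping a chunk for an IS NULL predicate because the count is zero, is not safe for the same reason.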

pub fn has_min_max_set(&self) -> bool

Deprecated since 53.0.0: Use min_bytes_opt and max_bytes_opt methods instead

Returns whether the min and max values are set. Normally the min and max values are either both Some(value) or both None.

pub fn min_is_exact(&self) -> bool

Returns true if the min value is set, and is an exact min value.

pub fn max_is_exact(&self) -> bool

Returns true if the max value is set, and is an exact max value.
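
One way a caller might act on exactness is sketched below. It assumes that an inexact max is still a valid upper bound (for example a truncated value) but is not necessarily a value present in the data; that reading is an assumption of this example. max_for_aggregate is a hypothetical helper, not a parquet API.

```rust
/// Hypothetical helper: returns a max that can be reported as the final
/// answer to a MAX() query, or None when the data must be scanned.
/// `max` and `max_is_exact` are assumed to come from the statistics.
fn max_for_aggregate(max: Option<i64>, max_is_exact: bool) -> Option<i64> {
    if max_is_exact {
        max
    } else {
        // An inexact bound may exceed every stored value, so it cannot
        // stand in for the true maximum.
        None
    }
}

fn main() {
    println!("{:?}", max_for_aggregate(Some(42), true));
    println!("{:?}", max_for_aggregate(Some(42), false));
}
```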

pub fn min_bytes_opt(&self) -> Option<&[u8]>

Returns the slice of bytes that represents the min value, if the min value is known.
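
The returned bytes still have to be interpreted according to the physical type. A minimal sketch for an Int32 column, assuming the bytes are the 4-byte little-endian encoding of the value (an assumption of this example). decode_i32 is a hypothetical helper, not a parquet API.

```rust
/// Hypothetical helper: decodes an optional byte slice as a little-endian
/// i32. Returns None if the slice is absent or not exactly 4 bytes long.
fn decode_i32(bytes: Option<&[u8]>) -> Option<i32> {
    let bytes = bytes?;
    if bytes.len() != 4 {
        return None;
    }
    let mut array = [0u8; 4];
    array.copy_from_slice(bytes);
    Some(i32::from_le_bytes(array))
}

fn main() {
    // Simulate the bytes a caller might receive for a min value of 42.
    let raw = 42i32.to_le_bytes();
    println!("{:?}", decode_i32(Some(&raw[..])));
    // No statistics available: nothing to decode.
    println!("{:?}", decode_i32(None));
}
```

When the physical type is known up front, matching on the typed Statistics variant avoids the byte-level decoding entirely.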

pub fn min_bytes(&self) -> &[u8]

Deprecated since 53.0.0: Use min_bytes_opt instead

Returns the slice of bytes that represents the min value. Panics if the min value is not set.

pub fn max_bytes_opt(&self) -> Option<&[u8]>

Returns the slice of bytes that represents the max value, if the max value is known.

pub fn max_bytes(&self) -> &[u8]

Deprecated since 53.0.0: Use max_bytes_opt instead

Returns the slice of bytes that represents the max value. Panics if the max value is not set.

pub fn physical_type(&self) -> Type

Returns the physical type associated with the statistics.

Trait Implementations

impl Clone for Statistics

fn clone(&self) -> Statistics

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Statistics

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for Statistics

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<T: ParquetValueType> From<ValueStatistics<T>> for Statistics

fn from(t: ValueStatistics<T>) -> Self

Converts to this type from the input type.

impl HeapSize for Statistics

fn heap_size(&self) -> usize

Returns the size of any bytes allocated on the heap by this object, including heap memory in nested structures.

impl PartialEq for Statistics

fn eq(&self, other: &Statistics) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl StructuralPartialEq for Statistics
