parquet::basic

Enum Encoding

pub enum Encoding {
    PLAIN,
    PLAIN_DICTIONARY,
    RLE,
    BIT_PACKED,
    DELTA_BINARY_PACKED,
    DELTA_LENGTH_BYTE_ARRAY,
    DELTA_BYTE_ARRAY,
    RLE_DICTIONARY,
    BYTE_STREAM_SPLIT,
}

Encodings supported by Parquet.

Not all encodings are valid for all types. This enum is also used to specify the encoding of definition and repetition levels.

By default this crate uses Encoding::PLAIN, Encoding::RLE, and Encoding::RLE_DICTIONARY. These provide very good encode and decode performance, whilst yielding reasonable storage efficiency and being supported by all major parquet readers.

The delta encodings are also supported and will be used if a newer WriterVersion is configured. However, these encodings sacrifice encode and decode performance for improved storage efficiency. The performance regression is particularly pronounced when records are skipped, as occurs during predicate push-down. Users should assess the performance impact when evaluating these encodings.

Variants

PLAIN

Default byte encoding.

  • BOOLEAN - 1 bit per value; 0 is false, 1 is true.
  • INT32 - 4 bytes per value, stored as little-endian.
  • INT64 - 8 bytes per value, stored as little-endian.
  • FLOAT - 4 bytes per value, stored as little-endian.
  • DOUBLE - 8 bytes per value, stored as little-endian.
  • BYTE_ARRAY - 4-byte length stored as little-endian, followed by the bytes.
  • FIXED_LEN_BYTE_ARRAY - just the bytes are stored.
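
The layouts listed above can be illustrated with a minimal, standard-library-only sketch. The function names are hypothetical and are not part of this crate's API. The least-significant-bit-first ordering of BOOLEAN values is taken from the Parquet format specification rather than from the list above.

```rust
/// PLAIN-style layout for INT32: each value as 4 little-endian bytes.
fn plain_encode_i32(values: &[i32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// PLAIN-style layout for BYTE_ARRAY: a 4-byte little-endian length,
/// followed by the raw bytes of the value.
fn plain_encode_byte_array(values: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        // Assumption for this sketch: every value fits a 4-byte length.
        assert!(v.len() <= u32::MAX as usize);
        out.extend_from_slice(&(v.len() as u32).to_le_bytes());
        out.extend_from_slice(v);
    }
    out
}

/// PLAIN-style layout for BOOLEAN: 1 bit per value, filling each byte
/// from its least-significant bit (assumption: order per the format spec).
fn plain_encode_bool(values: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() + 7) / 8];
    for (i, &b) in values.iter().enumerate() {
        if b {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}
```

For example, `plain_encode_i32(&[1])` yields the bytes `01 00 00 00`, and a two-byte value `ab` passed to `plain_encode_byte_array` is preceded by the length bytes `02 00 00 00`.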

PLAIN_DICTIONARY

Deprecated dictionary encoding.

The values in the dictionary are encoded using PLAIN encoding. Since this encoding is deprecated, RLE_DICTIONARY encoding is used for data pages, and PLAIN encoding is used for the dictionary page.

RLE

Group-packed run-length encoding.

Usable for definition/repetition levels encoding and boolean values.
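
As a rough illustration of why run-length encoding suits level data, the sketch below collapses consecutive repeats into (run length, value) pairs. It is a simplification, not the crate's implementation: the actual Parquet RLE/bit-packing hybrid also emits bit-packed groups and variable-length headers, which are omitted here. The function names are hypothetical.

```rust
/// Collapses consecutive equal values into (run length, value) pairs.
fn run_length_encode(levels: &[u8]) -> Vec<(u32, u8)> {
    let mut runs: Vec<(u32, u8)> = Vec::new();
    for &level in levels {
        // Extend the current run when the value repeats.
        if let Some(last) = runs.last_mut() {
            if last.1 == level {
                last.0 += 1;
                continue;
            }
        }
        // Otherwise start a new run of length 1.
        runs.push((1, level));
    }
    runs
}

/// Expands (run length, value) pairs back into the original sequence.
fn run_length_decode(runs: &[(u32, u8)]) -> Vec<u8> {
    runs.iter()
        .flat_map(|&(count, value)| std::iter::repeat(value).take(count as usize))
        .collect()
}
```

Definition and repetition levels often contain long runs of the same small integer, so a column with few nulls reduces to a handful of pairs.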

BIT_PACKED

👎Deprecated since 51.0.0: Please see documentation for compatibility issues and use the RLE/bit-packing hybrid encoding instead

Deprecated bit-packed encoding.

This can only be used if the data has a known max width. Usable for definition/repetition levels encoding.

There are compatibility issues with files using this encoding. The Parquet standard specifies that the bits are packed starting from the most-significant bit, but several implementations do not follow this bit order. Several other implementations also have issues reading this encoding because of incorrect assumptions about the length of the encoded data.

The RLE/bit-packing hybrid is more CPU and memory efficient and should be used instead.
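
The bit-order difference behind these compatibility issues can be made concrete with a small sketch. Both functions are hypothetical helpers, not part of this crate. The first packs from the most-significant bit, as the deprecated encoding requires; the second packs from the least-significant bit, which is the order assumed here for the bit-packed runs of the hybrid encoding (per the Parquet format specification). The sketch assumes a bit width of at most 8.

```rust
/// Packs `values` at `width` bits each, most-significant bit first.
fn pack_msb_first(values: &[u8], width: u32) -> Vec<u8> {
    assert!(width <= 8);
    let mask: u32 = (1u32 << width) - 1;
    let mut out = Vec::new();
    let mut acc: u32 = 0; // pending bits, newest at the low end
    let mut nbits: u32 = 0; // number of pending bits in `acc`
    for &v in values {
        acc = (acc << width) | (u32::from(v) & mask);
        nbits += width;
        while nbits >= 8 {
            nbits -= 8;
            out.push((acc >> nbits) as u8);
            acc &= (1 << nbits) - 1;
        }
    }
    if nbits > 0 {
        // Left-align the final partial byte and pad with zero bits.
        out.push((acc << (8 - nbits)) as u8);
    }
    out
}

/// Packs `values` at `width` bits each, least-significant bit first.
fn pack_lsb_first(values: &[u8], width: u32) -> Vec<u8> {
    assert!(width <= 8);
    let mask: u32 = (1u32 << width) - 1;
    let mut out = Vec::new();
    let mut acc: u32 = 0; // pending bits, oldest at the low end
    let mut nbits: u32 = 0;
    for &v in values {
        acc |= (u32::from(v) & mask) << nbits;
        nbits += width;
        while nbits >= 8 {
            out.push((acc & 0xFF) as u8);
            acc >>= 8;
            nbits -= 8;
        }
    }
    if nbits > 0 {
        out.push(acc as u8);
    }
    out
}
```

Packing the values 0 through 7 at 3 bits each gives `05 39 77` with the first function and `88 C6 FA` with the second, so a reader that assumes the wrong order decodes entirely different values from the same bytes.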

DELTA_BINARY_PACKED

Delta encoding for integers, either INT32 or INT64.

Works best on sorted data.
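
The reason sorted data helps can be seen in a minimal sketch of the delta step alone. This is not the crate's implementation: the real encoding additionally groups deltas into blocks and bit-packs them, which is omitted here. The function names are hypothetical.

```rust
/// Splits a sequence into its first value and the successive differences.
/// Returns `None` for an empty input.
fn delta_encode(values: &[i64]) -> Option<(i64, Vec<i64>)> {
    let (&first, rest) = values.split_first()?;
    let mut prev = first;
    let deltas = rest
        .iter()
        .map(|&v| {
            // Wrapping arithmetic keeps the sketch panic-free on overflow.
            let d = v.wrapping_sub(prev);
            prev = v;
            d
        })
        .collect();
    Some((first, deltas))
}

/// Rebuilds the original sequence from the first value and the deltas.
fn delta_decode(first: i64, deltas: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(deltas.len() + 1);
    let mut current = first;
    out.push(current);
    for &d in deltas {
        current = current.wrapping_add(d);
        out.push(current);
    }
    out
}
```

For the sorted input `[100, 101, 103, 106]` the deltas are `[1, 2, 3]`, which need far fewer bits than the original values. Unsorted data produces large deltas of mixed sign and gains little.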

DELTA_LENGTH_BYTE_ARRAY

Encoding for byte arrays to separate the length values and the data.

The lengths are encoded using DELTA_BINARY_PACKED encoding.
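
A minimal sketch of the separation step, assuming hypothetical helper names that are not part of this crate: the lengths go into one sequence (which would then be delta-encoded) and the bytes are concatenated into another.

```rust
/// Separates byte arrays into a list of lengths and one concatenated buffer.
fn split_lengths_and_data(values: &[&[u8]]) -> (Vec<usize>, Vec<u8>) {
    let lengths = values.iter().map(|v| v.len()).collect();
    let data = values.concat();
    (lengths, data)
}

/// Rebuilds the byte arrays by slicing the buffer according to the lengths.
fn join_lengths_and_data(lengths: &[usize], data: &[u8]) -> Vec<Vec<u8>> {
    let mut offset = 0;
    lengths
        .iter()
        .map(|&len| {
            let value = data[offset..offset + len].to_vec();
            offset += len;
            value
        })
        .collect()
}
```

For the values `Hello`, `World`, and `Foobar`, the lengths are `[5, 5, 6]` and the data is `HelloWorldFoobar`. Keeping the lengths together lets them compress well, and the contiguous data avoids interleaved length prefixes.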

DELTA_BYTE_ARRAY

Incremental encoding for byte arrays.

Prefix lengths are encoded using DELTA_BINARY_PACKED encoding. Suffixes are stored using DELTA_LENGTH_BYTE_ARRAY encoding.
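
The prefix and suffix split can be sketched as follows. This shows only the incremental step, under the assumption of hypothetical function names; it does not apply DELTA_BINARY_PACKED or DELTA_LENGTH_BYTE_ARRAY to the results and is not the crate's implementation.

```rust
/// For each value, records how many leading bytes it shares with the
/// previous value, plus the remaining suffix.
fn incremental_encode(values: &[&[u8]]) -> (Vec<usize>, Vec<Vec<u8>>) {
    let mut prefix_lengths = Vec::with_capacity(values.len());
    let mut suffixes = Vec::with_capacity(values.len());
    let mut prev: &[u8] = &[];
    for &value in values {
        let shared = prev
            .iter()
            .zip(value.iter())
            .take_while(|(a, b)| a == b)
            .count();
        prefix_lengths.push(shared);
        suffixes.push(value[shared..].to_vec());
        prev = value;
    }
    (prefix_lengths, suffixes)
}

/// Rebuilds each value from the previous value's prefix and its own suffix.
fn incremental_decode(prefix_lengths: &[usize], suffixes: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let mut out: Vec<Vec<u8>> = Vec::with_capacity(suffixes.len());
    for (&n, suffix) in prefix_lengths.iter().zip(suffixes) {
        let mut value = match out.last() {
            Some(prev) => prev[..n].to_vec(),
            None => Vec::new(),
        };
        value.extend_from_slice(suffix);
        out.push(value);
    }
    out
}
```

For `axis`, `axle`, `babble`, `babyhood` the prefix lengths are `[0, 2, 0, 3]` and the suffixes are `axis`, `le`, `babble`, `yhood`. The scheme pays off when neighbouring values share long prefixes, as in sorted strings.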

RLE_DICTIONARY

Dictionary encoding.

The ids are encoded using the RLE encoding.
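
A minimal sketch of the dictionary step, with a hypothetical function name that is not part of this crate: each distinct value is stored once, and the data is replaced by integer ids. In an actual file those ids would then be written with the RLE encoding, which this sketch does not do.

```rust
use std::collections::HashMap;

/// Builds a dictionary of distinct values (in first-seen order) and
/// replaces every value with its index into that dictionary.
fn dictionary_encode<'a>(values: &[&'a str]) -> (Vec<&'a str>, Vec<u32>) {
    let mut dictionary: Vec<&'a str> = Vec::new();
    let mut index_of: HashMap<&'a str, u32> = HashMap::new();
    let mut ids = Vec::with_capacity(values.len());
    for &value in values {
        let next_id = dictionary.len() as u32;
        // Insert the value into the dictionary the first time it is seen.
        let id = *index_of.entry(value).or_insert_with(|| {
            dictionary.push(value);
            next_id
        });
        ids.push(id);
    }
    (dictionary, ids)
}
```

For the input `a, b, a, a, c, b` the dictionary is `[a, b, c]` and the ids are `[0, 1, 0, 0, 2, 1]`. Low-cardinality columns benefit most, because the ids are small and repeat often.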

BYTE_STREAM_SPLIT

Encoding for fixed-width data.

K byte-streams are created where K is the size in bytes of the data type. The individual bytes of a value are scattered to the corresponding stream and the streams are concatenated. This itself does not reduce the size of the data but can lead to better compression afterwards. Note that the use of this encoding with FIXED_LEN_BYTE_ARRAY(N) data may perform poorly for large values of N.
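
The scatter step described above can be sketched for FLOAT values, where K is 4. The function names are hypothetical and this is not the crate's implementation; the sketch assumes the bytes of each value are taken in little-endian order.

```rust
/// Scatters byte `k` of every value into stream `k`, then concatenates
/// the K = 4 streams. The output has the same length as the input bytes.
fn byte_stream_split_f32(values: &[f32]) -> Vec<u8> {
    const K: usize = 4;
    let n = values.len();
    let mut out = vec![0u8; n * K];
    for (i, v) in values.iter().enumerate() {
        for (stream, byte) in v.to_le_bytes().iter().enumerate() {
            out[stream * n + i] = *byte;
        }
    }
    out
}

/// Gathers one byte from each stream to rebuild every value.
fn byte_stream_join_f32(encoded: &[u8]) -> Vec<f32> {
    const K: usize = 4;
    let n = encoded.len() / K;
    (0..n)
        .map(|i| {
            let mut bytes = [0u8; K];
            for stream in 0..K {
                bytes[stream] = encoded[stream * n + i];
            }
            f32::from_le_bytes(bytes)
        })
        .collect()
}
```

For `[1.0, 2.0]` the output is `00 00 00 00 80 00 3F 40`: the low-order bytes, which are identical here, end up next to each other, which is what gives a general-purpose compressor longer repeated runs to work with.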

Trait Implementations

impl Clone for Encoding

fn clone(&self) -> Encoding

Returns a copy of the value.

Since 1.0.0

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Encoding

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for Encoding

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl From<Encoding> for Encoding

fn from(value: Encoding) -> Self

Converts to this type from the input type.

impl FromStr for Encoding

type Err = ParquetError

The associated error which can be returned from parsing.

fn from_str(s: &str) -> Result<Self, Self::Err>

Parses a string s to return a value of this type.

impl Hash for Encoding

fn hash<__H: Hasher>(&self, state: &mut __H)

Feeds this value into the given Hasher.

Since 1.3.0

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher.

impl HeapSize for Encoding

fn heap_size(&self) -> usize

Return the size of any bytes allocated on the heap by this object, including heap memory in those structures

impl Ord for Encoding

fn cmp(&self, other: &Encoding) -> Ordering

This method returns an Ordering between self and other.

Since 1.21.0

fn max(self, other: Self) -> Self
where Self: Sized,

Compares and returns the maximum of two values.

Since 1.21.0

fn min(self, other: Self) -> Self
where Self: Sized,

Compares and returns the minimum of two values.

Since 1.50.0

fn clamp(self, min: Self, max: Self) -> Self
where Self: Sized,

Restrict a value to a certain interval.

impl PartialEq for Encoding

fn eq(&self, other: &Encoding) -> bool

Tests for self and other values to be equal, and is used by ==.

Since 1.0.0

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl PartialOrd for Encoding

fn partial_cmp(&self, other: &Encoding) -> Option<Ordering>

This method returns an ordering between self and other values if one exists.

Since 1.0.0

fn lt(&self, other: &Rhs) -> bool

Tests less than (for self and other) and is used by the < operator.

Since 1.0.0

fn le(&self, other: &Rhs) -> bool

Tests less than or equal to (for self and other) and is used by the <= operator.

Since 1.0.0

fn gt(&self, other: &Rhs) -> bool

Tests greater than (for self and other) and is used by the > operator.

Since 1.0.0

fn ge(&self, other: &Rhs) -> bool

Tests greater than or equal to (for self and other) and is used by the >= operator.

impl TryFrom<Encoding> for Encoding

type Error = ParquetError

The type returned in the event of a conversion error.

fn try_from(value: Encoding) -> Result<Self>

Performs the conversion.

impl Copy for Encoding

impl Eq for Encoding

impl StructuralPartialEq for Encoding

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dst.

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Checks if this value is equivalent to the given key.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

default fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

impl<T> ErasedDestructor for T
where T: 'static,

impl<T> MaybeSendSync for T