ParquetPushDecoder

Struct ParquetPushDecoder 

Source
pub struct ParquetPushDecoder {
    state: ParquetDecoderState,
}
Expand description

A push based Parquet Decoder

See ParquetPushDecoderBuilder for an example of how to build and use the decoder.

ParquetPushDecoder is a low level API for decoding Parquet data without an underlying reader for performing IO, and thus offers fine grained control over how data is fetched and decoded.

When more data is needed to make progress, instead of reading data directly from a reader, the decoder returns DecodeResult indicating what ranges are needed. Once the caller provides the requested ranges via Self::push_ranges, they try to decode again by calling Self::try_decode.

The decoder’s internal state tracks what has been already decoded and what is needed next.

Fields§

§state: ParquetDecoderState

The inner state.

This state is consumed on every transition and a new state is produced so the Rust compiler can ensure that the state is always valid and transitions are not missed.

Implementations§

Source§

impl ParquetPushDecoder

Source

pub fn try_decode(&mut self) -> Result<DecodeResult<RecordBatch>, ParquetError>

Attempt to decode the next batch of data, or return what data is needed

The the decoder communicates the next state with a DecodeResult

See full example in ParquetPushDecoderBuilder

use parquet::DecodeResult;
let mut decoder = get_decoder();
loop {
   match decoder.try_decode().unwrap() {
      DecodeResult::NeedsData(ranges) => {
        // The decoder needs more data. Fetch the data for the given ranges
        // call decoder.push_ranges(ranges, data) and call again
        push_data(&mut decoder, ranges);
      }
      DecodeResult::Data(batch) => {
        // Successfully decoded the next batch of data
        println!("Got batch with {} rows", batch.num_rows());
      }
      DecodeResult::Finished => {
        // The decoder has finished decoding all data
        break;
      }
   }
}
Source

pub fn push_range( &mut self, range: Range<u64>, data: Bytes, ) -> Result<(), ParquetError>

Push data into the decoder for processing

This is a convenience wrapper around Self::push_ranges for pushing a single range of data.

Note this can be the entire file or just a part of it. If it is part of the file, the ranges should correspond to the data ranges requested by the decoder.

See example in ParquetPushDecoderBuilder

Source

pub fn push_ranges( &mut self, ranges: Vec<Range<u64>>, data: Vec<Bytes>, ) -> Result<(), ParquetError>

Push data into the decoder for processing

This should correspond to the data ranges requested by the decoder

Source

pub fn buffered_bytes(&self) -> u64

Returns the total number of buffered bytes in the decoder

This is the sum of the size of all [Bytes] that has been pushed to the decoder but not yet consumed.

Note that this does not include any overhead of the internal data structures and that since [Bytes] are ref counted memory, this may not reflect additional memory usage.

This can be used to monitor memory usage of the decoder.

Trait Implementations§

Source§

impl Debug for ParquetPushDecoder

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

§

fn vzip(self) -> V

§

impl<T> Ungil for T
where T: Send,