RowGroupReaderBuilder

Struct RowGroupReaderBuilder 

pub(crate) struct RowGroupReaderBuilder {
    batch_size: usize,
    projection: ProjectionMask,
    metadata: Arc<ParquetMetaData>,
    fields: Option<Arc<ParquetField>>,
    filter: Option<RowFilter>,
    limit: Option<usize>,
    offset: Option<usize>,
    max_predicate_cache_size: usize,
    metrics: ArrowReaderMetrics,
    state: Option<RowGroupDecoderState>,
    buffers: PushBuffers,
}

Builder for ParquetRecordBatchReader for a single row group

This struct drives the main state machine for decoding each row group: it determines what data is needed and then assembles the ParquetRecordBatchReader once all of that data is available.

Fields§

§batch_size: usize

The output batch size

§projection: ProjectionMask

What columns to project (produce in each output batch)

§metadata: Arc<ParquetMetaData>

The Parquet file metadata

§fields: Option<Arc<ParquetField>>

Top-level Parquet schema and its mapping to the Arrow schema

§filter: Option<RowFilter>

Optional filter

§limit: Option<usize>

Limit to apply to remaining row groups (decremented as rows are read)

§offset: Option<usize>

Offset to apply to remaining row groups (decremented as rows are read)
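
The "decremented as rows are read" behaviour of limit and offset can be sketched as follows. This is a minimal illustration, not the crate's code: `apply_offset_limit` is a hypothetical helper, and it assumes the offset is consumed before the limit within each row group.

```rust
/// Hypothetical helper (not part of the parquet crate): apply the remaining
/// `offset` and `limit` to one row group containing `row_count` rows.
///
/// Returns `(rows_to_skip, rows_to_read)` and decrements both counters so
/// they can be carried over to the next row group.
fn apply_offset_limit(
    row_count: usize,
    offset: &mut Option<usize>,
    limit: &mut Option<usize>,
) -> (usize, usize) {
    // Skip as many rows as the remaining offset allows.
    let skip = offset.map_or(0, |o| o.min(row_count));
    if let Some(o) = offset.as_mut() {
        *o -= skip;
    }

    // Read whatever is left in the row group, capped by the remaining limit.
    let available = row_count - skip;
    let read = limit.map_or(available, |l| l.min(available));
    if let Some(l) = limit.as_mut() {
        *l -= read;
    }

    (skip, read)
}
```

With an offset of 150 and a limit of 100 over row groups of 100 rows each, the first row group is skipped entirely, the second contributes its last 50 rows, and the third contributes its first 50.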

§max_predicate_cache_size: usize

The size in bytes of the predicate cache to use

See [RowGroupCache] for details.

§metrics: ArrowReaderMetrics

The metrics collector

§state: Option<RowGroupDecoderState>

Current state of the decoder.

It is taken when processing and must be put back before returning; it is a bug if it is not put back after transitioning states.

§buffers: PushBuffers

The underlying data store

Implementations§

impl RowGroupReaderBuilder

pub(crate) fn new( batch_size: usize, projection: ProjectionMask, metadata: Arc<ParquetMetaData>, fields: Option<Arc<ParquetField>>, filter: Option<RowFilter>, limit: Option<usize>, offset: Option<usize>, metrics: ArrowReaderMetrics, max_predicate_cache_size: usize, buffers: PushBuffers, ) -> Self

Create a new RowGroupReaderBuilder

pub fn push_data(&mut self, ranges: Vec<Range<u64>>, buffers: Vec<Bytes>)

Push new data buffers that can be used to satisfy pending requests

pub fn buffered_bytes(&self) -> u64

Returns the total number of buffered bytes available

fn take_state(&mut self) -> Result<RowGroupDecoderState, ParquetError>

Take the current state, leaving None in its place.

Returns an error if the state wasn't put back after the previous call to Self::take_state.

Any code that calls this method must ensure that the state is put back before returning; otherwise the reader will return an error the next time it is called.
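
The take-and-put-back discipline described here can be illustrated with `Option::take`. The sketch below uses stand-in types: `State`, `Decoder`, and the `String` error are assumptions for illustration, whereas the real method returns a `RowGroupDecoderState` or a `ParquetError`.

```rust
/// Stand-in for the real decoder state enum.
#[derive(Debug, PartialEq)]
enum State {
    Start,
    Finished,
}

/// Stand-in for the builder; only the `state` field is modelled.
struct Decoder {
    state: Option<State>,
}

impl Decoder {
    /// Take the current state, leaving `None` behind.
    /// Errors if a previous caller forgot to put the state back.
    fn take_state(&mut self) -> Result<State, String> {
        self.state
            .take()
            .ok_or_else(|| "internal error: state was not put back".to_string())
    }

    /// One processing step: take the state, transition, put it back.
    fn step(&mut self) -> Result<(), String> {
        let current = self.take_state()?;
        let next = match current {
            State::Start => State::Finished,
            State::Finished => State::Finished,
        };
        // Putting the state back is what keeps the next call from failing.
        self.state = Some(next);
        Ok(())
    }
}
```

Holding the state by value during the transition lets each state own its data and be consumed by the `match`, at the cost of this put-back obligation.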

pub(crate) fn next_row_group( &mut self, row_group_idx: usize, row_count: usize, selection: Option<RowSelection>, ) -> Result<(), ParquetError>

Set up this reader to read the next row group

pub(crate) fn try_build( &mut self, ) -> Result<DecodeResult<ParquetRecordBatchReader>, ParquetError>

Try to build the next ParquetRecordBatchReader from this builder.

If more data is needed, returns DecodeResult::NeedsData with the ranges of data that are needed to proceed.

If a ParquetRecordBatchReader is ready, it is returned in DecodeResult::Data.
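
A caller would typically loop on try_build, fetching whatever ranges are requested and handing them back through push_data. The sketch below models that loop with toy types: `ToyBuilder`, `drive`, and the local `DecodeResult` enum are illustrative assumptions (the crate's own `DecodeResult` may have further variants), and the "reader" is reduced to a byte count.

```rust
use std::ops::Range;

/// Local stand-in with only the two variants named above.
#[derive(Debug, PartialEq)]
enum DecodeResult<T> {
    NeedsData(Vec<Range<u64>>),
    Data(T),
}

/// Hypothetical builder that needs a single byte range before it can
/// produce its output.
struct ToyBuilder {
    needed: Range<u64>,
    buffered: Vec<(Range<u64>, Vec<u8>)>,
}

impl ToyBuilder {
    /// Store new buffers together with the file ranges they cover.
    fn push_data(&mut self, ranges: Vec<Range<u64>>, buffers: Vec<Vec<u8>>) {
        self.buffered.extend(ranges.into_iter().zip(buffers));
    }

    /// Total number of bytes currently buffered.
    fn buffered_bytes(&self) -> u64 {
        self.buffered.iter().map(|(_, b)| b.len() as u64).sum()
    }

    /// Return the output if the needed range is covered, else request it.
    fn try_build(&mut self) -> DecodeResult<usize> {
        let needed = &self.needed;
        let have = self
            .buffered
            .iter()
            .any(|(r, _)| r.start <= needed.start && r.end >= needed.end);
        if have {
            DecodeResult::Data(self.buffered_bytes() as usize)
        } else {
            DecodeResult::NeedsData(vec![self.needed.clone()])
        }
    }
}

/// Driver loop: satisfy each `NeedsData` request from an in-memory "file".
fn drive(builder: &mut ToyBuilder, file: &[u8]) -> usize {
    loop {
        match builder.try_build() {
            DecodeResult::Data(n) => return n,
            DecodeResult::NeedsData(ranges) => {
                let buffers = ranges
                    .iter()
                    .map(|r| file[r.start as usize..r.end as usize].to_vec())
                    .collect();
                builder.push_data(ranges, buffers);
            }
        }
    }
}
```

Because the builder never performs I/O itself, the same loop shape works whether the bytes come from memory, a local file, or a remote object store.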

fn try_transition( &mut self, current_state: RowGroupDecoderState, ) -> Result<NextState, ParquetError>

Current state --> next state + optional output

This is the main state transition function for the row group reader and encodes the row group decoding state machine.

§Notes

This function is split out to reduce the indentation level of the main loop in try_build.
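
The "current state to next state plus optional output" shape can be sketched as a pure transition function called from a flat loop. Everything below is a toy model: the `State` variants, the fields of `NextState`, and the string outputs are assumptions for illustration and do not mirror the real `RowGroupDecoderState`.

```rust
/// Toy decoder states.
#[derive(Debug, PartialEq)]
enum State {
    Start,
    WaitingOnData { needed: u64 },
    Finished,
}

/// Result of one transition: the state to store, plus an optional output
/// that, when present, ends the caller's loop.
#[derive(Debug, PartialEq)]
struct NextState {
    next_state: State,
    output: Option<String>,
}

/// Consume the current state and compute the next one.
fn try_transition(current: State, buffered: u64) -> NextState {
    match current {
        // No output yet: the caller keeps looping.
        State::Start => NextState {
            next_state: State::WaitingOnData { needed: 8 },
            output: None,
        },
        // Not enough data: stay in place and report what is missing.
        State::WaitingOnData { needed } if buffered < needed => NextState {
            next_state: State::WaitingOnData { needed },
            output: Some(format!("need {} more bytes", needed - buffered)),
        },
        State::WaitingOnData { .. } => NextState {
            next_state: State::Finished,
            output: Some("ready".to_string()),
        },
        State::Finished => NextState {
            next_state: State::Finished,
            output: Some("finished".to_string()),
        },
    }
}

/// Main loop: transition until some state produces an output.
fn run(mut state: State, buffered: u64) -> (State, String) {
    loop {
        let NextState { next_state, output } = try_transition(state, buffered);
        state = next_state;
        if let Some(out) = output {
            return (state, out);
        }
    }
}
```

Keeping the `match` in its own function leaves the loop body at a single level of nesting, which is the motivation given in the note above.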

fn compute_cache_projection( &self, row_group_idx: usize, filter: &RowFilter, ) -> ProjectionMask

Which columns should be cached?

Returns the columns that are used by the filters and then used in the final projection, excluding any nested columns.
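
In set terms, the description above amounts to taking the union of the columns read by the filters and intersecting it with the output projection. The sketch below expresses that with plain leaf-column indices; `cache_candidates` is a hypothetical function, it uses `BTreeSet` rather than `ProjectionMask`, and it leaves out the nested-column exclusion step.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: leaf columns that are read by at least one
/// predicate and also appear in the final projection.
fn cache_candidates(
    predicate_columns: &[Vec<usize>],
    projection: &[usize],
) -> BTreeSet<usize> {
    // Union of the columns used by all predicates.
    let used_by_filters: BTreeSet<usize> =
        predicate_columns.iter().flatten().copied().collect();

    // Intersect with the output projection.
    projection
        .iter()
        .copied()
        .filter(|leaf| used_by_filters.contains(leaf))
        .collect()
}
```

Only columns in this intersection would be decoded twice (once for filtering, once for output), so they are the only ones worth caching.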

fn compute_cache_projection_inner( &self, filter: &RowFilter, ) -> Option<ProjectionMask>

fn exclude_nested_columns_from_cache( &self, mask: &ProjectionMask, ) -> Option<ProjectionMask>

Exclude leaves belonging to roots that span multiple parquet leaves (i.e. nested columns)
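
One way to picture this rule: count how many Parquet leaves each root column owns, then keep only the selected leaves whose root owns exactly one. The sketch below assumes a `leaf_to_root` slice mapping each leaf index to its root index; that representation and the name `exclude_nested` are illustrative assumptions, not the crate's API.

```rust
use std::collections::HashMap;

/// Hypothetical helper: drop selected leaves whose root column spans more
/// than one leaf (a struct, list, or map). Returns `None` if nothing is left.
fn exclude_nested(leaf_to_root: &[usize], selected_leaves: &[usize]) -> Option<Vec<usize>> {
    // Count the number of leaves under each root column.
    let mut leaves_per_root: HashMap<usize, usize> = HashMap::new();
    for &root in leaf_to_root {
        *leaves_per_root.entry(root).or_insert(0) += 1;
    }

    // Keep a leaf only if it is the sole leaf of its root.
    let kept: Vec<usize> = selected_leaves
        .iter()
        .copied()
        .filter(|&leaf| {
            leaf_to_root
                .get(leaf)
                .map_or(false, |root| leaves_per_root[root] == 1)
        })
        .collect();

    if kept.is_empty() {
        None
    } else {
        Some(kept)
    }
}
```

For example, with `leaf_to_root = [0, 1, 1, 2]`, leaves 1 and 2 share root 1, so selecting leaves 0, 1, and 3 keeps only 0 and 3.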

Trait Implementations§

impl Debug for RowGroupReaderBuilder

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> Ungil for T
where T: Send,