pub(crate) struct RowGroupReaderBuilder {
batch_size: usize,
projection: ProjectionMask,
metadata: Arc<ParquetMetaData>,
fields: Option<Arc<ParquetField>>,
filter: Option<RowFilter>,
limit: Option<usize>,
offset: Option<usize>,
max_predicate_cache_size: usize,
metrics: ArrowReaderMetrics,
state: Option<RowGroupDecoderState>,
buffers: PushBuffers,
}Expand description
Builder for ParquetRecordBatchReader for a single row group
This struct drives the main state machine for decoding each row group – it
determines what data is needed, and then assembles the
ParquetRecordBatchReader when all data is available.
Fields§
§batch_size: usizeThe output batch size
projection: ProjectionMaskWhat columns to project (produce in each output batch)
metadata: Arc<ParquetMetaData>The Parquet file metadata
fields: Option<Arc<ParquetField>>Top level parquet schema and arrow schema mapping
filter: Option<RowFilter>Optional filter
limit: Option<usize>Limit to apply to remaining row groups (decremented as rows are read)
offset: Option<usize>Offset to apply to remaining row groups (decremented as rows are read)
max_predicate_cache_size: usizeThe size in bytes of the predicate cache to use
See [RowGroupCache] for details.
metrics: ArrowReaderMetricsThe metrics collector
state: Option<RowGroupDecoderState>Current state of the decoder.
It is taken when processing, and must be put back before returning it is a bug error if it is not put back after transitioning states.
buffers: PushBuffersThe underlying data store
Implementations§
Source§impl RowGroupReaderBuilder
impl RowGroupReaderBuilder
Sourcepub(crate) fn new(
batch_size: usize,
projection: ProjectionMask,
metadata: Arc<ParquetMetaData>,
fields: Option<Arc<ParquetField>>,
filter: Option<RowFilter>,
limit: Option<usize>,
offset: Option<usize>,
metrics: ArrowReaderMetrics,
max_predicate_cache_size: usize,
buffers: PushBuffers,
) -> Self
pub(crate) fn new( batch_size: usize, projection: ProjectionMask, metadata: Arc<ParquetMetaData>, fields: Option<Arc<ParquetField>>, filter: Option<RowFilter>, limit: Option<usize>, offset: Option<usize>, metrics: ArrowReaderMetrics, max_predicate_cache_size: usize, buffers: PushBuffers, ) -> Self
Create a new RowGroupReaderBuilder
Sourcepub fn push_data(&mut self, ranges: Vec<Range<u64>>, buffers: Vec<Bytes>)
pub fn push_data(&mut self, ranges: Vec<Range<u64>>, buffers: Vec<Bytes>)
Push new data buffers that can be used to satisfy pending requests
Sourcepub fn buffered_bytes(&self) -> u64
pub fn buffered_bytes(&self) -> u64
Returns the total number of buffered bytes available
Sourcefn take_state(&mut self) -> Result<RowGroupDecoderState, ParquetError>
fn take_state(&mut self) -> Result<RowGroupDecoderState, ParquetError>
take the current state, leaving None in its place.
Returns an error if there the state wasn’t put back after the previous
call to Self::take_state.
Any code that calls this method must ensure that the state is put back before returning, otherwise the reader will error next time it is called
Sourcepub(crate) fn next_row_group(
&mut self,
row_group_idx: usize,
row_count: usize,
selection: Option<RowSelection>,
) -> Result<(), ParquetError>
pub(crate) fn next_row_group( &mut self, row_group_idx: usize, row_count: usize, selection: Option<RowSelection>, ) -> Result<(), ParquetError>
Setup this reader to read the next row group
Sourcepub(crate) fn try_build(
&mut self,
) -> Result<DecodeResult<ParquetRecordBatchReader>, ParquetError>
pub(crate) fn try_build( &mut self, ) -> Result<DecodeResult<ParquetRecordBatchReader>, ParquetError>
Try to build the next ParquetRecordBatchReader from this RowGroupReader.
If more data is needed, returns DecodeResult::NeedsData with the
ranges of data that are needed to proceed.
If a ParquetRecordBatchReader is ready, it is returned in
DecodeResult::Data.
Sourcefn try_transition(
&mut self,
current_state: RowGroupDecoderState,
) -> Result<NextState, ParquetError>
fn try_transition( &mut self, current_state: RowGroupDecoderState, ) -> Result<NextState, ParquetError>
Current state –> next state + optional output
This is the main state transition function for the row group reader and encodes the row group decoding state machine.
§Notes
This structure is used to reduce the indentation level of the main loop in try_build
Sourcefn compute_cache_projection(
&self,
row_group_idx: usize,
filter: &RowFilter,
) -> ProjectionMask
fn compute_cache_projection( &self, row_group_idx: usize, filter: &RowFilter, ) -> ProjectionMask
Which columns should be cached?
Returns the columns that are used by the filters and then used in the final projection, excluding any nested columns.
fn compute_cache_projection_inner( &self, filter: &RowFilter, ) -> Option<ProjectionMask>
Sourcefn exclude_nested_columns_from_cache(
&self,
mask: &ProjectionMask,
) -> Option<ProjectionMask>
fn exclude_nested_columns_from_cache( &self, mask: &ProjectionMask, ) -> Option<ProjectionMask>
Exclude leaves belonging to roots that span multiple parquet leaves (i.e. nested columns)
Trait Implementations§
Auto Trait Implementations§
impl Freeze for RowGroupReaderBuilder
impl !RefUnwindSafe for RowGroupReaderBuilder
impl Send for RowGroupReaderBuilder
impl !Sync for RowGroupReaderBuilder
impl Unpin for RowGroupReaderBuilder
impl !UnwindSafe for RowGroupReaderBuilder
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more