pub(crate) struct RemainingRowGroups {
schema: SchemaRef,
frontier: RowGroupFrontier,
row_group_reader_builder: RowGroupReaderBuilder,
}Expand description
State machine that tracks the remaining high level chunks (row groups) of Parquet data left to read.
RowGroupFrontier owns cross-row-group scan state and selects the next
work item. RowGroupReaderBuilder owns decoding for the active row group.
Fields§
§schema: SchemaRefThe arrow schema of the decoded output. Carried only so
Self::into_parts can hand it to a rebuilt builder; unused while
decoding.
frontier: RowGroupFrontierCross-row-group scan state for queued work.
row_group_reader_builder: RowGroupReaderBuilderState for building the reader for the current row group
Implementations§
Source§impl RemainingRowGroups
impl RemainingRowGroups
pub fn new( schema: SchemaRef, parquet_metadata: Arc<ParquetMetaData>, row_groups: Vec<usize>, selection: Option<RowSelection>, budget: RowBudget, has_predicates: bool, row_group_reader_builder: RowGroupReaderBuilder, ) -> Self
Sourcepub(crate) fn into_parts(self) -> RemainingRowGroupsParts
pub(crate) fn into_parts(self) -> RemainingRowGroupsParts
Decompose into RemainingRowGroupsParts.
Must be called at a row-group boundary (see
Self::is_at_row_group_boundary). The inner reader builder’s runtime
decode state is discarded; its buffered bytes are carried through.
Sourcepub fn push_data(&mut self, ranges: Vec<Range<u64>>, buffers: Vec<Bytes>)
pub fn push_data(&mut self, ranges: Vec<Range<u64>>, buffers: Vec<Bytes>)
Push new data buffers that can be used to satisfy pending requests
Sourcepub fn buffered_bytes(&self) -> u64
pub fn buffered_bytes(&self) -> u64
Return the total number of bytes buffered so far
Sourcepub fn clear_all_ranges(&mut self)
pub fn clear_all_ranges(&mut self)
Clear any staged ranges currently buffered for future decode work
Sourcepub fn is_at_row_group_boundary(&self) -> bool
pub fn is_at_row_group_boundary(&self) -> bool
True iff the inner row-group reader is between row groups (state
Finished). Forward to RowGroupReaderBuilder::is_finished.
Sourcepub fn row_groups_remaining(&self) -> usize
pub fn row_groups_remaining(&self) -> usize
Number of row groups remaining (not including the one currently being decoded).
Sourcepub fn peek_next_row_group(&self) -> Result<Option<usize>, ParquetError>
pub fn peek_next_row_group(&self) -> Result<Option<usize>, ParquetError>
Peek at the file-level row-group index that the next call to
Self::try_next_reader will produce a reader for, after
simulating the same skip logic Self::try_next_reader applies
internally (row-selection emptiness + offset/limit budget). Does
not mutate state.
Returns None when the active row group is still being decoded,
when no row groups remain, or when every remaining row group
would be skipped under the current selection/budget.
Cost: one clone of the queued row-group indices and optional row-selection per call (the frontier is cloned so the real advance logic can run non-destructively). For callers that peek once per row-group boundary this is O(remaining row groups + selectors) per boundary.
Sourcepub fn try_next_reader(
&mut self,
) -> Result<DecodeResult<ParquetRecordBatchReader>, ParquetError>
pub fn try_next_reader( &mut self, ) -> Result<DecodeResult<ParquetRecordBatchReader>, ParquetError>
returns ParquetRecordBatchReader suitable for reading the next
group of rows from the Parquet data, or the list of data ranges still
needed to proceed