pub struct RowFilter {
pub(crate) predicates: Vec<Box<dyn ArrowPredicate>>,
}
Expand description
Filter applied during the parquet read process
RowFilter
applies predicates in order, after decoding only the columns
required. As predicates eliminate rows, fewer rows from subsequent columns
may be required, thus potentially reducing IO and decode.
A RowFilter
consists of a list of ArrowPredicate
s. Only the rows for which
all the predicates evaluate to true
will be returned.
Any RowSelection
provided to the reader will be applied prior
to the first predicate, and each predicate in turn will then be used to compute
a more refined RowSelection
used when evaluating the subsequent predicates.
Once all predicates have been evaluated, the final RowSelection
is applied
to the top-level ProjectionMask
to produce the final output [RecordBatch
].
This design has a couple of implications:
RowFilter
can be used to skip entire pages, and thus IO, in addition to CPU decode overheads- Columns may be decoded multiple times if they appear in multiple
ProjectionMask
- IO will be deferred until needed by a
ProjectionMask
As such there is a trade-off between a single large predicate, or multiple predicates, that will depend on the shape of the data. Whilst multiple smaller predicates may minimise the amount of data scanned/decoded, it may not be faster overall.
For example, if a predicate that needs a single column of data filters out all but
1% of the rows, applying it as one of the early ArrowPredicateFn
will likely significantly
improve performance.
As a counter example, if a predicate needs several columns of data to evaluate but leaves 99% of the rows, it may be better to not filter the data from parquet and apply the filter after the RecordBatch has been fully decoded.
Additionally, even if a predicate eliminates a moderate number of rows, it may still be faster to filter the data after the RecordBatch has been fully decoded, if the eliminated rows are not contiguous.
Fields§
§predicates: Vec<Box<dyn ArrowPredicate>>
A list of ArrowPredicate
Implementations§
Source§impl RowFilter
impl RowFilter
Sourcepub fn new(predicates: Vec<Box<dyn ArrowPredicate>>) -> Self
pub fn new(predicates: Vec<Box<dyn ArrowPredicate>>) -> Self
Create a new RowFilter
from an array of ArrowPredicate
Auto Trait Implementations§
impl Freeze for RowFilter
impl !RefUnwindSafe for RowFilter
impl Send for RowFilter
impl !Sync for RowFilter
impl Unpin for RowFilter
impl !UnwindSafe for RowFilter
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left
is true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left(&self)
returns true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read more