override_selector_strategy_if_needed

Function override_selector_strategy_if_needed 

fn override_selector_strategy_if_needed(
    plan_builder: ReadPlanBuilder,
    projection_mask: &ProjectionMask,
    offset_index: Option<&[OffsetIndexMetaData]>,
) -> ReadPlanBuilder

Override the selection strategy if needed.

Some pages can be skipped during row-group construction if no selected row falls in them. The data pages for those rows are then never loaded, and their definition/repetition levels are never read. With RowSelections this works, because skip_records() handles the case and skips the page accordingly.

With the current mask design, however, all values must be read and decoded before the mask filter is applied. If any pages were skipped during row-group construction, their data is missing and cannot be decoded.
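The difference between the two strategies can be illustrated with a minimal sketch. It is not the crate's implementation: `Selector`, `Pages`, `read_with_selectors` and `read_with_mask` are hypothetical names, pages are modelled as plain vectors of `i32`, and a page that was never loaded is modelled as `None`.

```rust
/// Hypothetical stand-in for a run-length row selector.
#[derive(Clone, Copy)]
enum Selector {
    Read(usize),
    Skip(usize),
}

/// Pages of one column chunk; `None` marks a page that was never loaded.
type Pages = Vec<Option<Vec<i32>>>;

/// Selector-style read: skipped rows are never touched, so a missing page
/// is harmless as long as no selected row falls inside it.
/// `page_size` is assumed to be non-zero.
fn read_with_selectors(
    pages: &Pages,
    page_size: usize,
    selectors: &[Selector],
) -> Option<Vec<i32>> {
    let mut out = Vec::new();
    let mut row = 0;
    for selector in selectors {
        match *selector {
            Selector::Skip(n) => row += n,
            Selector::Read(n) => {
                for r in row..row + n {
                    // Only pages containing a selected row are dereferenced.
                    let page = pages.get(r / page_size)?.as_ref()?;
                    out.push(*page.get(r % page_size)?);
                }
                row += n;
            }
        }
    }
    Some(out)
}

/// Mask-style read: decode every row first, then filter.
/// Returns `None` as soon as a page is missing.
fn read_with_mask(pages: &Pages, mask: &[bool]) -> Option<Vec<i32>> {
    let mut decoded = Vec::new();
    for page in pages {
        decoded.extend_from_slice(page.as_ref()?);
    }
    if decoded.len() != mask.len() {
        return None;
    }
    Some(
        decoded
            .into_iter()
            .zip(mask)
            .filter(|(_, keep)| **keep)
            .map(|(value, _)| value)
            .collect(),
    )
}
```

In this model, a column chunk with a missing middle page can still be read with selectors that skip over it, while the mask-style read has nothing to decode for that page and fails.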

A simple example:

  • the page size is 2 rows, the mask is 100001, so the equivalent row selection is read(1) skip(4) read(1)
  • the ColumnChunkData would be page1(10), page2(skipped), page3(01)

Because the row selection contains skip(4), page2 won’t be read at all, so in this case we can’t decode all the rows and then apply a mask. To apply the bit mask correctly, all 6 values need to be read, but page2 is not in memory.
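The example can be checked mechanically. The sketch below is a simplification, not the function's actual logic: it assumes fixed-size pages instead of consulting the offset index, and `pages_touched` and `mask_is_safe` are hypothetical helper names. It marks which pages contain at least one selected row and treats the mask as usable only when no page is skipped entirely.

```rust
/// Returns, for each page, whether at least one selected row falls in it.
/// Assumes fixed-size pages and a non-zero `page_size` (`chunks` panics on 0).
fn pages_touched(mask: &[bool], page_size: usize) -> Vec<bool> {
    mask.chunks(page_size)
        .map(|page| page.iter().any(|&keep| keep))
        .collect()
}

/// Hypothetical decision rule: a mask can be applied only if every page
/// is loaded, i.e. no page is skipped entirely by the selection.
fn mask_is_safe(mask: &[bool], page_size: usize) -> bool {
    pages_touched(mask, page_size).iter().all(|&touched| touched)
}
```

For the mask 100001 with a page size of 2, `pages_touched` marks page2 as untouched, so `mask_is_safe` returns `false`; under the assumptions of this sketch, that is the situation in which the strategy would have to be overridden.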