pub struct GenericColumnWriter<'a, E: ColumnValueEncoder> {Show 18 fields
descr: ColumnDescPtr,
props: WriterPropertiesPtr,
statistics_enabled: EnabledStatistics,
page_writer: Box<dyn PageWriter + 'a>,
codec: Compression,
compressor: Option<Box<dyn Codec>>,
encoder: E,
page_metrics: PageMetrics,
column_metrics: ColumnMetrics<E::T>,
encodings: BTreeSet<Encoding>,
def_levels_sink: Vec<i16>,
rep_levels_sink: Vec<i16>,
data_pages: VecDeque<CompressedPage>,
column_index_builder: ColumnIndexBuilder,
offset_index_builder: Option<OffsetIndexBuilder>,
data_page_boundary_ascending: bool,
data_page_boundary_descending: bool,
last_non_null_data_page_min_max: Option<(E::T, E::T)>,
}
Expand description
Generic column writer for a primitive column.
Fields§
§descr: ColumnDescPtr
§props: WriterPropertiesPtr
§statistics_enabled: EnabledStatistics
§page_writer: Box<dyn PageWriter + 'a>
§codec: Compression
§compressor: Option<Box<dyn Codec>>
§encoder: E
§page_metrics: PageMetrics
§column_metrics: ColumnMetrics<E::T>
§encodings: BTreeSet<Encoding>
The order of encodings within the generated metadata does not impact its meaning, but we use a BTreeSet so that the output is deterministic
def_levels_sink: Vec<i16>
§rep_levels_sink: Vec<i16>
§data_pages: VecDeque<CompressedPage>
§column_index_builder: ColumnIndexBuilder
§offset_index_builder: Option<OffsetIndexBuilder>
§data_page_boundary_ascending: bool
§data_page_boundary_descending: bool
§last_non_null_data_page_min_max: Option<(E::T, E::T)>
(min, max)
Implementations§
Source§impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E>
impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E>
Sourcepub fn new(
descr: ColumnDescPtr,
props: WriterPropertiesPtr,
page_writer: Box<dyn PageWriter + 'a>,
) -> Self
pub fn new( descr: ColumnDescPtr, props: WriterPropertiesPtr, page_writer: Box<dyn PageWriter + 'a>, ) -> Self
Returns a new instance of GenericColumnWriter
.
pub(crate) fn write_batch_internal( &mut self, values: &E::Values, value_indices: Option<&[usize]>, def_levels: Option<&[i16]>, rep_levels: Option<&[i16]>, min: Option<&E::T>, max: Option<&E::T>, distinct_count: Option<u64>, ) -> Result<usize>
Sourcepub fn write_batch(
&mut self,
values: &E::Values,
def_levels: Option<&[i16]>,
rep_levels: Option<&[i16]>,
) -> Result<usize>
pub fn write_batch( &mut self, values: &E::Values, def_levels: Option<&[i16]>, rep_levels: Option<&[i16]>, ) -> Result<usize>
Writes batch of values, definition levels and repetition levels. Returns number of values processed (written).
If definition and repetition levels are provided, we write fully those levels and select how many values to write (this number will be returned), since number of actual written values may be smaller than provided values.
If only values are provided, then all values are written and the length of of the values buffer is returned.
Definition and/or repetition levels can be omitted, if values are non-nullable and/or non-repeated.
Sourcepub fn write_batch_with_statistics(
&mut self,
values: &E::Values,
def_levels: Option<&[i16]>,
rep_levels: Option<&[i16]>,
min: Option<&E::T>,
max: Option<&E::T>,
distinct_count: Option<u64>,
) -> Result<usize>
pub fn write_batch_with_statistics( &mut self, values: &E::Values, def_levels: Option<&[i16]>, rep_levels: Option<&[i16]>, min: Option<&E::T>, max: Option<&E::T>, distinct_count: Option<u64>, ) -> Result<usize>
Writer may optionally provide pre-calculated statistics for use when computing chunk-level statistics
NB: WriterProperties::statistics_enabled
must be set to EnabledStatistics::Chunk
for these statistics to take effect. If EnabledStatistics::None
they will be ignored,
and if EnabledStatistics::Page
the chunk statistics will instead be computed from the
computed page statistics
Sourcepub(crate) fn memory_size(&self) -> usize
pub(crate) fn memory_size(&self) -> usize
Returns the estimated total memory usage.
Unlike Self::get_estimated_total_bytes
this is an estimate
of the current memory usage and not the final anticipated encoded size.
Sourcepub fn get_total_bytes_written(&self) -> u64
pub fn get_total_bytes_written(&self) -> u64
Returns total number of bytes written by this column writer so far. This value is also returned when column writer is closed.
Note: this value does not include any buffered data that has not yet been flushed to a page.
Sourcepub(crate) fn get_estimated_total_bytes(&self) -> u64
pub(crate) fn get_estimated_total_bytes(&self) -> u64
Returns the estimated total encoded bytes for this column writer.
Unlike Self::get_total_bytes_written
this includes an estimate
of any data that has not yet been flushed to a page, based on it’s
anticipated encoded size.
Sourcepub fn get_total_rows_written(&self) -> u64
pub fn get_total_rows_written(&self) -> u64
Returns total number of rows written by this column writer so far. This value is also returned when column writer is closed.
Sourcepub fn get_descriptor(&self) -> &ColumnDescPtr
pub fn get_descriptor(&self) -> &ColumnDescPtr
Returns a reference to a ColumnDescPtr
Sourcepub fn close(self) -> Result<ColumnCloseResult>
pub fn close(self) -> Result<ColumnCloseResult>
Finalizes writes and closes the column writer. Returns total bytes written, total rows written and column chunk metadata.
Sourcefn write_mini_batch(
&mut self,
values: &E::Values,
values_offset: usize,
value_indices: Option<&[usize]>,
num_levels: usize,
def_levels: Option<&[i16]>,
rep_levels: Option<&[i16]>,
) -> Result<usize>
fn write_mini_batch( &mut self, values: &E::Values, values_offset: usize, value_indices: Option<&[usize]>, num_levels: usize, def_levels: Option<&[i16]>, rep_levels: Option<&[i16]>, ) -> Result<usize>
Writes mini batch of values, definition and repetition levels. This allows fine-grained processing of values and maintaining a reasonable page size.
Sourcefn should_dict_fallback(&self) -> bool
fn should_dict_fallback(&self) -> bool
Returns true if we need to fall back to non-dictionary encoding.
We can only fall back if dictionary encoder is set and we have exceeded dictionary size.
Sourcefn should_add_data_page(&self) -> bool
fn should_add_data_page(&self) -> bool
Returns true if there is enough data for a data page, false otherwise.
Sourcefn dict_fallback(&mut self) -> Result<()>
fn dict_fallback(&mut self) -> Result<()>
Performs dictionary fallback. Prepares and writes dictionary and all data pages into page writer.
Sourcefn update_column_offset_index(
&mut self,
page_statistics: Option<&ValueStatistics<E::T>>,
page_variable_length_bytes: Option<i64>,
)
fn update_column_offset_index( &mut self, page_statistics: Option<&ValueStatistics<E::T>>, page_variable_length_bytes: Option<i64>, )
Update the column index and offset index when adding the data page
Sourcefn can_truncate_value(&self) -> bool
fn can_truncate_value(&self) -> bool
Determine if we should allow truncating min/max values for this column’s statistics
Sourcefn truncate_min_value(
&self,
truncation_length: Option<usize>,
data: &[u8],
) -> (Vec<u8>, bool)
fn truncate_min_value( &self, truncation_length: Option<usize>, data: &[u8], ) -> (Vec<u8>, bool)
Truncates a binary statistic to at most truncation_length
bytes.
If truncation is not possible, returns data
.
The bool
in the returned tuple indicates whether truncation occurred or not.
UTF-8 Note:
If the column type indicates UTF-8, and data
contains valid UTF-8, then the result will
also remain valid UTF-8, but may be less tnan truncation_length
bytes to avoid splitting
on non-character boundaries.
Sourcefn truncate_max_value(
&self,
truncation_length: Option<usize>,
data: &[u8],
) -> (Vec<u8>, bool)
fn truncate_max_value( &self, truncation_length: Option<usize>, data: &[u8], ) -> (Vec<u8>, bool)
Truncates a binary statistic to at most truncation_length
bytes, and then increment the
final byte(s) to yield a valid upper bound. This may result in a result of less than
truncation_length
bytes if the last byte(s) overflows.
If truncation is not possible, returns data
.
The bool
in the returned tuple indicates whether truncation occurred or not.
UTF-8 Note:
If the column type indicates UTF-8, and data
contains valid UTF-8, then the result will
also remain valid UTF-8 (but again may be less than truncation_length
bytes). If data
does not contain valid UTF-8, then truncation will occur as if the column is non-string
binary.
Sourcefn add_data_page(&mut self) -> Result<()>
fn add_data_page(&mut self) -> Result<()>
Adds data page. Data page is either buffered in case of dictionary encoding or written directly.
Sourcefn flush_data_pages(&mut self) -> Result<()>
fn flush_data_pages(&mut self) -> Result<()>
Finalises any outstanding data pages and flushes buffered data pages from dictionary encoding into underlying sink.
Sourcefn build_column_metadata(&mut self) -> Result<ColumnChunkMetaData>
fn build_column_metadata(&mut self) -> Result<ColumnChunkMetaData>
Assembles column chunk metadata.
Sourcefn encode_levels_v1(
&self,
encoding: Encoding,
levels: &[i16],
max_level: i16,
) -> Vec<u8> ⓘ
fn encode_levels_v1( &self, encoding: Encoding, levels: &[i16], max_level: i16, ) -> Vec<u8> ⓘ
Encodes definition or repetition levels for Data Page v1.
Sourcefn encode_levels_v2(&self, levels: &[i16], max_level: i16) -> Vec<u8> ⓘ
fn encode_levels_v2(&self, levels: &[i16], max_level: i16) -> Vec<u8> ⓘ
Encodes definition or repetition levels for Data Page v2. Encoding is always RLE.
Sourcefn write_data_page(&mut self, page: CompressedPage) -> Result<()>
fn write_data_page(&mut self, page: CompressedPage) -> Result<()>
Writes compressed data page into underlying sink and updates global metrics.
Sourcefn write_dictionary_page(&mut self) -> Result<()>
fn write_dictionary_page(&mut self) -> Result<()>
Writes dictionary page into underlying sink.
Sourcefn update_metrics_for_page(&mut self, page_spec: PageWriteSpec)
fn update_metrics_for_page(&mut self, page_spec: PageWriteSpec)
Updates column writer metrics with each page metadata.
Auto Trait Implementations§
impl<'a, E> Freeze for GenericColumnWriter<'a, E>
impl<'a, E> !RefUnwindSafe for GenericColumnWriter<'a, E>
impl<'a, E> Send for GenericColumnWriter<'a, E>where
E: Send,
impl<'a, E> !Sync for GenericColumnWriter<'a, E>
impl<'a, E> Unpin for GenericColumnWriter<'a, E>
impl<'a, E> !UnwindSafe for GenericColumnWriter<'a, E>
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left
is true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left(&self)
returns true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read more