pub struct WriterProperties {
data_page_size_limit: usize,
dictionary_page_size_limit: usize,
data_page_row_count_limit: usize,
write_batch_size: usize,
max_row_group_size: usize,
bloom_filter_position: BloomFilterPosition,
writer_version: WriterVersion,
created_by: String,
offset_index_disabled: bool,
pub(crate) key_value_metadata: Option<Vec<KeyValue>>,
default_column_properties: ColumnProperties,
column_properties: HashMap<ColumnPath, ColumnProperties>,
sorting_columns: Option<Vec<SortingColumn>>,
column_index_truncate_length: Option<usize>,
statistics_truncate_length: Option<usize>,
coerce_types: bool,
}
Configuration settings for writing parquet files.
Use Self::builder to create a WriterPropertiesBuilder to change settings.
§Example
use parquet::basic::{Compression, Encoding};
use parquet::file::properties::{WriterProperties, WriterVersion};
use parquet::schema::types::ColumnPath;

// Create properties with default configuration.
let props = WriterProperties::default();

// Use the properties builder to set certain options and assemble the configuration.
let props = WriterProperties::builder()
    .set_writer_version(WriterVersion::PARQUET_1_0)
    .set_encoding(Encoding::PLAIN)
    .set_column_encoding(ColumnPath::from("col1"), Encoding::DELTA_BINARY_PACKED)
    .set_compression(Compression::SNAPPY)
    .build();

assert_eq!(props.writer_version(), WriterVersion::PARQUET_1_0);
// "col1" has a column-specific encoding.
assert_eq!(
    props.encoding(&ColumnPath::from("col1")),
    Some(Encoding::DELTA_BINARY_PACKED)
);
// "col2" has no override, so it uses the default encoding set above.
assert_eq!(
    props.encoding(&ColumnPath::from("col2")),
    Some(Encoding::PLAIN)
);
Fields§
§data_page_size_limit: usize
§dictionary_page_size_limit: usize
§data_page_row_count_limit: usize
§write_batch_size: usize
§max_row_group_size: usize
§bloom_filter_position: BloomFilterPosition
§writer_version: WriterVersion
§created_by: String
§offset_index_disabled: bool
§key_value_metadata: Option<Vec<KeyValue>>
§default_column_properties: ColumnProperties
§column_properties: HashMap<ColumnPath, ColumnProperties>
§sorting_columns: Option<Vec<SortingColumn>>
§column_index_truncate_length: Option<usize>
§statistics_truncate_length: Option<usize>
§coerce_types: bool
Implementations§
impl WriterProperties
pub fn new() -> Self
Creates a new WriterProperties with the default settings.
See WriterProperties::builder for customising settings.
pub fn builder() -> WriterPropertiesBuilder
Returns a new default WriterPropertiesBuilder for creating writer properties.
pub fn data_page_size_limit(&self) -> usize
Returns the data page size limit.
Note: this is a best-effort limit based on the write batch size.
For more details, see WriterPropertiesBuilder::set_data_page_size_limit.
pub fn dictionary_page_size_limit(&self) -> usize
Returns the dictionary page size limit.
Note: this is a best-effort limit based on the write batch size.
For more details, see WriterPropertiesBuilder::set_dictionary_page_size_limit.
pub fn data_page_row_count_limit(&self) -> usize
Returns the maximum page row count.
Note: this is a best-effort limit based on the write batch size.
For more details, see WriterPropertiesBuilder::set_data_page_row_count_limit.
pub fn write_batch_size(&self) -> usize
Returns the configured batch size for writes.
When writing a batch of data, the writer splits it internally into smaller batches of this size so that it can better estimate the size of the page currently being written.
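To illustrate why the page limits above are only best-effort, here is a minimal sketch of a writer loop. It assumes a simplified model in which the size check runs once per sub-batch; the function `page_sizes` and its parameters are hypothetical and not part of the parquet crate, whose column writer is more involved.

```rust
/// Simulates page flushing and returns the byte size of each emitted page.
///
/// `value_sizes` holds the encoded size of each value in bytes. Values are
/// buffered in sub-batches of `write_batch_size`, and the page size is only
/// compared against `page_size_limit` after a whole sub-batch is buffered.
fn page_sizes(
    value_sizes: &[usize],
    write_batch_size: usize,
    page_size_limit: usize,
) -> Vec<usize> {
    assert!(write_batch_size > 0, "write batch size must be positive");

    let mut pages = Vec::new();
    let mut current_page_bytes = 0usize;

    for sub_batch in value_sizes.chunks(write_batch_size) {
        // Buffer the whole sub-batch before looking at the limit.
        current_page_bytes += sub_batch.iter().sum::<usize>();

        // The check happens here, so a page can exceed the limit by up to
        // one sub-batch worth of data.
        if current_page_bytes >= page_size_limit {
            pages.push(current_page_bytes);
            current_page_bytes = 0;
        }
    }

    // Flush whatever is left as a final, possibly small, page.
    if current_page_bytes > 0 {
        pages.push(current_page_bytes);
    }
    pages
}
```

In this model, ten 4-byte values with a write batch size of 4 and a 10-byte limit produce pages of 16, 16 and 8 bytes, while a write batch size of 1 produces pages of 12, 12, 12 and 4 bytes. A smaller batch size tracks the limit more closely at the cost of more frequent checks.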
pub fn max_row_group_size(&self) -> usize
Returns the maximum number of rows in a row group.
pub fn bloom_filter_position(&self) -> BloomFilterPosition
Returns the configured bloom filter position, which determines where in the file bloom filters are written.
pub fn writer_version(&self) -> WriterVersion
Returns the configured writer version.
pub fn created_by(&self) -> &str
Returns the created_by string.
pub fn offset_index_disabled(&self) -> bool
Returns true if offset index writing is disabled.
pub fn key_value_metadata(&self) -> Option<&Vec<KeyValue>>
Returns the key_value_metadata KeyValue pairs.
pub fn sorting_columns(&self) -> Option<&Vec<SortingColumn>>
Returns the sorting columns.
pub fn column_index_truncate_length(&self) -> Option<usize>
Returns the maximum length of truncated min/max values in the column index.
Returns None if truncation is disabled; otherwise the length must be greater than 0.
pub fn statistics_truncate_length(&self) -> Option<usize>
Returns the maximum length of truncated min/max values in statistics.
Returns None if truncation is disabled; otherwise the length must be greater than 0.
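Truncated min/max values are only useful if they remain valid bounds. The sketch below shows one common way to achieve that for byte strings: a minimum can simply be cut to a prefix, while a maximum has to be cut and then incremented. The helpers `truncate_min` and `truncate_max` are hypothetical names used for illustration, and this is an assumption about the general technique rather than a description of the crate's exact behaviour.

```rust
/// Truncates a minimum value to at most `len` bytes.
/// A prefix always sorts at or before the full value, so it stays a valid
/// lower bound.
fn truncate_min(value: &[u8], len: usize) -> Vec<u8> {
    value[..len.min(value.len())].to_vec()
}

/// Truncates a maximum value to at most `len` bytes while keeping it a valid
/// upper bound. Returns `None` when no shorter upper bound exists.
fn truncate_max(value: &[u8], len: usize) -> Option<Vec<u8>> {
    if value.len() <= len {
        // Already short enough, nothing to do.
        return Some(value.to_vec());
    }

    let mut prefix = value[..len].to_vec();
    // A plain prefix sorts before the full value, so bump the last byte that
    // can be incremented and drop any trailing 0xFF bytes after it.
    while let Some(last) = prefix.pop() {
        if last < u8::MAX {
            prefix.push(last + 1);
            return Some(prefix);
        }
    }

    // Every byte in the prefix was 0xFF, so it cannot be incremented.
    None
}
```

For example, truncating the maximum `hello world` to 5 bytes in this sketch yields `hellp`, which still sorts after the original value.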
pub fn coerce_types(&self) -> bool
Returns the coerce_types setting.
Some Arrow types do not have a corresponding Parquet logical type. Affected Arrow data types include Date64, Timestamp and Interval.
Writers have the option to coerce these into native Parquet types. Type coercion allows for meaningful representations that do not require downstream readers to consider the embedded Arrow schema. However, type coercion also prevents the data from being losslessly round-tripped. This method returns true if type coercion is enabled.
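The loss of round-tripping can be seen with Date64, which stores milliseconds since the Unix epoch, when it is coerced to a date type that stores whole days. The following is a minimal sketch of that arithmetic only; the helper names and the choice of flooring division are assumptions made for illustration, not the crate's implementation.

```rust
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Converts milliseconds since the Unix epoch to whole days since the epoch.
/// Flooring division is an assumption here; it discards the time-of-day part.
fn date64_to_days(millis: i64) -> i32 {
    millis.div_euclid(MILLIS_PER_DAY) as i32
}

/// Converts whole days since the epoch back to milliseconds since the epoch.
fn days_to_date64(days: i32) -> i64 {
    i64::from(days) * MILLIS_PER_DAY
}
```

A value of one day plus five seconds (86_405_000 ms) becomes day 1, and converting back gives 86_400_000 ms, so the five seconds are gone. Any reader sees a meaningful date, but the original value cannot be recovered.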
pub fn dictionary_data_page_encoding(&self) -> Encoding
Returns the encoding for a data page when dictionary encoding is enabled. This is not configurable.
pub fn dictionary_page_encoding(&self) -> Encoding
Returns the encoding for the dictionary page when dictionary encoding is enabled. This is not configurable.
pub fn encoding(&self, col: &ColumnPath) -> Option<Encoding>
Returns the encoding for a column, if set. When dictionary encoding is enabled, this is the fallback encoding.
If no encoding is set, the column writer chooses the best encoding based on the column type.
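The struct holds both default_column_properties and a per-column column_properties map, which suggests that per-column getters look up a column-specific override first and fall back to the default. The sketch below models that lookup with simplified, hypothetical types (`Props`, `ColumnProps`, a two-variant `Encoding`, and string column names); it is not the crate's code.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the crate's encoding enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Encoding {
    Plain,
    DeltaBinaryPacked,
}

/// Simplified stand-in for per-column settings; `None` means "not set".
#[derive(Default)]
struct ColumnProps {
    encoding: Option<Encoding>,
}

/// Simplified stand-in for the writer properties.
struct Props {
    default_column: ColumnProps,
    per_column: HashMap<String, ColumnProps>,
}

impl Props {
    fn new(default_encoding: Option<Encoding>) -> Self {
        Props {
            default_column: ColumnProps { encoding: default_encoding },
            per_column: HashMap::new(),
        }
    }

    /// Records an override for a single column.
    fn set_column_encoding(&mut self, col: &str, encoding: Encoding) {
        self.per_column.entry(col.to_string()).or_default().encoding = Some(encoding);
    }

    /// Returns the column-specific encoding if one is set, otherwise the
    /// default, otherwise `None` (leaving the choice to the writer).
    fn encoding(&self, col: &str) -> Option<Encoding> {
        self.per_column
            .get(col)
            .and_then(|c| c.encoding)
            .or(self.default_column.encoding)
    }
}
```

With a default of `Encoding::Plain` and an override of `Encoding::DeltaBinaryPacked` for `col1`, a lookup for `col1` returns the override and a lookup for `col2` returns the default, which mirrors the assertions in the example at the top of this page.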
pub fn compression(&self, col: &ColumnPath) -> Compression
Returns the compression codec for a column.
pub fn dictionary_enabled(&self, col: &ColumnPath) -> bool
Returns true if dictionary encoding is enabled for a column.
pub fn statistics_enabled(&self, col: &ColumnPath) -> EnabledStatistics
Returns which statistics are written for a column.
pub fn max_statistics_size(&self, col: &ColumnPath) -> usize
Returns the maximum size for statistics. Only applicable if statistics are enabled.
pub fn bloom_filter_properties(&self, col: &ColumnPath) -> Option<&BloomFilterProperties>
Returns the BloomFilterProperties for the given column, or None if the bloom filter is disabled.
Trait Implementations§
impl Clone for WriterProperties
fn clone(&self) -> WriterProperties
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for WriterProperties
Auto Trait Implementations§
impl Freeze for WriterProperties
impl RefUnwindSafe for WriterProperties
impl Send for WriterProperties
impl Sync for WriterProperties
impl Unpin for WriterProperties
impl UnwindSafe for WriterProperties
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.