pub struct SerializedFileWriter<W: Write> {
buf: TrackedWrite<W>,
schema: TypePtr,
descr: SchemaDescPtr,
props: WriterPropertiesPtr,
row_groups: Vec<RowGroupMetaData>,
bloom_filters: Vec<Vec<Option<Sbbf>>>,
column_indexes: Vec<Vec<Option<ColumnIndex>>>,
offset_indexes: Vec<Vec<Option<OffsetIndex>>>,
row_group_index: usize,
kv_metadatas: Vec<KeyValue>,
finished: bool,
file_encryptor: Option<Arc<FileEncryptor>>,
}
Parquet file writer API. Provides methods to write row groups sequentially.
The main workflow is as follows:
- Create a file writer; this opens a new file and may write some metadata.
- Request a new row group writer by calling next_row_group.
- Once finished writing the row group, close the row group writer by calling close.
- Write subsequent row groups, if necessary.
- After all row groups have been written, close the file writer using the close method.
Implementations
impl<W: Write + Send> SerializedFileWriter<W>
pub fn new( buf: W, schema: TypePtr, properties: WriterPropertiesPtr, ) -> Result<Self>
Creates a new file writer.
fn get_file_encryptor( properties: &WriterPropertiesPtr, schema_descriptor: &SchemaDescriptor, ) -> Result<Option<Arc<FileEncryptor>>>
pub fn next_row_group(&mut self) -> Result<SerializedRowGroupWriter<'_, W>>
Creates a new row group from this file writer.
In case of an IO error or Thrift error, returns Err.
There can be at most 2^15 row groups in a file, and row groups have to be written sequentially. Every time the next row group is requested, the previous row group must be finalised and closed using the SerializedRowGroupWriter::close method.
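The 2^15 limit equals the number of non-negative values a signed 16-bit integer can hold (0 through 32767). The following is a minimal sketch of such a bound check, assuming the row group index is narrowed to an i16 ordinal; the function name row_group_ordinal and the error text are hypothetical and not part of the parquet crate.

```rust
/// Hypothetical helper: narrows a zero-based row group index to an `i16`
/// ordinal, failing once the index no longer fits (index >= 2^15).
fn row_group_ordinal(index: usize) -> Result<i16, String> {
    i16::try_from(index).map_err(|_| {
        format!(
            "row group index {} exceeds the maximum of {} row groups per file",
            index,
            i16::MAX as usize + 1
        )
    })
}
```

Under this assumption, indices 0 through 32767 are accepted, which gives exactly 2^15 row groups, and index 32768 is rejected.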
pub fn flushed_row_groups(&self) -> &[RowGroupMetaData]
Returns metadata for any flushed row groups.
pub fn finish(&mut self) -> Result<FileMetaData>
Closes and finalizes the underlying Parquet writer.
Unlike Self::close, this does not consume self.
Attempting to write after calling finish will result in an error.
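The difference between a finalizer that borrows and one that consumes can be sketched with the standard library alone. The SketchWriter type below is hypothetical and only models the behaviour described above (a finished flag that rejects later writes); it is not how the parquet crate implements finish or close.

```rust
/// Hypothetical writer used only to illustrate the `finish`/`close` split.
struct SketchWriter {
    out: Vec<u8>,
    finished: bool,
}

impl SketchWriter {
    fn new() -> Self {
        Self { out: Vec::new(), finished: false }
    }

    /// Appends data, refusing once the writer has been finished.
    fn write(&mut self, data: &[u8]) -> Result<(), String> {
        if self.finished {
            return Err("writer already finished".to_string());
        }
        self.out.extend_from_slice(data);
        Ok(())
    }

    /// Finalizes through `&mut self`, so the value stays available afterwards.
    fn finish(&mut self) -> Result<usize, String> {
        self.finished = true;
        Ok(self.out.len())
    }

    /// Finalizes by value; the compiler rejects any later use of the writer.
    fn close(mut self) -> Result<usize, String> {
        self.finish()
    }
}

/// Writes, finishes, then reports whether a further write is rejected.
fn write_after_finish_fails() -> bool {
    let mut w = SketchWriter::new();
    w.write(b"data").unwrap();
    w.finish().unwrap();
    w.write(b"more").is_err()
}
```

With the borrowing form, misuse after finalization is caught at run time as an error; with the consuming form, it is caught at compile time because the value has been moved.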
pub fn close(self) -> Result<FileMetaData>
Closes and finalises file writer, returning the file metadata.
fn start_file( properties: &WriterPropertiesPtr, buf: &mut TrackedWrite<W>, ) -> Result<()>
Writes magic bytes at the beginning of the file.
fn write_metadata(&mut self) -> Result<FileMetaData>
Assembles and writes metadata at the end of the file.
fn assert_previous_writer_closed(&self) -> Result<()>
pub fn append_key_value_metadata(&mut self, kv_metadata: KeyValue)
Adds a KeyValue to the file writer’s metadata.
pub fn schema_descr(&self) -> &SchemaDescriptor
Returns a reference to the schema descriptor.
pub fn properties(&self) -> &WriterPropertiesPtr
Returns a reference to the writer properties.
pub fn write_all(&mut self, buf: &[u8]) -> Result<()>
Writes the given buf bytes to the internal buffer.
This can be used to write raw data to an in-progress parquet file, for example, custom index structures or other payloads. Other parquet readers will skip this data when reading the files.
It’s safe to use this method to write data to the underlying writer, because it will ensure that the buffering and byte‐counting layers are used.
pub fn inner_mut(&mut self) -> &mut W
Returns a mutable reference to the underlying writer.
Warning: if you write directly to this writer, you will skip the TrackedWrite buffering and byte‐counting layers. That will cause the file footer’s recorded offsets and sizes to diverge from reality, resulting in an unreadable or corrupted Parquet file.
If you want to write safely to the underlying writer, use Self::write_all.
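The divergence described in the warning can be reproduced with a small byte-counting wrapper. This is a sketch under stated assumptions: CountingWriter is a hypothetical stand-in written against std::io::Write, not the parquet crate's TrackedWrite, and it omits buffering.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a byte-counting layer around a writer.
struct CountingWriter<W: Write> {
    inner: W,
    bytes_written: usize,
}

impl<W: Write> CountingWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, bytes_written: 0 }
    }

    /// Number of bytes that passed through the counting layer.
    fn bytes_written(&self) -> usize {
        self.bytes_written
    }

    /// Direct access to the wrapped writer; writes made here are not counted.
    fn inner_mut(&mut self) -> &mut W {
        &mut self.inner
    }
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.bytes_written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Returns (bytes the counter recorded, bytes actually in the output).
fn counted_vs_actual() -> io::Result<(usize, usize)> {
    let mut w = CountingWriter::new(Vec::new());
    w.write_all(b"PAR1")?; // goes through the counting layer
    w.inner_mut().write_all(b"raw")?; // bypasses it
    Ok((w.bytes_written(), w.inner_mut().len()))
}
```

Any offset computed from the counter after the bypassing write would point three bytes too early in the real output, which is the kind of mismatch the warning refers to.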
pub fn into_inner(self) -> Result<W>
Writes the file footer and returns the underlying writer.
pub fn bytes_written(&self) -> usize
Returns the number of bytes written to this instance.
pub(crate) fn file_encryptor(&self) -> Option<Arc<FileEncryptor>>
Gets the file encryptor used by this instance to encrypt data.
Auto Trait Implementations
impl<W> Freeze for SerializedFileWriter<W> where W: Freeze
impl<W> RefUnwindSafe for SerializedFileWriter<W> where W: RefUnwindSafe
impl<W> Send for SerializedFileWriter<W> where W: Send
impl<W> Sync for SerializedFileWriter<W> where W: Sync
impl<W> Unpin for SerializedFileWriter<W> where W: Unpin
impl<W> UnwindSafe for SerializedFileWriter<W> where W: UnwindSafe
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.