pub struct AsyncArrowWriter<W> {
sync_writer: ArrowWriter<Vec<u8>>,
async_writer: W,
}
Expand description
Encodes [RecordBatch
] to parquet, outputting to an AsyncFileWriter
§Memory Usage
This writer eagerly writes data as soon as possible to the underlying AsyncFileWriter
,
permitting fine-grained control over buffering and I/O scheduling. However, the columnar
nature of parquet forces data for an entire row group to be buffered in memory, before
it can be flushed. Depending on the data and the configured row group size, this buffering
may be substantial.
Memory usage can be limited by calling Self::flush
to flush the in progress row group,
although this will likely increase overall file size and reduce query performance.
See ArrowWriter for more information.
let mut writer: AsyncArrowWriter<File> = todo!();
let batch: RecordBatch = todo!();
writer.write(&batch).await.unwrap();
// Trigger an early flush if buffered size exceeds 1_000_000
if writer.in_progress_size() > 1_000_000 {
writer.flush().await.unwrap()
}
Fields§
§sync_writer: ArrowWriter<Vec<u8>>
Underlying sync writer
async_writer: W
Async writer provided by caller
Implementations§
Source§impl<W: AsyncFileWriter> AsyncArrowWriter<W>
impl<W: AsyncFileWriter> AsyncArrowWriter<W>
Sourcepub fn try_new(
writer: W,
arrow_schema: SchemaRef,
props: Option<WriterProperties>,
) -> Result<Self>
pub fn try_new( writer: W, arrow_schema: SchemaRef, props: Option<WriterProperties>, ) -> Result<Self>
Try to create a new Async Arrow Writer
Sourcepub fn try_new_with_options(
writer: W,
arrow_schema: SchemaRef,
options: ArrowWriterOptions,
) -> Result<Self>
pub fn try_new_with_options( writer: W, arrow_schema: SchemaRef, options: ArrowWriterOptions, ) -> Result<Self>
Try to create a new Async Arrow Writer with ArrowWriterOptions
Sourcepub fn flushed_row_groups(&self) -> &[RowGroupMetaData]
pub fn flushed_row_groups(&self) -> &[RowGroupMetaData]
Returns metadata for any flushed row groups
Sourcepub fn memory_size(&self) -> usize
pub fn memory_size(&self) -> usize
Estimated memory usage, in bytes, of this ArrowWriter
See ArrowWriter::memory_size for more information.
Sourcepub fn in_progress_size(&self) -> usize
pub fn in_progress_size(&self) -> usize
Anticipated encoded size of the in progress row group.
See ArrowWriter::memory_size for more information.
Sourcepub fn in_progress_rows(&self) -> usize
pub fn in_progress_rows(&self) -> usize
Returns the number of rows buffered in the in progress row group
Sourcepub fn bytes_written(&self) -> usize
pub fn bytes_written(&self) -> usize
Returns the number of bytes written by this instance
Sourcepub async fn write(&mut self, batch: &RecordBatch) -> Result<()>
pub async fn write(&mut self, batch: &RecordBatch) -> Result<()>
Enqueues the provided RecordBatch
to be written
After every sync write by the inner ArrowWriter, the inner buffer will be checked and flush if at least half full
Sourcepub fn append_key_value_metadata(&mut self, kv_metadata: KeyValue)
pub fn append_key_value_metadata(&mut self, kv_metadata: KeyValue)
Append KeyValue
metadata in addition to those in WriterProperties
This method allows to append metadata after [RecordBatch
]es are written.
Sourcepub async fn finish(&mut self) -> Result<FileMetaData>
pub async fn finish(&mut self) -> Result<FileMetaData>
Close and finalize the writer.
All the data in the inner buffer will be force flushed.
Unlike Self::close
this does not consume self
Attempting to write after calling finish will result in an error
Sourcepub async fn close(self) -> Result<FileMetaData>
pub async fn close(self) -> Result<FileMetaData>
Close and finalize the writer.
All the data in the inner buffer will be force flushed.
Auto Trait Implementations§
impl<W> Freeze for AsyncArrowWriter<W>where
W: Freeze,
impl<W> !RefUnwindSafe for AsyncArrowWriter<W>
impl<W> Send for AsyncArrowWriter<W>where
W: Send,
impl<W> !Sync for AsyncArrowWriter<W>
impl<W> Unpin for AsyncArrowWriter<W>where
W: Unpin,
impl<W> !UnwindSafe for AsyncArrowWriter<W>
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left
is true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left(&self)
returns true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read more