parquet::arrow::async_writer

Struct AsyncArrowWriter

Source
pub struct AsyncArrowWriter<W> {
    sync_writer: ArrowWriter<Vec<u8>>,
    async_writer: W,
}
Expand description

Encodes [RecordBatch] to parquet, outputting to an AsyncFileWriter

§Memory Usage

This writer eagerly writes data as soon as possible to the underlying AsyncFileWriter, permitting fine-grained control over buffering and I/O scheduling. However, the columnar nature of parquet forces data for an entire row group to be buffered in memory, before it can be flushed. Depending on the data and the configured row group size, this buffering may be substantial.

Memory usage can be limited by calling Self::flush to flush the in progress row group, although this will likely increase overall file size and reduce query performance. See ArrowWriter for more information.

let mut writer: AsyncArrowWriter<File> = todo!();
let batch: RecordBatch = todo!();
writer.write(&batch).await.unwrap();
// Trigger an early flush if buffered size exceeds 1_000_000
if writer.in_progress_size() > 1_000_000 {
    writer.flush().await.unwrap()
}

Fields§

§sync_writer: ArrowWriter<Vec<u8>>

Underlying sync writer

§async_writer: W

Async writer provided by caller

Implementations§

Source§

impl<W: AsyncFileWriter> AsyncArrowWriter<W>

Source

pub fn try_new( writer: W, arrow_schema: SchemaRef, props: Option<WriterProperties>, ) -> Result<Self>

Try to create a new Async Arrow Writer

Source

pub fn try_new_with_options( writer: W, arrow_schema: SchemaRef, options: ArrowWriterOptions, ) -> Result<Self>

Try to create a new Async Arrow Writer with ArrowWriterOptions

Source

pub fn flushed_row_groups(&self) -> &[RowGroupMetaData]

Returns metadata for any flushed row groups

Source

pub fn memory_size(&self) -> usize

Estimated memory usage, in bytes, of this ArrowWriter

See ArrowWriter::memory_size for more information.

Source

pub fn in_progress_size(&self) -> usize

Anticipated encoded size of the in progress row group.

See ArrowWriter::memory_size for more information.

Source

pub fn in_progress_rows(&self) -> usize

Returns the number of rows buffered in the in progress row group

Source

pub fn bytes_written(&self) -> usize

Returns the number of bytes written by this instance

Source

pub async fn write(&mut self, batch: &RecordBatch) -> Result<()>

Enqueues the provided RecordBatch to be written

After every sync write by the inner ArrowWriter, the inner buffer will be checked and flush if at least half full

Source

pub async fn flush(&mut self) -> Result<()>

Flushes all buffered rows into a new row group

Source

pub fn append_key_value_metadata(&mut self, kv_metadata: KeyValue)

Append KeyValue metadata in addition to those in WriterProperties

This method allows to append metadata after [RecordBatch]es are written.

Source

pub async fn finish(&mut self) -> Result<FileMetaData>

Close and finalize the writer.

All the data in the inner buffer will be force flushed.

Unlike Self::close this does not consume self

Attempting to write after calling finish will result in an error

Source

pub async fn close(self) -> Result<FileMetaData>

Close and finalize the writer.

All the data in the inner buffer will be force flushed.

Source

async fn do_write(&mut self) -> Result<()>

Flush the data written by sync_writer into the async_writer

§Notes

This method will take the inner buffer from the sync_writer and write it into the async writer. After the write, the inner buffer will be empty.

Auto Trait Implementations§

§

impl<W> Freeze for AsyncArrowWriter<W>
where W: Freeze,

§

impl<W> !RefUnwindSafe for AsyncArrowWriter<W>

§

impl<W> Send for AsyncArrowWriter<W>
where W: Send,

§

impl<W> !Sync for AsyncArrowWriter<W>

§

impl<W> Unpin for AsyncArrowWriter<W>
where W: Unpin,

§

impl<W> !UnwindSafe for AsyncArrowWriter<W>

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<T> ErasedDestructor for T
where T: 'static,

§

impl<T> MaybeSendSync for T