

Struct ArrowWriterOptions 

pub struct ArrowWriterOptions {
    properties: WriterProperties,
    skip_arrow_metadata: bool,
    schema_root: Option<String>,
    schema_descr: Option<SchemaDescriptor>,
    page_store_factory: Option<Arc<dyn PageStoreFactory>>,
}

Arrow-specific configuration settings for writing parquet files.

See ArrowWriter for how to configure the writer.

Fields

properties: WriterProperties
skip_arrow_metadata: bool
schema_root: Option<String>
schema_descr: Option<SchemaDescriptor>
page_store_factory: Option<Arc<dyn PageStoreFactory>>

Implementations


impl ArrowWriterOptions


pub fn new() -> Self

Creates a new ArrowWriterOptions with the default settings.


pub fn with_properties(self, properties: WriterProperties) -> Self

Sets the WriterProperties for writing parquet files.


pub fn with_page_store_factory(self, page_store_factory: Arc<dyn PageStoreFactory>) -> Self

Sets the PageStoreFactory used to buffer completed pages while a row group is being written.

By default (an InMemoryPageStore per column chunk), completed pages are buffered on the heap until the row group is flushed, so peak memory grows with the row group size. Supplying a factory whose stores spill pages to a temporary file or object storage instead can bound peak write memory independently of the row group size, so large, read-efficient row groups remain practical.

Example: a custom PageStore

A store only has to map an opaque, store-allocated PageKey to a blob and hand that blob back exactly once. The keys need not be dense or sequential: here a HashMap-backed store mints sparse handles, demonstrating that the writer relies only on the opaque-handle contract. A real spilling backend would write the bytes to a temporary file in put and read them back in take.

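use std::collections::HashMap;
use std::sync::Arc;

use arrow_array::{ArrayRef, Int64Array, RecordBatch};
use bytes::Bytes;
use parquet::arrow::arrow_reader::ParquetRecordBatchReader;
use parquet::arrow::arrow_writer::{ArrowWriter, ArrowWriterOptions};
use parquet::errors::{ParquetError, Result};
// PageKey, PageStore, PageStoreArgs and PageStoreFactory must also be in
// scope; this page does not show the module they are exported from.

#[derive(Default)]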
struct MapPageStore {
    blobs: HashMap<u64, Bytes>,
    next: u64,
}

impl PageStore for MapPageStore {
    fn put(&mut self, value: Bytes) -> Result<PageKey> {
        // Mint a sparse handle (every other integer) to show the writer
        // never assumes anything about the key's value.
        let key = PageKey::new(self.next);
        self.next += 2;
        self.blobs.insert(key.get(), value);
        Ok(key)
    }

    fn take(&mut self, key: PageKey) -> Result<Bytes> {
        self.blobs
            .remove(&key.get())
            .ok_or_else(|| ParquetError::General(format!("invalid key {}", key.get())))
    }
}

#[derive(Debug)]
struct MapPageStoreFactory;

impl PageStoreFactory for MapPageStoreFactory {
    fn create(&self, args: &PageStoreArgs<'_>) -> Result<Box<dyn PageStore>> {
        // `args` exposes the column index and descriptor (physical/logical
        // type, path), so a real backend could spill only large columns.
        let _ = (args.column_index(), args.column_descriptor());
        Ok(Box::new(MapPageStore::default()))
    }
}

fn main() {
    let col = Arc::new(Int64Array::from_iter_values(0..1000)) as ArrayRef;
    let to_write = RecordBatch::try_from_iter([("col", col)]).unwrap();

    // Route completed pages through the custom store instead of the default one.
    let options =
        ArrowWriterOptions::new().with_page_store_factory(Arc::new(MapPageStoreFactory));
    let mut buffer = Vec::new();
    let mut writer =
        ArrowWriter::try_new_with_options(&mut buffer, to_write.schema(), options).unwrap();
    writer.write(&to_write).unwrap();
    writer.close().unwrap();

    // The file is byte-identical to one written with the default store: the store
    // only changes where pages wait until the row group is flushed. The assertion
    // checks that reading the file back yields the original batch.
    let mut reader = ParquetRecordBatchReader::try_new(Bytes::from(buffer), 1024).unwrap();
    assert_eq!(to_write, reader.next().unwrap().unwrap());
}

pub fn with_skip_arrow_metadata(self, skip_arrow_metadata: bool) -> Self

Skip encoding the embedded Arrow metadata (defaults to false).

Parquet files generated by the ArrowWriter contain an embedded Arrow schema by default.

Set skip_arrow_metadata to true to skip encoding the embedded metadata.


pub fn with_schema_root(self, schema_root: String) -> Self

Set the name of the root Parquet schema element (defaults to "arrow_schema").


pub fn with_parquet_schema(self, schema_descr: SchemaDescriptor) -> Self

Explicitly specify the Parquet schema to be used.

If omitted (the default), the ArrowSchemaConverter is used to compute the Parquet SchemaDescriptor. This may be used when the SchemaDescriptor is already known or must be calculated using custom logic.

Trait Implementations


impl Clone for ArrowWriterOptions


fn clone(&self) -> ArrowWriterOptions

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for ArrowWriterOptions


fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for ArrowWriterOptions


fn default() -> ArrowWriterOptions

Returns the “default value” for a type.

Auto Trait Implementations

Blanket Implementations


impl<T> Any for T
where T: 'static + ?Sized,


fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,


fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,


fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,


unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T


fn from(t: T) -> T

Returns the argument unchanged.


impl<T, U> Into<U> for T
where U: From<T>,


fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.


impl<T> ToOwned for T
where T: Clone,


type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,


type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,


type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Ungil for T
where T: Send,


impl<V, T> VZip<V> for T
where V: MultiLane<T>,


fn vzip(self) -> V