parquet::file::metadata::writer

Struct ParquetMetaDataWriter

pub struct ParquetMetaDataWriter<'a, W: Write> {
    buf: TrackedWrite<W>,
    metadata: &'a ParquetMetaData,
}

Writes ParquetMetaData to a byte stream

This structure handles the details of writing the various parts of Parquet metadata into a byte stream. It is used to write the metadata into a parquet file and can also write metadata into other locations (such as a store of bytes).

§Discussion

Writing Parquet metadata is tricky because the metadata is not stored as a single inline thrift structure. It can include several “out of band” structures, such as the OffsetIndex and BloomFilters, that are stored separately and whose locations are recorded as offsets from the beginning of the file.

Note: this writer does not directly write BloomFilters. To include BloomFilters, write them into the buffer before creating the metadata writer, then set the corresponding bloom_filter_offset and bloom_filter_length on the ColumnChunkMetaData passed to this writer.
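The bookkeeping described in the note can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `BlobLocation` and `write_blob` are hypothetical names, and the sketch assumes the buffer holds the file from byte 0 so that its length is the file offset.

```rust
use std::io::{self, Write};

/// Location of an out-of-band structure: an absolute offset from the
/// start of the file plus a length in bytes.
/// (Hypothetical type for illustration; not part of the parquet crate.)
#[derive(Debug, PartialEq)]
struct BlobLocation {
    offset: u64,
    length: u64,
}

/// Append `blob` to `buffer` and report where it landed.
/// Assumes `buffer` starts at byte 0 of the file, so `buffer.len()`
/// before the write is the blob's file offset.
fn write_blob(buffer: &mut Vec<u8>, blob: &[u8]) -> io::Result<BlobLocation> {
    let offset = buffer.len() as u64;
    buffer.write_all(blob)?;
    let length = buffer.len() as u64 - offset;
    Ok(BlobLocation { offset, length })
}
```

The returned offset and length are the kind of values that would then be recorded as bloom_filter_offset and bloom_filter_length before the metadata writer is created.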

§Output Format

The format of the metadata is as follows:

  1. Optional ColumnIndex (thrift encoded)
  2. Optional OffsetIndex (thrift encoded)
  3. FileMetaData (thrift encoded)
  4. Length of encoded FileMetaData (4 bytes, little endian)
  5. Parquet Magic Bytes (4 bytes)
┌──────────────────────┐
│                      │
│         ...          │
│                      │
│┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ │
│     ColumnIndex     ◀│─ ─ ─
││    (Optional)     │ │     │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  │
│┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ │     │ FileMetadata
│     OffsetIndex      │       contains embedded
││    (Optional)     │◀┼ ─   │ offsets to
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  │  │    ColumnIndex and
│╔═══════════════════╗ │     │ OffsetIndex
│║                   ║ │  │
│║                   ║ ┼ ─   │
│║   FileMetadata    ║ │
│║                   ║ ┼ ─ ─ ┘
│║                   ║ │
│╚═══════════════════╝ │
│┌───────────────────┐ │
││  metadata length  │ │ length of FileMetadata  (only)
│└───────────────────┘ │
│┌───────────────────┐ │
││      'PAR1'       │ │ Parquet Magic Bytes
│└───────────────────┘ │
└──────────────────────┘
     Output Buffer
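The last two items of the format (the 4-byte little-endian length and the magic bytes) can be sketched independently of the thrift encoding. The helpers below, `write_tail` and `read_tail`, are hypothetical and use only the standard library; they illustrate the layout rather than reproduce the crate's code.

```rust
/// Magic bytes that end the output, as listed in the format above.
const MAGIC: &[u8; 4] = b"PAR1";

/// Append the 8-byte tail: the FileMetaData length (4 bytes, little
/// endian) followed by the magic bytes.
fn write_tail(out: &mut Vec<u8>, metadata_len: u32) {
    out.extend_from_slice(&metadata_len.to_le_bytes());
    out.extend_from_slice(MAGIC);
}

/// Read the tail back and return the FileMetaData length, or `None`
/// if the buffer is too short or the magic bytes do not match.
fn read_tail(buf: &[u8]) -> Option<u32> {
    if buf.len() < 8 {
        return None;
    }
    let tail = &buf[buf.len() - 8..];
    if tail[4..] != MAGIC[..] {
        return None;
    }
    Some(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]))
}
```

Because the length covers the FileMetaData only, a reader working backwards from the end of the buffer can locate the start of the FileMetaData without knowing whether a ColumnIndex or OffsetIndex precedes it.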

§Example

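use parquet::file::metadata::{ParquetMetaData, ParquetMetaDataWriter};

// write parquet metadata to an in-memory buffer
let mut buffer = vec![];
// `get_metadata` stands in for however the caller obtains the metadata
let metadata: ParquetMetaData = get_metadata();
let writer = ParquetMetaDataWriter::new(&mut buffer, &metadata);
// write the metadata to the buffer, consuming the writer
writer.finish().unwrap();
assert!(!buffer.is_empty());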

Fields§

§buf: TrackedWrite<W>

§metadata: &'a ParquetMetaData

Implementations§

impl<'a, W: Write> ParquetMetaDataWriter<'a, W>

pub fn new(buf: W, metadata: &'a ParquetMetaData) -> Self

Create a new ParquetMetaDataWriter to write to buf

Note: any embedded offsets in the metadata are written assuming the metadata is at the start of the buffer. If the metadata is being written to a location other than the start of the buffer, see Self::new_with_tracked.

See example on the struct level documentation

pub fn new_with_tracked( buf: TrackedWrite<W>, metadata: &'a ParquetMetaData, ) -> Self

Create a new ParquetMetaDataWriter to write to buf, an existing TrackedWrite

Use this method when the metadata is being written to a location other than the start of the buffer.

See example on the struct level documentation
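Why a position-tracking writer matters here can be shown with a small stand-in. `CountingWrite` and `write_after_header` below are hypothetical names written against the standard library only; they are not the crate's TrackedWrite and make no claim about its API. The sketch only shows how counting bytes as they are written yields offsets measured from the start of the output.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a position-tracking writer: wraps any
/// `Write` and counts the bytes that pass through it.
struct CountingWrite<W: Write> {
    inner: W,
    bytes_written: usize,
}

impl<W: Write> CountingWrite<W> {
    fn new(inner: W) -> Self {
        Self { inner, bytes_written: 0 }
    }

    /// Number of bytes written through this wrapper so far.
    fn bytes_written(&self) -> usize {
        self.bytes_written
    }
}

impl<W: Write> Write for CountingWrite<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.bytes_written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Write `header`, then `section`, and return the offset of `section`
/// measured from the first byte written through the wrapper.
fn write_after_header(out: &mut Vec<u8>, header: &[u8], section: &[u8]) -> io::Result<usize> {
    let mut tracked = CountingWrite::new(out);
    tracked.write_all(header)?;
    let offset = tracked.bytes_written();
    tracked.write_all(section)?;
    Ok(offset)
}
```

In this sketch the section's offset accounts for the header that precedes it, which is the property needed when metadata does not begin at the start of the buffer.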

pub fn finish(self) -> Result<()>

Write the metadata to the buffer

fn convert_column_indexes(&self) -> Vec<Vec<Option<ColumnIndex>>>

fn convert_offset_index(&self) -> Vec<Vec<Option<OffsetIndex>>>

Auto Trait Implementations§

impl<'a, W> Freeze for ParquetMetaDataWriter<'a, W>
where W: Freeze,

impl<'a, W> RefUnwindSafe for ParquetMetaDataWriter<'a, W>
where W: RefUnwindSafe,

impl<'a, W> Send for ParquetMetaDataWriter<'a, W>
where W: Send,

impl<'a, W> Sync for ParquetMetaDataWriter<'a, W>
where W: Sync,

impl<'a, W> Unpin for ParquetMetaDataWriter<'a, W>
where W: Unpin,

impl<'a, W> UnwindSafe for ParquetMetaDataWriter<'a, W>
where W: UnwindSafe,
