Parquet metadata API
Most users should use these structures to interact with Parquet metadata. The crate::format module contains lower level structures generated from the Parquet thrift definition.
- ParquetMetaData: Top level metadata container, read from the Parquet file footer.
- FileMetaData: File level metadata such as schema, row counts and version.
- RowGroupMetaData: Metadata for each Row Group within a File, such as location, number of rows, and column chunks.
- ColumnChunkMetaData: Metadata for each column chunk (primitive leaf) within a Row Group, including encoding and compression information, number of values, statistics, etc.
§APIs for working with Parquet Metadata
The Parquet readers and writers in this crate handle reading metadata from, and writing metadata to, Parquet files. To work with metadata directly, the following APIs are available:
- ParquetMetaDataReader for reading
- ParquetMetaDataWriter for writing
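For orientation, the Parquet file format stores the Thrift-encoded metadata at the end of the file, followed by a 4-byte little-endian metadata length and the 4-byte magic `PAR1`. The sketch below uses only the standard library to locate those metadata bytes in an in-memory buffer. It illustrates the file layout only and is not a description of how ParquetMetaDataReader is implemented; `metadata_range` is a hypothetical helper, and encrypted footers are not handled.

```rust
use std::ops::Range;

/// Size of the fixed footer tail: 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_TAIL_LEN: usize = 8;
/// Magic bytes that end an unencrypted Parquet file.
const MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper: returns the byte range of the Thrift-encoded metadata
/// inside `file`, or `None` if the buffer does not end with a plausible footer.
fn metadata_range(file: &[u8]) -> Option<Range<usize>> {
    if file.len() < FOOTER_TAIL_LEN {
        return None;
    }
    let tail = &file[file.len() - FOOTER_TAIL_LEN..];
    if !tail.ends_with(MAGIC) {
        return None;
    }
    // The first four bytes of the tail hold the metadata length.
    let len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as usize;
    let end = file.len() - FOOTER_TAIL_LEN;
    // A length larger than the remaining bytes means the footer is corrupt.
    let start = end.checked_sub(len)?;
    Some(start..end)
}

fn main() {
    // A fake "file": leading magic, three placeholder metadata bytes, footer tail.
    let mut file = Vec::new();
    file.extend_from_slice(b"PAR1");
    file.extend_from_slice(&[0x15, 0x02, 0x00]);
    file.extend_from_slice(&3u32.to_le_bytes());
    file.extend_from_slice(b"PAR1");

    match metadata_range(&file) {
        Some(range) => println!("metadata bytes at {:?}", range),
        None => println!("not a Parquet footer"),
    }
}
```

In real code the bytes in that range would be handed to a metadata decoder such as ParquetMetaDataReader rather than parsed by hand.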
§Examples
Please see external_metadata.rs
§Metadata Encodings and Structures
There are three different encodings of Parquet Metadata in this crate:
- bytes: encoded with the Thrift TCompactProtocol as defined in parquet.thrift
- format: Rust structures automatically generated by the thrift compiler from parquet.thrift. These structures are low level and mirror the thrift definitions.
- file::metadata (this module): Easier to use Rust structures with a more idiomatic API. Note that, confusingly, some but not all of these structures have the same name as the format structures.
Graphically, this is how the different structures relate to each other:
                        ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐        ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
                        │ ┌─────────────────┐ │        │ ┌───────────────────────┐ │
                        │ │   ColumnIndex   │ │        │ │    ParquetMetaData    │ │
                        │ └─────────────────┘ │        │ └───────────────────────┘ │
┌──────────────┐        │ ┌─────────────────┐ │        │ ┌───────────────────────┐ │
│   ..0x24..   │ ◀────▶ │ │   OffsetIndex   │ │ ◀────▶ │ │    ParquetMetaData    │ │
└──────────────┘        │ └─────────────────┘ │        │ └───────────────────────┘ │
      ...               │         ...         │        │            ...            │
                        │ ┌─────────────────┐ │        │ ┌───────────────────────┐ │
bytes                   │ │  FileMetaData*  │ │        │ │     FileMetaData*     │ │
(thrift encoded)        │ └─────────────────┘ │        │ └───────────────────────┘ │
                        └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘        └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
                        format::meta structures        file::metadata structures

                        * Same name, different struct
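To make the "bytes" column more concrete: the Thrift compact protocol writes integers as variable-length (LEB128-style) values, with signed integers zigzag-encoded first. The std-only sketch below decodes a single such integer. It is an illustration of the wire format under that assumption, not code from this crate; `read_varint` and `zigzag_decode` are hypothetical names, and decoding a complete FileMetaData involves much more (field headers, lists, nested structs).

```rust
/// Hypothetical helper: decodes one unsigned varint from the front of `buf`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends before a terminating byte (high bit clear) is seen.
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    // A u64 needs at most 10 groups of 7 bits.
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Hypothetical helper: maps a zigzag-encoded value back to a signed integer
/// (0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...).
fn zigzag_decode(n: u64) -> i64 {
    ((n >> 1) as i64) ^ -((n & 1) as i64)
}

fn main() {
    // A single byte such as 0x24 is a complete one-byte varint.
    let (raw, used) = read_varint(&[0x24]).expect("terminated varint");
    println!("raw = {}, bytes used = {}, signed = {}", raw, used, zigzag_decode(raw));
}
```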
Modules§
- memory 🔒: Memory calculations for ParquetMetaData::memory_size
- reader 🔒
- writer 🔒
Structs§
- ColumnChunkMetaData: Metadata for a column chunk.
- ColumnChunkMetaDataBuilder: Builder for ColumnChunkMetaData.
- ColumnIndexBuilder: Builder for Parquet ColumnIndex, part of the Parquet PageIndex.
- FileMetaData: File level metadata for a Parquet file.
- LevelHistogram: Histograms for repetition and definition levels.
- OffsetIndexBuilder: Builder for offset index, part of the Parquet PageIndex.
- ParquetMetaData: Parsed metadata for a single Parquet file.
- ParquetMetaDataBuilder: A builder for creating / manipulating ParquetMetaData.
- ParquetMetaDataReader: Reads the ParquetMetaData from a byte stream.
- ParquetMetaDataWriter: Writes ParquetMetaData to a byte stream.
- RowGroupMetaData: Metadata for a row group.
- RowGroupMetaDataBuilder: Builder for row group metadata.
Type Aliases§
- FileMetaDataPtr: Reference counted pointer for FileMetaData.
- KeyValue: A key-value pair for FileMetaData.
- ParquetColumnIndex: Page level statistics for each column chunk of each row group.
- ParquetOffsetIndex: OffsetIndexMetaData for each data page of each row group of each column.
- RowGroupMetaDataPtr: Reference counted pointer for RowGroupMetaData.