Parquet metadata API
Most users should use these structures to interact with Parquet metadata. The crate::format module contains lower level structures generated from the Parquet thrift definition.
- ParquetMetaData: Top level metadata container, read from the Parquet file footer.
- FileMetaData: File level metadata such as schema, row counts and version.
- RowGroupMetaData: Metadata for each Row Group within a File, such as location, number of rows, and column chunks.
- ColumnChunkMetaData: Metadata for each column chunk (primitive leaf) within a Row Group, including encoding and compression information, number of values, statistics, etc.
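The nesting described above (file, then row groups, then one column chunk per leaf column in each row group) can be sketched with simplified stand-in types. Everything in this sketch is hypothetical and uses only the standard library: the `*Sketch` structs, their fields, and `sample_file` are illustrative assumptions, not the structs defined in this module, which carry many more fields.

```rust
/// Hypothetical stand-in for a column chunk: one leaf column within one row group.
struct ColumnChunkSketch {
    path: &'static str,
    compressed_size: i64,
}

/// Hypothetical stand-in for a row group: a row count plus its column chunks.
struct RowGroupSketch {
    num_rows: i64,
    columns: Vec<ColumnChunkSketch>,
}

/// Hypothetical stand-in for the top level container.
struct FileSketch {
    row_groups: Vec<RowGroupSketch>,
}

impl FileSketch {
    /// The file's row count is the sum of the row counts of its row groups.
    fn total_rows(&self) -> i64 {
        self.row_groups.iter().map(|rg| rg.num_rows).sum()
    }

    /// A column's footprint is spread over one chunk per row group.
    fn compressed_size_of(&self, path: &str) -> i64 {
        self.row_groups
            .iter()
            .flat_map(|rg| rg.columns.iter())
            .filter(|chunk| chunk.path == path)
            .map(|chunk| chunk.compressed_size)
            .sum()
    }
}

/// Made-up sample: two row groups, each holding chunks for `id` and `name`.
fn sample_file() -> FileSketch {
    let row_group = |num_rows, id_size, name_size| RowGroupSketch {
        num_rows,
        columns: vec![
            ColumnChunkSketch { path: "id", compressed_size: id_size },
            ColumnChunkSketch { path: "name", compressed_size: name_size },
        ],
    };
    FileSketch {
        row_groups: vec![row_group(100, 400, 900), row_group(50, 200, 450)],
    }
}

fn main() {
    let file = sample_file();
    assert_eq!(file.total_rows(), 150);
    assert_eq!(file.compressed_size_of("id"), 600);
}
```

The point of the sketch is only the shape of the traversal: per-column questions are answered by walking every row group and picking out that column's chunk.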
§APIs for working with Parquet Metadata
The Parquet readers and writers in this crate handle reading and writing metadata in Parquet files. To work with metadata directly, the following APIs are available:
- ParquetMetaDataReader for reading
- ParquetMetaDataWriter for writing
§Examples
Please see external_metadata.rs
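Independently of the crate's APIs, it may help to see where the metadata sits in a file. In the Parquet format, a file with a plaintext footer ends with the Thrift-encoded metadata, followed by a 4-byte little-endian metadata length and the 4-byte magic `PAR1`. The sketch below locates that byte range using only the standard library. `metadata_range` is a hypothetical helper written for illustration, not a function of this crate, and it does not decode the metadata or handle files with an encrypted footer (which use a different magic).

```rust
use std::ops::Range;

/// Magic bytes that end a Parquet file with a plaintext footer.
const PARQUET_MAGIC: [u8; 4] = *b"PAR1";

/// Hypothetical helper (not part of the parquet crate): given a whole file as
/// bytes, return the byte range that holds the Thrift-encoded metadata.
fn metadata_range(file: &[u8]) -> Option<Range<usize>> {
    // File tail layout: [metadata bytes][u32 LE metadata length]["PAR1"]
    let tail_start = file.len().checked_sub(8)?;
    let tail = &file[tail_start..];
    if tail[4..] != PARQUET_MAGIC[..] {
        return None;
    }
    let metadata_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as usize;
    // The metadata ends exactly where the 8-byte tail begins.
    let metadata_start = tail_start.checked_sub(metadata_len)?;
    Some(metadata_start..tail_start)
}

fn main() {
    // A fake "file": leading magic, 3 placeholder metadata bytes, length, magic.
    let mut file = b"PAR1".to_vec();
    file.extend_from_slice(&[0xAA, 0xBB, 0xCC]);
    file.extend_from_slice(&3u32.to_le_bytes());
    file.extend_from_slice(b"PAR1");

    let range = metadata_range(&file).expect("valid footer");
    assert_eq!(file[range].to_vec(), vec![0xAAu8, 0xBB, 0xCC]);
}
```

In real code, the bytes in that range would be handed to the reader APIs listed above rather than parsed by hand.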
§Metadata Encodings and Structures
There are three different encodings of Parquet Metadata in this crate:
- bytes: encoded with the Thrift TCompactProtocol as defined in parquet.thrift
- format: Rust structures automatically generated by the thrift compiler from parquet.thrift. These structures are low level and mirror the thrift definitions.
- file::metadata (this module): Easier to use Rust structures with a more idiomatic API. Note that, confusingly, some but not all of these structures have the same name as the format structures.
Graphically, this is how the different structures relate to each other:
                      ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐      ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
                        ┌───────────────────┐          ┌───────────────────────┐
                      │ │    ColumnIndex    │ │      │ │    ParquetMetaData    │ │
                        └───────────────────┘          └───────────────────────┘
┌──────────────┐      │ ┌───────────────────┐ │      │ ┌───────────────────────┐ │
│   ..0x24..   │ ◀──▶   │    OffsetIndex    │   ◀──▶   │    ParquetMetaData    │
└──────────────┘      │ └───────────────────┘ │      │ └───────────────────────┘ │
      ...                        ...                              ...
                      │ ┌───────────────────┐ │      │ ┌───────────────────────┐ │
     bytes              │   FileMetaData*   │          │     FileMetaData*     │
(thrift encoded)      │ └───────────────────┘ │      │ └───────────────────────┘ │
                      └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘      └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
                       format::meta structures         file::metadata structures

                      * Same name, different struct
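To give a feel for the bytes encoding, the sketch below shows how the Thrift compact protocol writes integers: the signed value is zigzag-mapped to an unsigned one, which is then written as a variable-length integer with 7 payload bits per byte. This uses only the standard library, and all four function names are hypothetical helpers for illustration. It covers integer fields only; it is not a decoder for the metadata, which also involves field headers, strings, lists and nested structs.

```rust
/// Zigzag-map a signed integer so that small magnitudes become small
/// unsigned values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag_encode(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// Append `value` as a varint: 7 bits per byte, least significant group
/// first, with the high bit set on every byte except the last.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Read a varint from the front of `bytes`, returning the value and the
/// number of bytes consumed, or `None` if the input ends too early.
fn read_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    // A u64 needs at most 10 groups of 7 bits.
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

fn main() {
    // 150 zigzags to 300, which needs two varint bytes.
    let mut buf = Vec::new();
    write_varint(zigzag_encode(150), &mut buf);
    assert_eq!(buf, vec![0xACu8, 0x02]);

    let (raw, used) = read_varint(&buf).expect("complete varint");
    assert_eq!((zigzag_decode(raw), used), (150, 2));
}
```

This compactness is one reason the raw bytes are awkward to work with directly, and why the format and file::metadata structures exist as layers on top of them.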
Modules§
- memory 🔒: Memory calculations for ParquetMetaData::memory_size
- reader 🔒
- writer 🔒
Structs§
- Metadata for a column chunk.
- Builder for ColumnChunkMetaData.
- Builder for Parquet ColumnIndex, part of the Parquet PageIndex.
- File level metadata for a Parquet file.
- Histograms for repetition and definition levels.
- Builder for offset index, part of the Parquet PageIndex.
- Parsed metadata for a single Parquet file.
- A builder for creating / manipulating ParquetMetaData.
- Reads the ParquetMetaData from a byte stream.
- Writes ParquetMetaData to a byte stream.
- Metadata for a row group.
- Builder for row group metadata.
Type Aliases§
- Reference counted pointer for FileMetaData.
- A key-value pair for FileMetaData.
- Page level statistics for each column chunk of each row group.
- OffsetIndexMetaData for each data page of each row group of each column.
- Reference counted pointer for RowGroupMetaData.