High-level API for reading/writing Arrow RecordBatches and Arrays to/from Parquet files.
Apache Arrow is a cross-language development platform for in-memory data.
§Example of writing an Arrow record batch to a Parquet file
```rust
use std::sync::Arc;
use arrow_array::{ArrayRef, Int32Array, RecordBatch};
use parquet::arrow::ArrowWriter;
use parquet::basic::Compression;
use parquet::file::properties::WriterProperties;
use tempfile::tempfile;

let ids = Int32Array::from(vec![1, 2, 3, 4]);
let vals = Int32Array::from(vec![5, 6, 7, 8]);
let batch = RecordBatch::try_from_iter(vec![
    ("id", Arc::new(ids) as ArrayRef),
    ("val", Arc::new(vals) as ArrayRef),
]).unwrap();

let file = tempfile().unwrap();

// WriterProperties can be used to set Parquet file options
let props = WriterProperties::builder()
    .set_compression(Compression::SNAPPY)
    .build();

let mut writer = ArrowWriter::try_new(file, batch.schema(), Some(props)).unwrap();
writer.write(&batch).expect("Writing batch");

// writer must be closed to write footer
writer.close().unwrap();
```
§Example of reading a Parquet file into an Arrow record batch
```rust
use std::fs::File;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

let file = File::open("data.parquet").unwrap();
let builder = ParquetRecordBatchReaderBuilder::try_new(file).unwrap();
println!("Converted arrow schema is: {}", builder.schema());

let mut reader = builder.build().unwrap();
let record_batch = reader.next().unwrap().unwrap();
println!("Read {} records.", record_batch.num_rows());
```
Re-exports§
pub use self::arrow_writer::ArrowWriter;
pub use self::async_reader::ParquetRecordBatchStreamBuilder;
pub use self::async_writer::AsyncArrowWriter;
Modules§
- arrow_reader - Contains reader which reads parquet data into arrow [RecordBatch]
- arrow_writer - Contains writer which writes arrow data into parquet data.
- async_reader - ParquetRecordBatchStreamBuilder: async API for reading Parquet files as [RecordBatch]es
- async_writer - Contains async writer which writes arrow data into parquet data.
- buffer 🔒 - Logic for reading data into arrow buffers
- decoder 🔒 - Specialized decoders optimised for decoding to arrow format
- record_reader 🔒
Structs§
- ArrowSchemaConverter - Converter for Arrow schema to Parquet schema
- FieldLevels - Schema information necessary to decode a parquet file as arrow [Fields]
- ProjectionMask - A ProjectionMask identifies a set of columns within a potentially nested schema to project
Constants§
- ARROW_SCHEMA_META_KEY - Schema metadata key used to store serialized Arrow IPC schema
- PARQUET_FIELD_ID_META_KEY - The value of this metadata key, if present on Field::metadata, will be used to populate BasicTypeInfo::id
Functions§
- add_encoded_arrow_schema_to_metadata - Mutates writer metadata by storing the encoded Arrow schema. If there is an existing Arrow schema metadata, it is replaced.
- arrow_to_parquet_schema (Deprecated) - Convert arrow schema to parquet schema
- encode_arrow_schema - Encodes the Arrow schema into the IPC format, and base64 encodes it
- parquet_column - Looks up the parquet column by name
- parquet_to_arrow_field_levels - Convert a parquet SchemaDescriptor to FieldLevels
- parquet_to_arrow_schema - Convert Parquet schema to Arrow schema including optional metadata
- parquet_to_arrow_schema_by_columns - Convert parquet schema to arrow schema including optional metadata, only preserving some leaf columns.