Module arrow


API for reading/writing Arrow RecordBatches and Arrays to/from Parquet Files.

See the crate-level documentation for more details.

§Example of writing Arrow record batch to Parquet file

 use std::sync::Arc;

 use arrow_array::{ArrayRef, Int32Array, RecordBatch};
 use parquet::arrow::ArrowWriter;
 use parquet::basic::Compression;
 use parquet::file::properties::WriterProperties;
 use tempfile::tempfile;

 let ids = Int32Array::from(vec![1, 2, 3, 4]);
 let vals = Int32Array::from(vec![5, 6, 7, 8]);
 let batch = RecordBatch::try_from_iter(vec![
   ("id", Arc::new(ids) as ArrayRef),
   ("val", Arc::new(vals) as ArrayRef),
 ]).unwrap();

 let file = tempfile().unwrap();

 // WriterProperties can be used to set Parquet file options
 let props = WriterProperties::builder()
     .set_compression(Compression::SNAPPY)
     .build();

 let mut writer = ArrowWriter::try_new(file, batch.schema(), Some(props)).unwrap();

 writer.write(&batch).expect("Writing batch");

 // writer must be closed to write footer
 writer.close().unwrap();

§Example of reading Parquet file into Arrow record batch

use std::fs::File;

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

let file = File::open("data.parquet").unwrap();

let builder = ParquetRecordBatchReaderBuilder::try_new(file).unwrap();
println!("Converted arrow schema is: {}", builder.schema());

let mut reader = builder.build().unwrap();

let record_batch = reader.next().unwrap().unwrap();

println!("Read {} records.", record_batch.num_rows());

§Example of reading non-uniformly encrypted parquet file into arrow record batch

Note: This requires the experimental `encryption` feature to be enabled at compile time.

 use std::fs::File;

 use parquet::arrow::arrow_reader::{
     ArrowReaderMetadata, ArrowReaderOptions, ParquetRecordBatchReaderBuilder,
 };
 use parquet::encryption::decrypt::FileDecryptionProperties;

 let file = File::open(path).unwrap();

 // Define the AES encryption keys required for decrypting the footer metadata
 // and column-specific data. If only a footer key is used, it is assumed that the
 // file uses uniform encryption and all columns are encrypted with the footer key.
 // If any column keys are specified, columns without a provided key are assumed
 // to be unencrypted.
 let footer_key = "0123456789012345".as_bytes(); // Keys are 128 bits (16 bytes)
 let column_1_key = "1234567890123450".as_bytes();
 let column_2_key = "1234567890123451".as_bytes();

 let decryption_properties = FileDecryptionProperties::builder(footer_key.to_vec())
     .with_column_key("double_field", column_1_key.to_vec())
     .with_column_key("float_field", column_2_key.to_vec())
     .build()
     .unwrap();

 let options = ArrowReaderOptions::default()
  .with_file_decryption_properties(decryption_properties);
 let reader_metadata = ArrowReaderMetadata::load(&file, options.clone()).unwrap();
 let file_metadata = reader_metadata.metadata().file_metadata();
 assert_eq!(50, file_metadata.num_rows());

 let mut reader = ParquetRecordBatchReaderBuilder::try_new_with_options(file, options)
   .unwrap()
   .build()
   .unwrap();

 let record_batch = reader.next().unwrap().unwrap();
 assert_eq!(50, record_batch.num_rows());

Re-exports§

pub use self::arrow_writer::ArrowWriter;
pub use self::async_reader::ParquetRecordBatchStreamBuilder;
pub use self::async_writer::AsyncArrowWriter;

Modules§

arrow_reader
Contains a reader that reads Parquet data into Arrow [RecordBatch]
arrow_writer
Contains a writer that writes Arrow data into Parquet files.
async_reader
async API for reading Parquet files as [RecordBatch]es
async_writer
async API for writing [RecordBatch]es to Parquet files
buffer 🔒
Logic for reading data into arrow buffers
decoder 🔒
Specialized decoders optimised for decoding to arrow format
record_reader 🔒

Structs§

ArrowSchemaConverter
Converter for Arrow schema to Parquet schema
FieldLevels
Schema information necessary to decode a parquet file as arrow [Fields]
ProjectionMask
A ProjectionMask identifies a set of columns within a potentially nested schema to project

Constants§

ARROW_SCHEMA_META_KEY
Schema metadata key used to store serialized Arrow IPC schema
PARQUET_FIELD_ID_META_KEY
The value of this metadata key, if present on Field::metadata, will be used to populate BasicTypeInfo::id

Functions§

add_encoded_arrow_schema_to_metadata
Mutates writer metadata by storing the encoded Arrow schema. If there is an existing Arrow schema metadata, it is replaced.
arrow_to_parquet_schema (Deprecated)
Convert Arrow schema to Parquet schema
encode_arrow_schema
Encodes the Arrow schema into the IPC format, and base64 encodes it
parquet_column
Looks up the Parquet column by name
parquet_to_arrow_field_levels
Convert a parquet SchemaDescriptor to FieldLevels
parquet_to_arrow_schema
Convert Parquet schema to Arrow schema including optional metadata
parquet_to_arrow_schema_by_columns
Convert Parquet schema to Arrow schema including optional metadata, only preserving some leaf columns.