Module parquet::arrow


High-level API for reading/writing Arrow RecordBatches and Arrays to/from Parquet files.

Apache Arrow is a cross-language development platform for in-memory data.

§Example of writing an Arrow record batch to a Parquet file

 use std::sync::Arc;
 use arrow_array::{ArrayRef, Int32Array, RecordBatch};
 use parquet::arrow::ArrowWriter;
 use parquet::basic::Compression;
 use parquet::file::properties::WriterProperties;
 use tempfile::tempfile;

 let ids = Int32Array::from(vec![1, 2, 3, 4]);
 let vals = Int32Array::from(vec![5, 6, 7, 8]);
 let batch = RecordBatch::try_from_iter(vec![
   ("id", Arc::new(ids) as ArrayRef),
   ("val", Arc::new(vals) as ArrayRef),
 ]).unwrap();

 let file = tempfile().unwrap();

 // WriterProperties can be used to set Parquet file options
 let props = WriterProperties::builder()
     .set_compression(Compression::SNAPPY)
     .build();

 let mut writer = ArrowWriter::try_new(file, batch.schema(), Some(props)).unwrap();

 writer.write(&batch).expect("Writing batch");

 // writer must be closed to write footer
 writer.close().unwrap();

§Example of reading a Parquet file into an Arrow record batch

use std::fs::File;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

let file = File::open("data.parquet").unwrap();

let builder = ParquetRecordBatchReaderBuilder::try_new(file).unwrap();
println!("Converted arrow schema is: {}", builder.schema());

let mut reader = builder.build().unwrap();

let record_batch = reader.next().unwrap().unwrap();

println!("Read {} records.", record_batch.num_rows());
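The builder also supports limiting what is read before decoding begins. A minimal sketch, assuming a `data.parquet` file exists and using `ProjectionMask` and `with_batch_size` from this crate, reads only the first leaf column in batches of up to 1024 rows:

```rust
use std::fs::File;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use parquet::arrow::ProjectionMask;

let file = File::open("data.parquet").unwrap();
let builder = ParquetRecordBatchReaderBuilder::try_new(file).unwrap();

// Select a subset of leaf columns by their index in the Parquet schema
let mask = ProjectionMask::leaves(builder.parquet_schema(), [0]);

let reader = builder
    .with_projection(mask)
    .with_batch_size(1024)
    .build()
    .unwrap();

// The reader is an Iterator over Result<RecordBatch>
for batch in reader {
    let batch = batch.unwrap();
    println!("{} rows, {} columns", batch.num_rows(), batch.num_columns());
}
```

Because only the projected columns are decoded, this can substantially reduce I/O and decode cost for wide files.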

Modules§

  • arrow_reader
    Contains reader which reads parquet data into arrow [RecordBatch]
  • arrow_writer
    Contains writer which writes arrow data into parquet data.
  • async_reader
    Provides async API for reading parquet files as [RecordBatch]es
  • async_writer
    Contains async writer which writes arrow data into parquet data.
  • buffer 🔒
    Logic for reading data into arrow buffers
  • decoder 🔒
    Specialized decoders optimised for decoding to arrow format
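The async read path mirrors the synchronous builder. A minimal sketch, assuming the crate's `async` feature is enabled, a Tokio runtime is available, and a `data.parquet` file exists, using `ParquetRecordBatchStreamBuilder`:

```rust
use futures::TryStreamExt;
use parquet::arrow::ParquetRecordBatchStreamBuilder;

async fn read_async() {
    let file = tokio::fs::File::open("data.parquet").await.unwrap();
    let builder = ParquetRecordBatchStreamBuilder::try_new(file).await.unwrap();
    let mut stream = builder.build().unwrap();

    // The stream yields RecordBatches as row groups are fetched and decoded
    while let Some(batch) = stream.try_next().await.unwrap() {
        println!("Read {} rows", batch.num_rows());
    }
}
```

Unlike the synchronous reader, the stream fetches row groups without blocking the executor, which suits reading from object stores or other high-latency sources.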
