Expand description
Core functionality for reading Avro data into Arrow arrays
Implements the primary reader interface and record decoding logic. Avro reader
This module provides facilities to read Apache Avro-encoded files or streams
into Arrow’s RecordBatch
format. In particular, it introduces:
ReaderBuilder
: Configures Avro reading, e.g., batch sizeReader
: YieldsRecordBatch
values, implementingIterator
Decoder
: A low-level push-based decoder for Avro records
§Basic Usage
Reader
can be used directly with synchronous data sources, such as std::fs::File
.
§Reading a Single Batch
let file = File::open(path).unwrap();
let mut avro = ReaderBuilder::new().build(BufReader::new(file)).unwrap();
let batch = avro.next().unwrap();
§Async Usage
The lower-level Decoder
can be integrated with various forms of async data streams,
and is designed to be agnostic to different async IO primitives within
the Rust ecosystem. It works by incrementally decoding Avro data from byte slices.
For example, see below for how it could be used with an arbitrary Stream
of Bytes
:
fn decode_stream<S: Stream<Item = Bytes> + Unpin>(
mut decoder: Decoder,
mut input: S,
) -> impl Stream<Item = Result<RecordBatch, ArrowError>> {
let mut buffered = Bytes::new();
futures::stream::poll_fn(move |cx| {
loop {
if buffered.is_empty() {
buffered = match ready!(input.poll_next_unpin(cx)) {
Some(b) => b,
None => break,
};
}
let decoded = match decoder.decode(buffered.as_ref()) {
Ok(decoded) => decoded,
Err(e) => return Poll::Ready(Some(Err(e))),
};
let read = buffered.len();
buffered.advance(decoded);
if decoded != read {
break
}
}
// Convert any fully-decoded rows to a RecordBatch, if available
Poll::Ready(decoder.flush().transpose())
})
}
Modules§
Structs§
- Decoder
- A low-level interface for decoding Avro-encoded bytes into Arrow
RecordBatch
. - Reader
- A high-level Avro
Reader
that reads container-file blocks and feeds them into a row-levelDecoder
. - Reader
Builder - A builder to create an
Avro Reader
that reads Avro data into ArrowRecordBatch
.
Functions§
- read_
header 🔒 - Read the Avro file header (magic, metadata, sync marker) from
reader
.