pub struct ReaderBuilder {
    batch_size: usize,
    strict_mode: bool,
    utf8_view: bool,
    reader_schema: Option<AvroSchema>,
    writer_schema_store: Option<SchemaStore>,
    active_fingerprint: Option<Fingerprint>,
}
A builder that configures and constructs Avro readers and decoders.
ReaderBuilder is the primary entry point for this module. It supports:
- OCF reading via Self::build, returning a Reader over any BufRead;
- streaming decoding via Self::build_decoder, returning a Decoder.
§Options
- batch_size: Max rows per RecordBatch (default: 1024). See Self::with_batch_size.
- utf8_view: Use Arrow StringViewArray for string columns (default: false). See Self::with_utf8_view.
- strict_mode: Opt‑in to stricter union handling (default: false). See Self::with_strict_mode.
- reader_schema: Optional reader schema (projection / evolution) used when decoding values (default: None). See Self::with_reader_schema.
- writer_schema_store: Required for building a Decoder for single‑object or Confluent framing. Maps fingerprints to Avro schemas. See Self::with_writer_schema_store.
- active_fingerprint: Optional starting fingerprint for streaming decode when the first frame omits one (rare). See Self::with_active_fingerprint.
§Examples
Read an OCF file in batches of 4096 rows:
    use std::fs::File;
    use std::io::BufReader;
    use arrow_avro::reader::ReaderBuilder;

    let file = File::open("data.avro")?;
    let mut reader = ReaderBuilder::new()
        .with_batch_size(4096)
        .build(BufReader::new(file))?;
Build a Decoder for Confluent messages:
    use arrow_avro::schema::{AvroSchema, SchemaStore, Fingerprint, FingerprintAlgorithm};
    use arrow_avro::reader::ReaderBuilder;

    let mut store = SchemaStore::new_with_type(FingerprintAlgorithm::None);
    store.set(
        Fingerprint::Id(1234),
        AvroSchema::new(r#"{"type":"record","name":"E","fields":[]}"#.to_string()),
    )?;
    let decoder = ReaderBuilder::new()
        .with_writer_schema_store(store)
        .build_decoder()?;
Fields§
§batch_size: usize
§strict_mode: bool
§utf8_view: bool
§reader_schema: Option<AvroSchema>
§writer_schema_store: Option<SchemaStore>
§active_fingerprint: Option<Fingerprint>
Implementations§
impl ReaderBuilder
pub fn new() -> Self
Creates a new ReaderBuilder with defaults:
- batch_size = 1024
- strict_mode = false
- utf8_view = false
- reader_schema = None
- writer_schema_store = None
- active_fingerprint = None
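The with_* methods documented on this page take self by value and return Self, which is what allows calls to be chained from new(). A minimal, std-only sketch of that pattern follows. MiniBuilder is a hypothetical stand-in that copies only the three scalar options and their documented defaults; it is not the crate's type.

```rust
/// Hypothetical stand-in for the three scalar options (not the real ReaderBuilder).
#[derive(Debug, Clone, PartialEq)]
pub struct MiniBuilder {
    pub batch_size: usize,
    pub strict_mode: bool,
    pub utf8_view: bool,
}

impl Default for MiniBuilder {
    fn default() -> Self {
        // Defaults copied from the list documented for ReaderBuilder::new().
        Self { batch_size: 1024, strict_mode: false, utf8_view: false }
    }
}

impl MiniBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Consumes the builder and returns it with `batch_size` replaced.
    pub fn with_batch_size(mut self, batch_size: usize) -> Self {
        self.batch_size = batch_size;
        self
    }

    /// Consumes the builder and returns it with `utf8_view` replaced.
    pub fn with_utf8_view(mut self, utf8_view: bool) -> Self {
        self.utf8_view = utf8_view;
        self
    }
}
```

Because every setter hands the builder back, options left untouched keep their defaults, so MiniBuilder::new().with_batch_size(4096) changes only batch_size.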
fn make_record_decoder(&self, writer_schema: &Schema<'_>, reader_schema: Option<&Schema<'_>>) -> Result<RecordDecoder, ArrowError>
fn make_record_decoder_from_schemas(&self, writer_schema: &Schema<'_>, reader_schema: Option<&AvroSchema>) -> Result<RecordDecoder, ArrowError>
fn make_decoder_with_parts(&self, active_decoder: RecordDecoder, active_fingerprint: Option<Fingerprint>, cache: IndexMap<Fingerprint, RecordDecoder>, fingerprint_algorithm: FingerprintAlgorithm) -> Decoder
fn make_decoder(&self, header: Option<&Header>, reader_schema: Option<&AvroSchema>) -> Result<Decoder, ArrowError>
pub fn with_batch_size(self, batch_size: usize) -> Self
Sets the row‑based batch size.
Each call to Decoder::flush or each iteration of Reader yields a batch with up to this many rows. Larger batches can reduce overhead; smaller batches can reduce peak memory usage and latency.
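To make the "up to this many rows" bound concrete, here is a small std-only sketch of how a row count splits into batches. It assumes every batch except the last is filled completely, which is the upper-bound case; real batches may be smaller. The helper batch_row_counts is hypothetical and not part of the crate.

```rust
/// Returns the row count of each batch when `total_rows` rows are emitted in
/// batches of at most `batch_size` rows (hypothetical helper, illustration only).
pub fn batch_row_counts(total_rows: usize, batch_size: usize) -> Vec<usize> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut counts = Vec::new();
    let mut remaining = total_rows;
    while remaining > 0 {
        // Each batch takes as many rows as are left, capped at batch_size.
        let n = remaining.min(batch_size);
        counts.push(n);
        remaining -= n;
    }
    counts
}
```

Under that assumption, 10,000 rows with a batch size of 4096 give two full batches and a final batch of 1808 rows.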
pub fn with_utf8_view(self, utf8_view: bool) -> Self
Choose Arrow’s StringViewArray for UTF‑8 string data.
When enabled, textual Avro fields are loaded into Arrow’s StringViewArray instead of the standard StringArray. This can improve performance for workloads with many short strings by reducing allocations.
pub fn use_utf8view(&self) -> bool
Returns whether StringViewArray is enabled for string data.
pub fn with_strict_mode(self, strict_mode: bool) -> Self
Enable stricter behavior for certain Avro unions (e.g., [T, "null"]).
When true, ambiguous or lossy unions that would otherwise be coerced may instead produce a descriptive error. Use this to catch schema issues early during ingestion.
pub fn with_reader_schema(self, schema: AvroSchema) -> Self
Sets the reader schema used during decoding.
If not provided, the writer schema from the OCF header (for Reader) or the schema looked up from the fingerprint (for Decoder) is used directly.
A reader schema can be used for schema evolution or projection.
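Projection rests on the Avro schema-resolution rule that record fields are matched by name: writer fields the reader does not ask for are skipped, and reader fields the writer lacks must come from a default. The sketch below illustrates only that name-matching step on plain field-name lists. It ignores aliases, type promotion, and defaults, and plan_projection is a hypothetical helper rather than anything the crate exposes.

```rust
/// For each reader field, returns the index of the same-named writer field,
/// or None when the writer has no such field (a default would be required).
/// Hypothetical helper; aliases and type promotion are not handled.
pub fn plan_projection(writer_fields: &[&str], reader_fields: &[&str]) -> Vec<Option<usize>> {
    reader_fields
        .iter()
        .map(|name| writer_fields.iter().position(|w| w == name))
        .collect()
}
```

With writer fields id, name, email and reader fields email, id, age, the plan reorders two columns, drops name, and marks age as absent from the writer.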
pub fn with_writer_schema_store(self, store: SchemaStore) -> Self
Sets the SchemaStore used to resolve writer schemas by fingerprint.
This is required when building a Decoder for single‑object encoding or the Confluent wire format. The store maps a fingerprint (Rabin / MD5 / SHA‑256 / ID) to a full Avro schema.
Defaults to None.
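A store is needed because framed messages carry only a short schema identifier, not the schema itself. As a rough std-only sketch of the two framings named above: the Confluent wire format starts with a 0x00 magic byte followed by a 4-byte big-endian schema ID, and Avro single-object encoding starts with the marker bytes 0xC3 0x01 followed by an 8-byte little-endian CRC-64-AVRO (Rabin) fingerprint. FramePrefix and parse_prefix are hypothetical names, and this is not how the crate's Decoder is implemented.

```rust
/// Schema identifier found at the start of a framed message (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum FramePrefix {
    /// Confluent wire format: 0x00 magic byte + 4-byte big-endian schema ID.
    ConfluentId(u32),
    /// Avro single-object encoding: 0xC3 0x01 + 8-byte little-endian Rabin fingerprint.
    SingleObjectRabin(u64),
}

/// Returns the parsed prefix and its length in bytes, or None if `buf` is too
/// short or does not start with a known marker.
pub fn parse_prefix(buf: &[u8]) -> Option<(FramePrefix, usize)> {
    match buf {
        [0x00, rest @ ..] if rest.len() >= 4 => {
            let mut id = [0u8; 4];
            id.copy_from_slice(&rest[..4]);
            Some((FramePrefix::ConfluentId(u32::from_be_bytes(id)), 5))
        }
        [0xC3, 0x01, rest @ ..] if rest.len() >= 8 => {
            let mut fp = [0u8; 8];
            fp.copy_from_slice(&rest[..8]);
            Some((FramePrefix::SingleObjectRabin(u64::from_le_bytes(fp)), 10))
        }
        _ => None,
    }
}
```

The returned length tells the caller where the Avro-encoded record body begins; the identifier is what would be looked up in a schema store.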
pub fn with_active_fingerprint(self, fp: Fingerprint) -> Self
Sets the initial schema fingerprint for stream decoding.
This can be useful for streams that do not include a fingerprint before the first record body (uncommon). If not set, the first observed fingerprint is used.
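The fallback described here can be pictured with a toy model: a frame's own fingerprint, when present, selects the schema and becomes the active one, and a frame without a fingerprint reuses whatever is currently active. The sketch below uses a plain HashMap keyed by u32 in place of SchemaStore and Fingerprint, and resolve_schema is a hypothetical function; the crate's actual decoder logic may differ.

```rust
use std::collections::HashMap;

/// Picks the schema for one frame (toy model, not the crate's logic).
/// A fingerprint carried by the frame wins and becomes the active one;
/// otherwise the currently active fingerprint is used.
pub fn resolve_schema<'a>(
    store: &'a HashMap<u32, String>,
    active: &mut Option<u32>,
    frame_fingerprint: Option<u32>,
) -> Option<&'a str> {
    if let Some(fp) = frame_fingerprint {
        *active = Some(fp);
    }
    // No fingerprint seen yet and none preset: nothing to decode with.
    let fp = (*active)?;
    store.get(&fp).map(String::as_str)
}
```

In this model, presetting active plays the role of with_active_fingerprint: without it, a stream whose first frame carries no fingerprint has no schema to start from.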
pub fn build<R: BufRead>(self, reader: R) -> Result<Reader<R>, ArrowError>
Build a Reader (OCF) from this builder and a BufRead.
This reads and validates the OCF header, initializes an internal row decoder from the discovered writer (and optional reader) schema, and prepares to iterate blocks, decompressing if necessary.
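As a small illustration of the first part of header validation: per the Avro specification, an object container file begins with the four magic bytes 'O', 'b', 'j', 0x01, followed by file metadata and a 16-byte sync marker. The std-only sketch below checks only the magic bytes. starts_with_ocf_magic is a hypothetical helper, and it consumes the bytes it reads, so it is not a drop-in pre-check before build.

```rust
use std::io::{self, BufRead, Read};

/// The four bytes every Avro object container file starts with: "Obj" + 0x01.
pub const OCF_MAGIC: [u8; 4] = [b'O', b'b', b'j', 0x01];

/// Reads the first four bytes from `reader` and reports whether they match the
/// OCF magic. Hypothetical helper; the bytes read are consumed.
pub fn starts_with_ocf_magic<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    let mut magic = [0u8; 4];
    match reader.read_exact(&mut magic) {
        Ok(()) => Ok(magic == OCF_MAGIC),
        // Fewer than four bytes available: cannot be an OCF.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```

A check like this only rules out obviously wrong inputs; the metadata map (which carries the writer schema and codec) and the sync marker still have to be parsed before any block can be decoded.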