pub struct ReaderBuilder {
    batch_size: usize,
    strict_mode: bool,
    utf8_view: bool,
    reader_schema: Option<AvroSchema>,
    writer_schema_store: Option<SchemaStore>,
    active_fingerprint: Option<Fingerprint>,
}
A builder that configures and constructs Avro readers and decoders.
ReaderBuilder is the primary entry point for this module. It supports:
- OCF reading via Self::build, returning a Reader over any BufRead;
- streaming decoding via Self::build_decoder, returning a Decoder.
§Options
- batch_size: Max rows per RecordBatch (default: 1024). See Self::with_batch_size.
- utf8_view: Use Arrow StringViewArray for string columns (default: false). See Self::with_utf8_view.
- strict_mode: Opt‑in to stricter union handling (default: false). See Self::with_strict_mode.
- reader_schema: Optional reader schema (projection / evolution) used when decoding values (default: None). See Self::with_reader_schema.
- writer_schema_store: Required for building a Decoder for single‑object or Confluent framing. Maps fingerprints to Avro schemas. See Self::with_writer_schema_store.
- active_fingerprint: Optional starting fingerprint for streaming decode when the first frame omits one (rare). See Self::with_active_fingerprint.
§Examples
Read an OCF file in batches of 4096 rows:
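use std::fs::File;
use std::io::BufReader;
use arrow_avro::reader::ReaderBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("data.avro")?;
    // Each iteration yields a RecordBatch with at most 4096 rows.
    let reader = ReaderBuilder::new()
        .with_batch_size(4096)
        .build(BufReader::new(file))?;
    for batch in reader {
        println!("read {} rows", batch?.num_rows());
    }
    Ok(())
}
Build a Decoder for Confluent messages: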
use arrow_avro::schema::{AvroSchema, SchemaStore, Fingerprint, FingerprintAlgorithm};
use arrow_avro::reader::ReaderBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Confluent framing identifies writer schemas by schema-registry ID.
    let mut store = SchemaStore::new_with_type(FingerprintAlgorithm::Id);
    store.set(Fingerprint::Id(1234), AvroSchema::new(r#"{"type":"record","name":"E","fields":[]}"#.to_string()))?;
    let decoder = ReaderBuilder::new()
        .with_writer_schema_store(store)
        .build_decoder()?;
    Ok(())
}
§Fields
- batch_size: usize
- strict_mode: bool
- utf8_view: bool
- reader_schema: Option<AvroSchema>
- writer_schema_store: Option<SchemaStore>
- active_fingerprint: Option<Fingerprint>
§Implementations
impl ReaderBuilder
pub fn new() -> Self
Creates a new ReaderBuilder with defaults:
- batch_size = 1024
- strict_mode = false
- utf8_view = false
- reader_schema = None
- writer_schema_store = None
- active_fingerprint = None
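To illustrate the consuming-`self` builder style used here, the sketch below models a subset of the options with a hypothetical `MiniBuilder` type (not part of `arrow_avro`). It reuses the documented defaults and shows how each `with_*` call takes the builder by value and returns it, so calls chain into one owned value.

```rust
/// Hypothetical stand-in mirroring three of ReaderBuilder's options.
#[derive(Debug, Clone, PartialEq)]
pub struct MiniBuilder {
    pub batch_size: usize,
    pub strict_mode: bool,
    pub utf8_view: bool,
}

impl Default for MiniBuilder {
    // Same defaults as documented for ReaderBuilder::new().
    fn default() -> Self {
        MiniBuilder { batch_size: 1024, strict_mode: false, utf8_view: false }
    }
}

impl MiniBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    // Each setter moves `self` in, updates one field, and hands it back.
    pub fn with_batch_size(mut self, batch_size: usize) -> Self {
        self.batch_size = batch_size;
        self
    }

    pub fn with_strict_mode(mut self, strict_mode: bool) -> Self {
        self.strict_mode = strict_mode;
        self
    }

    pub fn with_utf8_view(mut self, utf8_view: bool) -> Self {
        self.utf8_view = utf8_view;
        self
    }
}
```

Options that are never set keep their defaults, so `MiniBuilder::new().with_batch_size(4096)` differs from the default only in `batch_size`.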
fn make_record_decoder( &self, writer_schema: &Schema<'_>, reader_schema: Option<&Schema<'_>>, ) -> Result<RecordDecoder, ArrowError>
fn make_record_decoder_from_schemas( &self, writer_schema: &Schema<'_>, reader_schema: Option<&AvroSchema>, ) -> Result<RecordDecoder, ArrowError>
fn make_decoder_with_parts( &self, active_decoder: RecordDecoder, active_fingerprint: Option<Fingerprint>, cache: IndexMap<Fingerprint, RecordDecoder>, fingerprint_algorithm: FingerprintAlgorithm, ) -> Decoder
fn make_decoder( &self, header: Option<&Header>, reader_schema: Option<&AvroSchema>, ) -> Result<Decoder, ArrowError>
pub fn with_batch_size(self, batch_size: usize) -> Self
Sets the row‑based batch size.
Each call to Decoder::flush or each iteration of Reader yields a batch with
up to this many rows. Larger batches can reduce overhead; smaller batches can
reduce peak memory usage and latency.
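As a rough illustration of "up to this many rows": when batches are filled completely before being emitted, only the final batch can be shorter. The helper `batch_lengths` is hypothetical and assumes that full-fill behavior; it is not how the reader is implemented.

```rust
/// Hypothetical helper: row counts of the batches produced when `total_rows`
/// rows are split into batches of at most `batch_size` rows.
pub fn batch_lengths(total_rows: usize, batch_size: usize) -> Vec<usize> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut lengths = Vec::new();
    let mut remaining = total_rows;
    while remaining > 0 {
        // Take a full batch, or whatever is left for the final one.
        let n = remaining.min(batch_size);
        lengths.push(n);
        remaining -= n;
    }
    lengths
}
```

With the default of 1024, 5000 rows come out as four full batches and one batch of 904 rows.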
pub fn with_utf8_view(self, utf8_view: bool) -> Self
Choose Arrow’s StringViewArray for UTF‑8 string data.
When enabled, textual Avro fields are loaded into Arrow’s StringViewArray
instead of the standard StringArray. This can improve performance for workloads
with many short strings by reducing allocations.
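The benefit for short strings comes from the view layout in the Arrow columnar format: every value gets a 16‑byte view, and strings of at most 12 bytes are stored inline in that view, so they need no separate data buffer. The hand-written `make_view` below only illustrates that layout (little-endian, as on typical platforms); arrow-rs builds these views internally and this function is not its API.

```rust
/// Longest string (in bytes) stored inline in a 16-byte view.
pub const MAX_INLINE_LEN: usize = 12;

/// Builds the 16-byte view for `s`:
/// - bytes 0..4: string length;
/// - short strings: the bytes themselves in 4..16, zero-padded;
/// - longer strings: a 4-byte prefix, then the data buffer index and
///   the byte offset of the full string within that buffer.
pub fn make_view(s: &str, buffer_index: u32, offset: u32) -> [u8; 16] {
    let bytes = s.as_bytes();
    let mut view = [0u8; 16];
    view[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    if bytes.len() <= MAX_INLINE_LEN {
        // Inline: no data buffer is touched at all.
        view[4..4 + bytes.len()].copy_from_slice(bytes);
    } else {
        // Out of line: keep a prefix for fast comparisons, point at the data.
        view[4..8].copy_from_slice(&bytes[0..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}
```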
pub fn use_utf8view(&self) -> bool
Returns whether StringViewArray is enabled for string data.
pub fn with_strict_mode(self, strict_mode: bool) -> Self
Enable stricter behavior for certain Avro unions (e.g., [T, "null"]).
When true, ambiguous or lossy unions that would otherwise be coerced may instead
produce a descriptive error. Use this to catch schema issues early during ingestion.
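The exact rules strict mode enforces are not listed here. Purely for illustration, the sketch below assumes a policy in which lenient mode maps both `["null", T]` and `[T, "null"]` to a nullable `T`, while strict mode rejects the `[T, "null"]` form with an error. `NullableUnion` and `resolve_union` are hypothetical names, not crate APIs.

```rust
/// Hypothetical result of resolving a two-branch union that contains "null".
#[derive(Debug, PartialEq)]
pub enum NullableUnion {
    NullFirst(String),  // ["null", T]
    NullSecond(String), // [T, "null"]
}

/// Illustrative policy only (an assumption, not the crate's rules).
pub fn resolve_union(branches: &[&str], strict: bool) -> Result<NullableUnion, String> {
    match branches {
        ["null", t] if *t != "null" => Ok(NullableUnion::NullFirst(t.to_string())),
        [t, "null"] if *t != "null" => {
            if strict {
                // Strict: surface the ambiguous form instead of coercing it.
                Err(format!("union [{t:?}, \"null\"] rejected in strict mode"))
            } else {
                Ok(NullableUnion::NullSecond(t.to_string()))
            }
        }
        _ => Err(format!("not a two-branch nullable union: {branches:?}")),
    }
}
```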
pub fn with_reader_schema(self, schema: AvroSchema) -> Self
Sets the reader schema used during decoding.
If not provided, the writer schema from the OCF header (for Reader) or the
schema looked up from the fingerprint (for Decoder) is used directly.
A reader schema can be used for schema evolution or projection.
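Projection and evolution follow Avro's schema-resolution rule for records: fields are matched by name, writer fields the reader omits are skipped, and reader fields missing from the writer must declare a default. The sketch below applies only that rule to flat field-name lists; `FieldSource` and `resolve_fields` are hypothetical, and type promotion, aliases, and nested records are ignored.

```rust
use std::collections::HashMap;

/// Where a reader field's value comes from after (simplified) resolution.
#[derive(Debug, PartialEq)]
pub enum FieldSource {
    Writer(usize), // decoded from this position in the writer record
    Default,       // absent from the writer; filled from the reader's default
}

/// `writer`: writer field names in order.
/// `reader`: (field name, has_default) pairs in reader order.
pub fn resolve_fields(
    writer: &[&str],
    reader: &[(&str, bool)],
) -> Result<Vec<FieldSource>, String> {
    // Index writer fields by name so reader fields can be matched in any order.
    let positions: HashMap<&str, usize> =
        writer.iter().enumerate().map(|(i, name)| (*name, i)).collect();
    reader
        .iter()
        .map(|(name, has_default)| match positions.get(name) {
            Some(&i) => Ok(FieldSource::Writer(i)),
            None if *has_default => Ok(FieldSource::Default),
            None => Err(format!("reader field `{name}` missing from writer and has no default")),
        })
        .collect()
}
```

Writer fields such as `email` that the reader never names are simply skipped during decoding, which is what makes a reader schema usable for projection.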
pub fn with_writer_schema_store(self, store: SchemaStore) -> Self
Sets the SchemaStore used to resolve writer schemas by fingerprint.
This is required when building a Decoder for single‑object encoding or the
Confluent wire format. The store maps a fingerprint (Rabin / MD5 / SHA‑256 /
ID) to a full Avro schema.
Defaults to None.
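For context, the two framings differ in how the fingerprint is carried: the Confluent wire format starts with a 0x00 magic byte followed by a 4‑byte big-endian schema ID, while Avro single-object encoding starts with the marker 0xC3 0x01 followed by an 8‑byte little-endian Rabin (CRC‑64‑AVRO) fingerprint. The sketch below parses those prefixes and looks the result up in a plain `HashMap`; `Fp`, `split_frame`, and `writer_schema` are hypothetical stand-ins, not the crate's `Fingerprint` or `SchemaStore`.

```rust
use std::collections::HashMap;

/// Hypothetical fingerprint type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Fp {
    Id(u32),    // Confluent schema-registry ID
    Rabin(u64), // Avro CRC-64-AVRO fingerprint
}

/// Splits a framed message into (fingerprint, Avro body), or None if the
/// prefix is not recognized or the message is too short.
pub fn split_frame(msg: &[u8]) -> Option<(Fp, &[u8])> {
    match msg {
        [0x00, rest @ ..] if rest.len() >= 4 => {
            let mut id = [0u8; 4];
            id.copy_from_slice(&rest[..4]);
            Some((Fp::Id(u32::from_be_bytes(id)), &rest[4..]))
        }
        [0xC3, 0x01, rest @ ..] if rest.len() >= 8 => {
            let mut fp = [0u8; 8];
            fp.copy_from_slice(&rest[..8]);
            Some((Fp::Rabin(u64::from_le_bytes(fp)), &rest[8..]))
        }
        _ => None,
    }
}

/// Resolves the writer schema JSON for a framed message from a toy store.
pub fn writer_schema<'a>(store: &'a HashMap<Fp, String>, msg: &[u8]) -> Option<&'a str> {
    let (fp, _body) = split_frame(msg)?;
    store.get(&fp).map(String::as_str)
}
```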
pub fn with_active_fingerprint(self, fp: Fingerprint) -> Self
Sets the initial schema fingerprint for stream decoding.
This can be useful for streams that do not include a fingerprint before the first record body (uncommon). If not set, the first observed fingerprint is used.
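One way to picture this setting is as the initial value of a "current writer schema" slot. The hypothetical `ActiveSchema` type below assumes that a message without a prefix is decoded with whatever fingerprint is active, and that an explicit prefix always replaces it; that switching rule is an assumption for illustration, not a description of the Decoder internals.

```rust
/// Hypothetical decoder state tracking the active writer fingerprint.
pub struct ActiveSchema<F> {
    active: Option<F>,
}

impl<F: Copy> ActiveSchema<F> {
    /// `initial` plays the role of `with_active_fingerprint`.
    pub fn new(initial: Option<F>) -> Self {
        ActiveSchema { active: initial }
    }

    /// Called per message with the fingerprint parsed from its prefix, if any.
    /// Returns the fingerprint to decode the body with, or None when the
    /// message has no prefix and nothing is active yet.
    pub fn on_message(&mut self, prefix: Option<F>) -> Option<F> {
        if let Some(fp) = prefix {
            // An explicit prefix becomes the new active fingerprint.
            self.active = Some(fp);
        }
        self.active
    }
}
```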
pub fn build<R: BufRead>(self, reader: R) -> Result<Reader<R>, ArrowError>
Build a Reader (OCF) from this builder and a BufRead.
This reads and validates the OCF header, initializes an internal row decoder from the discovered writer (and optional reader) schema, and prepares to iterate blocks, decompressing if necessary.
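The first step of that validation is fixed by the Avro specification: an object container file begins with the four bytes `O`, `b`, `j`, 0x01, followed by a metadata map (holding `avro.schema` and optionally `avro.codec`) and a 16‑byte sync marker. The hypothetical `check_magic` below covers only the magic-byte check; the crate's own header parsing is internal and does more.

```rust
/// The four magic bytes that begin every Avro object container file.
pub const OCF_MAGIC: [u8; 4] = [b'O', b'b', b'j', 1];

/// Checks the OCF magic at the start of `header`. A full reader would then
/// decode the metadata map and the 16-byte sync marker.
pub fn check_magic(header: &[u8]) -> Result<(), String> {
    match header.get(..4) {
        Some(m) if m == &OCF_MAGIC[..] => Ok(()),
        Some(m) => Err(format!("not an Avro OCF file: magic bytes {m:?}")),
        None => Err("input shorter than the 4-byte OCF magic".to_string()),
    }
}
```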