ReaderBuilder

Struct ReaderBuilder 

pub struct ReaderBuilder {
    batch_size: usize,
    strict_mode: bool,
    utf8_view: bool,
    reader_schema: Option<AvroSchema>,
    writer_schema_store: Option<SchemaStore>,
    active_fingerprint: Option<Fingerprint>,
}

A builder that configures and constructs Avro readers and decoders.

ReaderBuilder is the primary entry point for this module. It supports:

  • OCF reading via Self::build, returning a Reader over any BufRead;
  • streaming decoding via Self::build_decoder, returning a Decoder.

§Options

  • batch_size: Maximum number of rows per RecordBatch (default: 1024). See Self::with_batch_size.
  • utf8_view: Use Arrow StringViewArray for string columns (default: false). See Self::with_utf8_view.
  • strict_mode: Opt in to stricter union handling (default: false). See Self::with_strict_mode.
  • reader_schema: Optional reader schema (projection / evolution) used when decoding values (default: None). See Self::with_reader_schema.
  • writer_schema_store: Schema store that maps fingerprints to Avro schemas (default: None). Required when building a Decoder for single‑object or Confluent framing. See Self::with_writer_schema_store.
  • active_fingerprint: Optional starting fingerprint for streaming decode when the first frame omits one, which is rare (default: None). See Self::with_active_fingerprint.

§Examples

Read an OCF file in batches of 4096 rows:

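use std::fs::File;
use std::io::BufReader;
use arrow_avro::reader::ReaderBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("data.avro")?;
    // Each batch yielded by the reader holds at most 4096 rows.
    let mut reader = ReaderBuilder::new()
        .with_batch_size(4096)
        .build(BufReader::new(file))?;
    Ok(())
}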

Build a Decoder for Confluent messages:

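use arrow_avro::schema::{AvroSchema, SchemaStore, Fingerprint, FingerprintAlgorithm};
use arrow_avro::reader::ReaderBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Register the writer schema under the ID carried by each message.
    let mut store = SchemaStore::new_with_type(FingerprintAlgorithm::None);
    let schema = AvroSchema::new(r#"{"type":"record","name":"E","fields":[]}"#.to_string());
    store.set(Fingerprint::Id(1234), schema)?;

    // The decoder resolves writer schemas through the store.
    let decoder = ReaderBuilder::new()
        .with_writer_schema_store(store)
        .build_decoder()?;
    Ok(())
}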

Fields§

§batch_size: usize
§strict_mode: bool
§utf8_view: bool
§reader_schema: Option<AvroSchema>
§writer_schema_store: Option<SchemaStore>
§active_fingerprint: Option<Fingerprint>

Implementations§

impl ReaderBuilder

pub fn new() -> Self

Creates a new ReaderBuilder with defaults:

  • batch_size = 1024
  • strict_mode = false
  • utf8_view = false
  • reader_schema = None
  • writer_schema_store = None
  • active_fingerprint = None
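
The consuming-builder shape that new and the with_* setters follow can be sketched with the standard library alone. SketchBuilder below is a hypothetical stand-in that mirrors only three of the fields and the defaults listed above; it is not the arrow_avro implementation, and the batch_size and strict_mode accessors are added purely for illustration.

```rust
/// Hypothetical stand-in for illustration only; not the arrow_avro `ReaderBuilder`.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchBuilder {
    batch_size: usize,
    strict_mode: bool,
    utf8_view: bool,
}

impl Default for SketchBuilder {
    /// Mirrors the documented defaults: 1024 rows per batch, both flags off.
    fn default() -> Self {
        Self { batch_size: 1024, strict_mode: false, utf8_view: false }
    }
}

impl SketchBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Each setter takes `self` by value and returns it, so calls can be chained.
    pub fn with_batch_size(mut self, batch_size: usize) -> Self {
        self.batch_size = batch_size;
        self
    }

    pub fn with_strict_mode(mut self, strict_mode: bool) -> Self {
        self.strict_mode = strict_mode;
        self
    }

    pub fn with_utf8_view(mut self, utf8_view: bool) -> Self {
        self.utf8_view = utf8_view;
        self
    }

    /// Illustrative accessors (assumed, not part of the documented API).
    pub fn batch_size(&self) -> usize {
        self.batch_size
    }

    pub fn strict_mode(&self) -> bool {
        self.strict_mode
    }

    pub fn use_utf8view(&self) -> bool {
        self.utf8_view
    }
}
```

Because every setter consumes and returns the builder, a call such as SketchBuilder::new().with_batch_size(4096).with_utf8_view(true) overrides only the named options and leaves the remaining defaults untouched.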

fn make_record_decoder(
    &self,
    writer_schema: &Schema<'_>,
    reader_schema: Option<&Schema<'_>>,
) -> Result<RecordDecoder, ArrowError>

fn make_record_decoder_from_schemas(
    &self,
    writer_schema: &Schema<'_>,
    reader_schema: Option<&AvroSchema>,
) -> Result<RecordDecoder, ArrowError>

fn make_decoder_with_parts(
    &self,
    active_decoder: RecordDecoder,
    active_fingerprint: Option<Fingerprint>,
    cache: IndexMap<Fingerprint, RecordDecoder>,
    fingerprint_algorithm: FingerprintAlgorithm,
) -> Decoder

fn make_decoder(
    &self,
    header: Option<&Header>,
    reader_schema: Option<&AvroSchema>,
) -> Result<Decoder, ArrowError>

pub fn with_batch_size(self, batch_size: usize) -> Self

Sets the row‑based batch size.

Each call to Decoder::flush or each iteration of Reader yields a batch with up to this many rows. Larger batches can reduce overhead; smaller batches can reduce peak memory usage and latency.
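
To make the "up to this many rows" bound concrete, here is a small standard-library sketch of how a row count divides into batches. The helper batch_lengths is hypothetical, and it assumes the input is drained in full batches until the data runs out, which the text above does not guarantee for every reader or decoder.

```rust
/// Splits `total_rows` into consecutive batch lengths of at most `batch_size` rows.
/// Hypothetical helper for illustration; not part of arrow_avro.
pub fn batch_lengths(total_rows: usize, batch_size: usize) -> Vec<usize> {
    assert!(batch_size > 0, "batch_size must be non-zero in this sketch");
    let mut lengths = Vec::new();
    let mut remaining = total_rows;
    while remaining > 0 {
        // Take a full batch when possible, otherwise whatever is left.
        let n = remaining.min(batch_size);
        lengths.push(n);
        remaining -= n;
    }
    lengths
}
```

Under that assumption, 10,000 rows with a batch size of 4096 give two full batches and a final partial batch of 1808 rows.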

pub fn with_utf8_view(self, utf8_view: bool) -> Self

Sets whether UTF‑8 string data is decoded into Arrow’s StringViewArray.

When enabled, textual Avro fields are loaded into Arrow’s StringViewArray instead of the standard StringArray. This can improve performance for workloads with many short strings by reducing allocations.

pub fn use_utf8view(&self) -> bool

Returns whether StringViewArray is enabled for string data.

pub fn with_strict_mode(self, strict_mode: bool) -> Self

Enables stricter behavior for certain Avro unions (e.g., [T, "null"]).

When true, ambiguous or lossy unions that would otherwise be coerced may instead produce a descriptive error. Use this to catch schema issues early during ingestion.

pub fn with_reader_schema(self, schema: AvroSchema) -> Self

Sets the reader schema used during decoding.

If not provided, the writer schema from the OCF header (for Reader) or the schema looked up from the fingerprint (for Decoder) is used directly.

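A reader schema can be used for schema evolution (decoding data written with a different version of the schema) or projection (decoding only a subset of the written fields).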
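
The name-based field matching behind projection and evolution can be sketched as follows. This is a simplified model of the Avro schema-resolution rule for records (reader fields are matched to writer fields by name, writer-only fields are skipped, and reader-only fields need a default). It ignores aliases and type promotion, and the names FieldSource and resolve_fields are hypothetical rather than taken from arrow_avro.

```rust
/// How one reader field is populated in this simplified model.
#[derive(Debug, PartialEq)]
pub enum FieldSource {
    /// Read from the writer field at this index.
    Writer(usize),
    /// Absent from the writer schema; filled from the reader's default value.
    Default(String),
}

/// Resolves reader fields against writer fields by name.
/// `reader` pairs each field name with an optional default value.
pub fn resolve_fields(
    writer: &[&str],
    reader: &[(&str, Option<&str>)],
) -> Result<Vec<FieldSource>, String> {
    reader
        .iter()
        .map(|(name, default)| {
            if let Some(idx) = writer.iter().position(|w| w == name) {
                Ok(FieldSource::Writer(idx))
            } else if let Some(d) = default {
                Ok(FieldSource::Default(d.to_string()))
            } else {
                Err(format!(
                    "reader field `{}` has no writer match and no default",
                    name
                ))
            }
        })
        .collect()
}
```

Writer fields that no reader field names never appear in the result, which is the projection case; a reader field missing from the writer and lacking a default makes resolution fail.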

pub fn with_writer_schema_store(self, store: SchemaStore) -> Self

Sets the SchemaStore used to resolve writer schemas by fingerprint.

This is required when building a Decoder for single‑object encoding or the Confluent wire format. The store maps a fingerprint (Rabin / MD5 / SHA‑256 / ID) to a full Avro schema.

Defaults to None.
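
Conceptually, the store is a map keyed by fingerprint. The sketch below models that idea with a HashMap; SketchFingerprint and SketchStore are hypothetical types covering only two of the fingerprint kinds mentioned above, and the real SchemaStore may behave differently (for example in how set reports errors).

```rust
use std::collections::HashMap;

/// Hypothetical fingerprint key; the real `Fingerprint` type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SketchFingerprint {
    /// 64-bit Rabin fingerprint of the schema.
    Rabin(u64),
    /// Numeric schema ID, as used by registry-based framing.
    Id(u32),
}

/// Hypothetical store mapping a fingerprint to the schema's JSON text.
#[derive(Debug, Default)]
pub struct SketchStore {
    schemas: HashMap<SketchFingerprint, String>,
}

impl SketchStore {
    /// Registers (or replaces) the schema stored under `fp`.
    pub fn set(&mut self, fp: SketchFingerprint, schema_json: &str) {
        self.schemas.insert(fp, schema_json.to_string());
    }

    /// Looks up the writer schema for a fingerprint seen on the stream.
    pub fn lookup(&self, fp: &SketchFingerprint) -> Option<&str> {
        self.schemas.get(fp).map(String::as_str)
    }
}
```

A lookup for a fingerprint that was never registered returns None, which is why the store should be populated with every schema that may appear before decoding starts.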

pub fn with_active_fingerprint(self, fp: Fingerprint) -> Self

Sets the initial schema fingerprint for stream decoding.

This can be useful for streams that do not include a fingerprint before the first record body (uncommon). If not set, the first observed fingerprint is used.
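
One way to picture the role of the initial fingerprint is to track which fingerprint is active as frames arrive. The sketch below is an assumption-laden model, not the Decoder's actual logic: it supposes that a frame carrying a fingerprint switches the active one, that a frame without a fingerprint reuses the current one, and that decoding fails if no fingerprint is active. The function active_per_frame and the use of plain u64 values are hypothetical.

```rust
/// Returns, for each frame, the fingerprint that is active when its body is decoded.
/// `frames` holds each frame's fingerprint prefix, or `None` when the frame has none.
pub fn active_per_frame(
    initial: Option<u64>,
    frames: &[Option<u64>],
) -> Result<Vec<u64>, String> {
    let mut active = initial;
    let mut used = Vec::with_capacity(frames.len());
    for (i, prefix) in frames.iter().enumerate() {
        // A fingerprint on the frame replaces whatever was active before.
        if let Some(fp) = prefix {
            active = Some(*fp);
        }
        match active {
            Some(fp) => used.push(fp),
            None => {
                return Err(format!(
                    "frame {} has no fingerprint and none is active",
                    i
                ))
            }
        }
    }
    Ok(used)
}
```

In this model an initial fingerprint only matters until the first frame that carries its own, and it is what lets a stream whose first frame has no fingerprint be decoded at all.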

pub fn build<R: BufRead>(self, reader: R) -> Result<Reader<R>, ArrowError>

Build a Reader (OCF) from this builder and a BufRead.

This reads and validates the OCF header, initializes an internal row decoder from the discovered writer (and optional reader) schema, and prepares to iterate blocks, decompressing if necessary.

pub fn build_decoder(self) -> Result<Decoder, ArrowError>

Build a streaming Decoder from this builder.

§Requirements

  • A SchemaStore must be provided via Self::with_writer_schema_store.
  • The store should contain all fingerprints that may appear on the stream.

§Errors

  • Returns ArrowError::InvalidArgumentError if the schema store is missing.
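
The requirement above amounts to validating the builder's state when the decoder is built. A minimal sketch of that check, using only the standard library, follows. SketchDecoderBuilder, SketchDecoder, and SketchError are hypothetical names, and a HashMap keyed by u32 stands in for the schema store.

```rust
use std::collections::HashMap;

/// Hypothetical error type standing in for an invalid-argument error.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    InvalidArgument(String),
}

/// Hypothetical decoder that owns the schemas it may need.
#[derive(Debug)]
pub struct SketchDecoder {
    schemas: HashMap<u32, String>,
}

impl SketchDecoder {
    pub fn schema_count(&self) -> usize {
        self.schemas.len()
    }
}

/// Hypothetical builder; the store is optional until build time.
#[derive(Debug, Default)]
pub struct SketchDecoderBuilder {
    writer_schema_store: Option<HashMap<u32, String>>,
}

impl SketchDecoderBuilder {
    pub fn with_writer_schema_store(mut self, store: HashMap<u32, String>) -> Self {
        self.writer_schema_store = Some(store);
        self
    }

    /// Fails early if the required store was never supplied.
    pub fn build_decoder(self) -> Result<SketchDecoder, SketchError> {
        let schemas = self.writer_schema_store.ok_or_else(|| {
            SketchError::InvalidArgument("writer schema store is required".to_string())
        })?;
        Ok(SketchDecoder { schemas })
    }
}
```

Checking at build time means a missing store surfaces as an error before any bytes are decoded, rather than on the first message.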

Trait Implementations§

impl Debug for ReaderBuilder

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for ReaderBuilder

fn default() -> Self

Returns the “default value” for a type.
