Module writer

Core functionality for writing Arrow arrays as Avro data

Implements the primary writer interface and record encoding logic for the arrow-avro crate.

§Overview

Use this module to serialize Arrow [arrow_array::RecordBatch] values into Avro. Three output modes are supported:

  • crate::writer::AvroWriter — writes an Object Container File (OCF): a self‑describing file with header (schema JSON and metadata), optional compression, data blocks, and sync markers. See Avro 1.11.1 “Object Container Files.” https://avro.apache.org/docs/1.11.1/specification/#object-container-files

  • crate::writer::AvroStreamWriter — writes a Single Object Encoding (SOE) stream without any container framing. This is useful when the schema is known out‑of‑band (e.g., via a schema registry) and you want minimal overhead.

  • crate::writer::Encoder — a row-by-row encoder that buffers encoded records into a single contiguous byte buffer and returns per-row [bytes::Bytes] slices. Ideal for publishing individual messages to Kafka, Pulsar, or other message queues where each message must be a self-contained Avro payload.

§Which writer should you use?

| Use Case | Recommended Type |
|---|---|
| Write an OCF file to disk | crate::writer::AvroWriter |
| Stream records continuously to a file/socket | crate::writer::AvroStreamWriter |
| Publish individual records to Kafka/Pulsar | crate::writer::Encoder |
| Need per-row byte slices for custom framing | crate::writer::Encoder |

§Per-Record Prefix Formats

For crate::writer::AvroStreamWriter and crate::writer::Encoder, each record is automatically prefixed based on the fingerprint strategy:

| Strategy | Prefix | Use Case |
|---|---|---|
| FingerprintStrategy::Rabin (default) | 0xC3 0x01 + 8-byte LE Rabin fingerprint | Standard Avro SOE |
| FingerprintStrategy::Id(id) | 0x00 + 4-byte BE schema ID | Confluent Schema Registry |
| FingerprintStrategy::Id64(id) | 0x00 + 8-byte BE schema ID | Apicurio Registry |

§Choosing the Avro Schema

By default, the writer converts your Arrow schema to Avro (including a top‑level record name). If you already have an Avro schema JSON you want to use verbatim, put it into the Arrow schema metadata under the SCHEMA_METADATA_KEY key before constructing the writer. The builder will use that schema instead of generating a new one.

§Compression

For OCF (crate::writer::AvroWriter), you may enable a compression codec via crate::writer::WriterBuilder::with_compression. The chosen codec is written into the file header and used for subsequent blocks. SOE stream writing (crate::writer::AvroStreamWriter, crate::writer::Encoder) does not apply container‑level compression.

§Examples

§Writing an OCF File

```rust
use std::sync::Arc;
use arrow_array::{ArrayRef, Int64Array, StringArray, RecordBatch};
use arrow_schema::{DataType, Field, Schema};
use arrow_avro::writer::AvroWriter;

let schema = Schema::new(vec![
    Field::new("id", DataType::Int64, false),
    Field::new("name", DataType::Utf8, false),
]);

let batch = RecordBatch::try_new(
    Arc::new(schema.clone()),
    vec![
        Arc::new(Int64Array::from(vec![1, 2])) as ArrayRef,
        Arc::new(StringArray::from(vec!["alice", "bob"])) as ArrayRef,
    ],
)?;

let mut writer = AvroWriter::new(Vec::<u8>::new(), schema)?;
writer.write(&batch)?;
writer.finish()?;
let bytes = writer.into_inner();
assert!(!bytes.is_empty());
```

§Using the Row-by-Row Encoder for Message Queues

```rust
use std::sync::Arc;
use arrow_array::{ArrayRef, Int32Array, RecordBatch};
use arrow_schema::{DataType, Field, Schema};
use arrow_avro::writer::{WriterBuilder, format::AvroSoeFormat};
use arrow_avro::schema::FingerprintStrategy;

let schema = Schema::new(vec![Field::new("x", DataType::Int32, false)]);
let batch = RecordBatch::try_new(
    Arc::new(schema.clone()),
    vec![Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef],
)?;

// Build an Encoder with Confluent wire format (schema ID = 42)
let mut encoder = WriterBuilder::new(schema)
    .with_fingerprint_strategy(FingerprintStrategy::Id(42))
    .build_encoder::<AvroSoeFormat>()?;

encoder.encode(&batch)?;

// Get the buffered rows (zero-copy views into a single backing buffer)
let rows = encoder.flush();
assert_eq!(rows.len(), 3);

// Each row has Confluent wire format: magic byte + 4-byte schema ID + body
for row in rows.iter() {
    assert_eq!(row[0], 0x00); // Confluent magic byte
}
```

Modules§

encoder 🔒
Encodes RecordBatch into the Avro binary format.
format
Logic for different Avro container file formats.

Structs§

EncodedRows
A contiguous set of Avro encoded rows.
Encoder
A row-by-row encoder for Avro stream/message formats (SOE / registry wire formats / raw binary).
Writer
Generic Avro writer.
WriterBuilder
Builder to configure and create a Writer.

Type Aliases§

AvroStreamWriter
Alias for an Avro Single Object Encoding stream writer.
AvroWriter
Alias for an Avro Object Container File writer.