Avro writer implementation for the arrow-avro crate: core functionality for writing Arrow arrays as Avro data, including the primary writer interface and record encoding logic.
§Overview
Use this module to serialize Arrow [arrow_array::RecordBatch] values into Avro. Three output
modes are supported:

- crate::writer::AvroWriter — writes an Object Container File (OCF): a self‑describing file with a header (schema JSON and metadata), optional compression, data blocks, and sync markers. See Avro 1.11.1 "Object Container Files": https://avro.apache.org/docs/1.11.1/specification/#object-container-files
- crate::writer::AvroStreamWriter — writes a Single Object Encoding (SOE) stream without any container framing. This is useful when the schema is known out‑of‑band (e.g., via a schema registry) and you want minimal overhead.
- crate::writer::Encoder — a row-by-row encoder that buffers encoded records into a single contiguous byte buffer and returns per-row [bytes::Bytes] slices. Ideal for publishing individual messages to Kafka, Pulsar, or other message queues where each message must be a self-contained Avro payload.
§Which writer should you use?
| Use Case | Recommended Type |
|---|---|
| Write an OCF file to disk | crate::writer::AvroWriter |
| Stream records continuously to a file/socket | crate::writer::AvroStreamWriter |
| Publish individual records to Kafka/Pulsar | crate::writer::Encoder |
| Need per-row byte slices for custom framing | crate::writer::Encoder |
§Per-Record Prefix Formats
For crate::writer::AvroStreamWriter and crate::writer::Encoder, each record is automatically prefixed
based on the fingerprint strategy:
| Strategy | Prefix | Use Case |
|---|---|---|
| FingerprintStrategy::Rabin (default) | 0xC3 0x01 + 8-byte LE Rabin fingerprint | Standard Avro SOE |
| FingerprintStrategy::Id(id) | 0x00 + 4-byte BE schema ID | Confluent Schema Registry |
| FingerprintStrategy::Id64(id) | 0x00 + 8-byte BE schema ID | Apicurio Registry |
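The byte layouts above can be sketched with plain byte manipulation. This is a hedged illustration of the framing only, independent of the arrow-avro API; the fingerprint and schema-ID values are arbitrary placeholders:

```rust
/// Avro Single Object Encoding: 0xC3 0x01 magic + 8-byte little-endian Rabin fingerprint.
fn rabin_prefix(fingerprint: u64) -> Vec<u8> {
    let mut p = vec![0xC3, 0x01];
    p.extend_from_slice(&fingerprint.to_le_bytes());
    p
}

/// Confluent wire format: 0x00 magic + 4-byte big-endian schema ID.
fn confluent_prefix(schema_id: u32) -> Vec<u8> {
    let mut p = vec![0x00];
    p.extend_from_slice(&schema_id.to_be_bytes());
    p
}

/// Apicurio wire format: 0x00 magic + 8-byte big-endian global ID.
fn apicurio_prefix(global_id: u64) -> Vec<u8> {
    let mut p = vec![0x00];
    p.extend_from_slice(&global_id.to_be_bytes());
    p
}

fn main() {
    // 2-byte magic + 8-byte fingerprint, least-significant byte first.
    assert_eq!(
        rabin_prefix(0x0102030405060708),
        [0xC3, 0x01, 0x08, 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01]
    );
    // 1-byte magic + 4-byte big-endian ID.
    assert_eq!(confluent_prefix(42), [0x00, 0x00, 0x00, 0x00, 0x2A]);
    // 1-byte magic + 8-byte big-endian ID.
    assert_eq!(apicurio_prefix(42).len(), 9);
    println!("prefix layouts ok");
}
```

The Avro body of the record follows immediately after the prefix in all three formats.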
§Choosing the Avro Schema
By default, the writer converts your Arrow schema to Avro (including a top‑level record
name). If you already have an Avro schema JSON that you want to use verbatim, put it into
the Arrow schema metadata under SCHEMA_METADATA_KEY before constructing the writer. The
builder will then use that schema instead of generating a new one.
§Compression
For OCF (crate::writer::AvroWriter), you may enable a compression codec via
crate::writer::WriterBuilder::with_compression. The chosen codec is written into the file header
and used for subsequent blocks. SOE stream writing (crate::writer::AvroStreamWriter, crate::writer::Encoder)
does not apply container‑level compression.
§Examples
§Writing an OCF File
```rust
use std::sync::Arc;
use arrow_array::{ArrayRef, Int64Array, StringArray, RecordBatch};
use arrow_schema::{DataType, Field, Schema};
use arrow_avro::writer::AvroWriter;

let schema = Schema::new(vec![
    Field::new("id", DataType::Int64, false),
    Field::new("name", DataType::Utf8, false),
]);
let batch = RecordBatch::try_new(
    Arc::new(schema.clone()),
    vec![
        Arc::new(Int64Array::from(vec![1, 2])) as ArrayRef,
        Arc::new(StringArray::from(vec!["alice", "bob"])) as ArrayRef,
    ],
)?;
let mut writer = AvroWriter::new(Vec::<u8>::new(), schema)?;
writer.write(&batch)?;
writer.finish()?;
let bytes = writer.into_inner();
assert!(!bytes.is_empty());
```
§Using the Row-by-Row Encoder for Message Queues
```rust
use std::sync::Arc;
use arrow_array::{ArrayRef, Int32Array, RecordBatch};
use arrow_schema::{DataType, Field, Schema};
use arrow_avro::writer::{WriterBuilder, format::AvroSoeFormat};
use arrow_avro::schema::FingerprintStrategy;

let schema = Schema::new(vec![Field::new("x", DataType::Int32, false)]);
let batch = RecordBatch::try_new(
    Arc::new(schema.clone()),
    vec![Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef],
)?;

// Build an Encoder with Confluent wire format (schema ID = 42)
let mut encoder = WriterBuilder::new(schema)
    .with_fingerprint_strategy(FingerprintStrategy::Id(42))
    .build_encoder::<AvroSoeFormat>()?;
encoder.encode(&batch)?;

// Get the buffered rows (zero-copy views into a single backing buffer)
let rows = encoder.flush();
assert_eq!(rows.len(), 3);

// Each row has Confluent wire format: magic byte + 4-byte schema ID + body
for row in rows.iter() {
    assert_eq!(row[0], 0x00); // Confluent magic byte
}
```
Modules§
- encoder 🔒 — Encodes RecordBatch into the Avro binary format. Avro Encoder for Arrow types.
- format — Logic for different Avro container file formats. Avro Writer Formats for Arrow.
Structs§
- EncodedRows — A contiguous set of Avro-encoded rows.
- Encoder — A row-by-row encoder for Avro stream/message formats (SOE / registry wire formats / raw binary).
- Writer — Generic Avro writer.
- WriterBuilder — Builder to configure and create a Writer.
Type Aliases§
- AvroStreamWriter — Alias for an Avro Single Object Encoding stream writer.
- AvroWriter — Alias for an Avro Object Container File writer.