Core functionality for writing Arrow arrays as Avro data.
Implements the primary writer interface and record encoding logic.
Avro writer implementation for the arrow-avro crate.
§Overview
Use this module to serialize Arrow RecordBatch values into Avro. Two output formats are supported:
- AvroWriter: writes an Object Container File (OCF), a self‑describing file with a header (schema JSON + metadata), optional compression, data blocks, and sync markers. See Avro 1.11.1 “Object Container Files.” https://avro.apache.org/docs/1.11.1/specification/#object-container-files
- AvroStreamWriter: writes a raw Avro binary stream (“datum” bytes) without any container framing. This is useful when the schema is known out‑of‑band (e.g., via a registry) and you want minimal overhead.
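To make the difference between the two outputs concrete, the sketch below checks whether a buffer starts with the four-byte OCF magic, which the Avro specification defines as ASCII "Obj" followed by the byte 1. The helper name looks_like_ocf is hypothetical and is not part of arrow-avro. A raw stream carries no marker at all, so this check can confirm OCF output but cannot positively identify a raw stream.

```rust
/// Magic bytes that open every Avro Object Container File: "Obj" followed by 1.
const OCF_MAGIC: [u8; 4] = [b'O', b'b', b'j', 1];

/// Hypothetical helper (not an arrow-avro API): returns true when `bytes`
/// begins with the OCF magic. Raw Avro streams have no such prefix.
fn looks_like_ocf(bytes: &[u8]) -> bool {
    bytes.starts_with(&OCF_MAGIC)
}
```

For example, looks_like_ocf(b"Obj\x01...") is true, while an empty buffer or arbitrary datum bytes yield false.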
§Which format should I use?
- Use OCF when you need a portable, self‑contained file. The schema travels with the data, making it easy to read elsewhere.
- Use the raw stream when your surrounding protocol supplies schema information (e.g., a schema registry). If you need single‑object encoding (SOE) or Confluent Schema Registry framing, you must add the appropriate prefix outside this writer:
  - SOE: 0xC3 0x01 + 8‑byte little‑endian CRC‑64‑AVRO fingerprint + Avro body (see Avro 1.11.1 “Single object encoding”). https://avro.apache.org/docs/1.11.1/specification/#single-object-encoding
  - Confluent wire format: magic 0x00 + big‑endian 4‑byte schema ID + Avro body. https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format
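The two prefixes described above can be sketched with the standard library alone. The function names frame_single_object and frame_confluent are hypothetical and are not provided by arrow-avro. The sketch assumes the schema fingerprint or registry ID is obtained elsewhere, and that body holds the Avro binary encoding of a single record.

```rust
/// Two-byte marker that opens an Avro single-object encoding (SOE) message.
const SOE_MAGIC: [u8; 2] = [0xC3, 0x01];
/// Magic byte that opens a Confluent wire-format message.
const CONFLUENT_MAGIC: u8 = 0x00;

/// Hypothetical helper: SOE marker + 8-byte little-endian CRC-64-AVRO
/// fingerprint + Avro body. The fingerprint is assumed to be computed
/// elsewhere from the writer schema.
fn frame_single_object(fingerprint: u64, body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(SOE_MAGIC.len() + 8 + body.len());
    out.extend_from_slice(&SOE_MAGIC);
    out.extend_from_slice(&fingerprint.to_le_bytes());
    out.extend_from_slice(body);
    out
}

/// Hypothetical helper: magic 0x00 + 4-byte big-endian schema ID + Avro body.
/// The schema ID is assumed to come from the schema registry.
fn frame_confluent(schema_id: u32, body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + 4 + body.len());
    out.push(CONFLUENT_MAGIC);
    out.extend_from_slice(&schema_id.to_be_bytes());
    out.extend_from_slice(body);
    out
}
```

Note the differing byte orders: the SOE fingerprint is little‑endian, while the Confluent schema ID is big‑endian.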
§Choosing the Avro schema
By default, the writer converts your Arrow schema to Avro (including a top‑level record name) and stores the resulting JSON under the avro::schema metadata key. If you already have an Avro schema JSON that you want to use verbatim, put it into the Arrow schema metadata under the same key before constructing the writer. The builder will pick it up.
§Compression
For OCF, you may enable a compression codec via WriterBuilder::with_compression. The chosen codec is written into the file header and used for subsequent blocks. Raw stream writing doesn’t apply container‑level compression.
Modules§
- encoder: Avro encoder for Arrow types. Encodes RecordBatch into the Avro binary format.
- format: Logic for different Avro container file formats.
Structs§
- Writer: Generic Avro writer.
- WriterBuilder: Builder to configure and create a Writer.
Type Aliases§
- AvroStreamWriter: Alias for a raw Avro binary stream writer.
- AvroWriter: Alias for an Avro Object Container File writer.