Avro writer implementation for the arrow-avro crate: core functionality for writing Arrow arrays as Avro data, implementing the primary writer interface and record encoding logic.
§Overview
Use this module to serialize Arrow RecordBatch values into Avro. Two output
formats are supported:
- AvroWriter writes an Object Container File (OCF): a self‑describing file with a header (schema JSON + metadata), optional compression, data blocks, and sync markers. See Avro 1.11.1 “Object Container Files”: https://avro.apache.org/docs/1.11.1/specification/#object-container-files
- AvroStreamWriter writes a Single Object Encoding (SOE) stream (“datum” bytes) without any container framing. This is useful when the schema is known out of band (e.g., via a registry) and you want minimal overhead.
§Which format should you use?
- Use OCF when you need a portable, self‑contained file. The schema travels with the data, making it easy to read elsewhere.
- Use the SOE stream when your surrounding protocol supplies schema information (e.g., a schema registry). The writer automatically adds the per‑record prefix:
  - SOE: Each record is prefixed with the 2‑byte header 0xC3 0x01, followed by an 8‑byte little‑endian CRC‑64‑AVRO fingerprint, then the Avro body. See Avro 1.11.1 “Single object encoding”: https://avro.apache.org/docs/1.11.1/specification/#single-object-encoding
  - Confluent wire format: Each record is prefixed with magic byte 0x00, followed by a big‑endian 4‑byte schema ID, then the Avro body. Use FingerprintStrategy::Id(schema_id). https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format
  - Apicurio wire format: Each record is prefixed with magic byte 0x00, followed by a big‑endian 8‑byte schema ID, then the Avro body. Use FingerprintStrategy::Id64(schema_id). https://www.apicur.io/registry/docs/apicurio-registry/1.3.3.Final/getting-started/assembly-using-kafka-client-serdes.html#registry-serdes-types-avro-registry
§Choosing the Avro schema
By default, the writer converts your Arrow schema to Avro (including a top‑level record
name). If you already have an Avro schema JSON you want to use verbatim, put it into the
Arrow schema metadata under the avro.schema key before constructing the writer. The
builder will use that schema instead of generating a new one (unless strip_metadata is
set to true in the options).
§Compression
For OCF, you may enable a compression codec via WriterBuilder::with_compression. The
chosen codec is written into the file header and used for subsequent blocks. SOE stream
writing doesn’t apply container‑level compression.
Modules§
- encoder 🔒: Encodes RecordBatch into the Avro binary format; the Avro encoder for Arrow types.
- format: Logic for the different Avro container file formats; Avro writer formats for Arrow.
Structs§
- Writer: Generic Avro writer.
- WriterBuilder: Builder to configure and create a Writer.
Type Aliases§
- AvroStreamWriter: Alias for an Avro Single Object Encoding stream writer.
- AvroWriter: Alias for an Avro Object Container File writer.