⚠️ Experimental support for reading and writing `Variant`s to / from Parquet files ⚠️
This is a 🚧 Work In Progress
Note: Requires the `variant_experimental` feature of the `parquet` crate to be enabled.
§Features
- Representation of `Variant` and `VariantArray` for working with Variant values (see `parquet_variant` for more details)
- Kernels for working with arrays of Variant values, such as conversion between `Variant` and JSON, and shredding/unshredding (see `parquet_variant_compute` for more details)
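As a rough illustration of what a "conversion between Variant and JSON" kernel does at the column level, the std-only sketch below renders each row of self-describing values as one JSON string and keeps null rows null. `ToyVariant`, `to_json`, and `column_to_json` are hypothetical names made up for this sketch. They are not part of `parquet_variant` or `parquet_variant_compute`, whose kernels operate on the binary Variant encoding inside Arrow arrays rather than on a Rust enum, and the string escaping here is deliberately incomplete.

```rust
use std::collections::BTreeMap;

/// A toy stand-in for a Variant value (hypothetical; not `parquet::variant::Variant`).
#[derive(Debug, Clone, PartialEq)]
enum ToyVariant {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    List(Vec<ToyVariant>),
    // BTreeMap keeps keys sorted, so the JSON output is deterministic.
    Object(BTreeMap<String, ToyVariant>),
}

/// Quotes a string for JSON, escaping only `\` and `"` (a simplification).
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Renders one value as compact JSON text.
fn to_json(v: &ToyVariant) -> String {
    match v {
        ToyVariant::Null => "null".to_string(),
        ToyVariant::Bool(b) => b.to_string(),
        ToyVariant::Int(i) => i.to_string(),
        ToyVariant::Str(s) => quote(s),
        ToyVariant::List(items) => {
            let parts: Vec<String> = items.iter().map(to_json).collect();
            format!("[{}]", parts.join(","))
        }
        ToyVariant::Object(fields) => {
            let parts: Vec<String> = fields
                .iter()
                .map(|(key, val)| format!("{}:{}", quote(key), to_json(val)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

/// Column-level kernel: a null row (`None`) stays null instead of becoming "null".
fn column_to_json(column: &[Option<ToyVariant>]) -> Vec<Option<String>> {
    column.iter().map(|row| row.as_ref().map(to_json)).collect()
}

fn main() {
    let column = vec![
        Some(ToyVariant::List(vec![ToyVariant::Bool(true), ToyVariant::Null])),
        None,
    ];
    for row in column_to_json(&column) {
        println!("{:?}", row); // Some("[true,null]"), then None
    }
}
```

Distinguishing a null row from a JSON `null` value is the point of the `Option` wrapper: the first is "no value in this row", the second is a value that happens to be null.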
§Example: Writing a Parquet file with Variant column
```rust
use std::sync::Arc;

use arrow_array::{ArrayRef, RecordBatch};
use arrow_schema::Schema;
use parquet::arrow::ArrowWriter;
use parquet::variant::{VariantArrayBuilder, VariantBuilderExt};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Use the VariantArrayBuilder to build a VariantArray
    // (3 is the initial row capacity; only 2 rows are appended here)
    let mut builder = VariantArrayBuilder::new(3);
    builder.new_object().with_field("name", "Alice").finish(); // row 1: {"name": "Alice"}
    builder.append_value("such wow"); // row 2: "such wow" (a string)
    let array = builder.build();

    // Since VariantArray represents an Arrow extension type (VariantType), it
    // must be converted to an ArrayRef and a Field with the appropriate
    // metadata before it can be written to a Parquet file
    let field = array.field("data");
    let array = ArrayRef::from(array);

    // Create a RecordBatch with the VariantArray
    let schema = Schema::new(vec![field]);
    let batch = RecordBatch::try_new(Arc::new(schema), vec![array])?;

    // Now you can write the RecordBatch to the Parquet file, as normal
    let file = std::fs::File::create("variant.parquet")?;
    let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```
§Example: Writing JSON into a Parquet file with Variant column
```rust
use std::sync::Arc;

use arrow_array::{ArrayRef, RecordBatch, StringArray};
use arrow_schema::Schema;
use parquet::arrow::ArrowWriter;
use parquet::variant::{json_to_variant, VariantArray};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an array of JSON strings, simulating a column of JSON data
    let input_array: ArrayRef = Arc::new(StringArray::from(vec![
        Some(r#"{"name": "Alice", "age": 30}"#),
        Some(r#"{"name": "Bob", "age": 25, "address": {"city": "New York"}}"#),
        None,
        Some("{}"),
    ]));

    // Convert the JSON strings to a VariantArray (the null row stays null)
    let array: VariantArray = json_to_variant(&input_array)?;

    // Create a RecordBatch with the VariantArray
    let schema = Schema::new(vec![array.field("data")]);
    let batch = RecordBatch::try_new(Arc::new(schema), vec![ArrayRef::from(array)])?;

    // Write the RecordBatch to a Parquet file as normal
    let file = std::fs::File::create("variant-json.parquet")?;
    let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```
§Example: Reading a Parquet file with Variant column
Use the `VariantType` extension type to find the Variant column:
```rust
use arrow_array::{Array, RecordBatchReader};
use parquet::arrow::arrow_reader::ArrowReaderBuilder;
use parquet::variant::{Variant, VariantArray, VariantType};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read the Parquet file using the standard Arrow Parquet reader.
    // This file has 2 columns, "id" and "var"; the "var" column holds Variant values.
    // `file_path()` is a helper (not shown) returning the path to case-075.parquet.
    let file = std::fs::File::open(file_path())?;
    let mut reader = ArrowReaderBuilder::try_new(file)?.build()?;

    // You can check whether a column contains Variant values using
    // the VariantType extension type
    let schema = reader.schema();
    let field = schema.field_with_name("var")?;
    assert!(field.try_extension_type::<VariantType>().is_ok());

    // The reader yields RecordBatches in which the Variant column is a StructArray;
    // to convert it to a VariantArray, use VariantArray::try_new
    let batch = reader.next().expect("file has at least one batch")?;
    let col = batch.column_by_name("var").unwrap();
    let var_array = VariantArray::try_new(col)?;
    assert_eq!(var_array.len(), 1);
    let var_value: Variant = var_array.value(0);
    assert_eq!(var_value, Variant::from("iceberg")); // the value in case-075.parquet
    Ok(())
}
```
Structs§
- `BorrowedShreddingState` - Similar to `ShreddingState` except it holds borrowed references of the target arrays. Useful for avoiding clone operations when the caller does not need a self-standing shredding state.
- `CastOptions` - Options for controlling the behavior of `cast_to_variant_with_options`.
- `GetOptions` - Controls the action of the `variant_get` kernel.
- `ListBuilder` - A builder for creating `Variant::List` values.
- `ListState` - Internal state for list building.
- `ObjectBuilder` - A builder for creating `Variant::Object` values.
- `ObjectFieldBuilder` - A `VariantBuilderExt` that inserts a new field into a variant object.
- `ObjectState` - Internal state for object building.
- `ParentState` - Tracks information needed to correctly finalize a nested builder.
- `ReadOnlyMetadataBuilder` - A metadata builder that cannot register new field names, and merely returns the field id associated with a known field name. This is useful for variant unshredding operations, where the metadata column is fixed and, per the variant shredding spec, already contains all field names from the `typed_value` column. It is also useful when projecting a subset of fields from a variant object value, since the bytes can be copied across directly without re-encoding their field ids.
- `ShortString` - A Variant `ShortString`.
- `ShreddingState` - Represents the shredding state of a `VariantArray`.
- `Uuid` - A Universally Unique Identifier (UUID).
- `ValueBuilder` - Wrapper around a `Vec<u8>` that provides methods for appending primitive values, variant types, and metadata.
- `VariantArray` - An array of Parquet `Variant` values.
- `VariantArrayBuilder` - A builder for `VariantArray`.
- `VariantBuilder` - Top-level builder for `Variant` values.
- `VariantDecimal4` - Represents a 4-byte decimal value in the Variant format.
- `VariantDecimal8` - Represents an 8-byte decimal value in the Variant format.
- `VariantDecimal16` - Represents a 16-byte decimal value in the Variant format.
- `VariantList` - A `Variant` Array.
- `VariantMetadata` - `Variant` Metadata.
- `VariantObject` - A `Variant` Object (struct with named fields).
- `VariantPath` - Represents a qualified path to a potential subfield or index of a variant value.
- `VariantType` - Arrow Variant `ExtensionType`.
- `VariantValueArrayBuilder` - A builder for creating only the value column of a `VariantArray`.
- `WritableMetadataBuilder` - Builder for constructing metadata for `Variant` values.
- `f16` - A 16-bit floating point type implementing the IEEE 754-2008 standard `binary16`, a.k.a. “half” format.
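The `ShortString` entry refers to the compact string form in the Parquet Variant value encoding. As we read the spec, every value starts with one header byte whose low 2 bits hold the basic type (1 means short string) and whose upper 6 bits, for a short string, hold its length in bytes, so a short string is at most 63 bytes long; longer strings use the primitive string type instead. The std-only sketch below encodes and decodes that layout. `encode_short_string` and `decode_short_string` are hypothetical helpers, not crate APIs, and they do not replace the crate's own `ShortString` validation.

```rust
/// Basic type code for a short string (low 2 bits of the header byte).
const BASIC_TYPE_SHORT_STRING: u8 = 1;
/// The length lives in the upper 6 bits of the header, so it is at most 63.
const MAX_SHORT_STRING_LEN: usize = 63;

/// Encodes `s` as a Variant short string: one header byte, then the UTF-8 bytes.
/// Returns `None` if `s` is too long for this encoding.
fn encode_short_string(s: &str) -> Option<Vec<u8>> {
    let bytes = s.as_bytes();
    if bytes.len() > MAX_SHORT_STRING_LEN {
        return None;
    }
    let header = ((bytes.len() as u8) << 2) | BASIC_TYPE_SHORT_STRING;
    let mut out = Vec::with_capacity(1 + bytes.len());
    out.push(header);
    out.extend_from_slice(bytes);
    Some(out)
}

/// Decodes a buffer laid out as produced by `encode_short_string`.
fn decode_short_string(buf: &[u8]) -> Option<&str> {
    let (&header, rest) = buf.split_first()?;
    if header & 0b11 != BASIC_TYPE_SHORT_STRING {
        return None; // some other basic type (primitive, object, or array)
    }
    let len = usize::from(header >> 2);
    std::str::from_utf8(rest.get(..len)?).ok()
}

fn main() {
    let encoded = encode_short_string("Alice").expect("5 bytes fit in a short string");
    println!("{:02x?}", encoded);
    println!("{:?}", decode_short_string(&encoded));
}
```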
Enums§
- `Variant` - Represents a Parquet Variant.
- `VariantPathElement` - Element of a `VariantPath` that can be a field name or an index.
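`VariantPath` and `VariantPathElement` describe a route into a nested value, one field name or list index per step. The std-only sketch below shows that lookup idea on a toy tree. `PathElement`, `Value`, `get_path`, and `field` are hypothetical stand-ins rather than the crate's types, and the real `variant_get` kernel applies a path across a whole array of binary variants instead of a single in-memory value.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of a path element: a field name or a list index.
#[derive(Debug, Clone, PartialEq)]
enum PathElement {
    Field(String),
    Index(usize),
}

/// A toy nested value (hypothetical; not the crate's `Variant`).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    List(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Follows `path` from `root`, returning `None` if any step is missing
/// or does not match the shape of the value at that point.
fn get_path<'a>(root: &'a Value, path: &[PathElement]) -> Option<&'a Value> {
    let mut current = root;
    for element in path {
        current = match (element, current) {
            (PathElement::Field(name), Value::Object(fields)) => fields.get(name)?,
            (PathElement::Index(i), Value::List(items)) => items.get(*i)?,
            _ => return None, // e.g. an index applied to an object
        };
    }
    Some(current)
}

/// Convenience constructor for a field-name step.
fn field(name: &str) -> PathElement {
    PathElement::Field(name.to_string())
}

fn main() {
    // {"name": "Bob", "address": {"city": "New York"}}
    let address = BTreeMap::from([("city".to_string(), Value::Str("New York".to_string()))]);
    let bob = Value::Object(BTreeMap::from([
        ("name".to_string(), Value::Str("Bob".to_string())),
        ("address".to_string(), Value::Object(address)),
    ]));
    let path = [field("address"), field("city")];
    println!("{:?}", get_path(&bob, &path));
}
```

Returning `None` for a shape mismatch treats the requested subfield as merely "potential", in the spirit of the `VariantPath` description; how the crate itself reports such mismatches is controlled by its own options and is not modeled here.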
Constants§
- `EMPTY_VARIANT_METADATA` - The empty metadata dictionary.
- `EMPTY_VARIANT_METADATA_BYTES` - The canonical byte slice corresponding to an empty metadata dictionary.
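To see why an empty metadata dictionary has a canonical byte form, here is a std-only sketch of the metadata layout as we understand it from the Parquet Variant encoding spec: a header byte (version 1 in bits 0-3, a sorted-strings flag in bit 4, offset size minus one in bits 6-7), then `dictionary_size`, then `dictionary_size + 1` offsets, then the concatenated field-name bytes. Under that reading, an empty dictionary with 1-byte offsets comes out as the three bytes `01 00 00`. `encode_metadata` is a hypothetical helper: it only supports 1-byte offsets and never sets the sorted flag.

```rust
/// Encodes a Variant metadata dictionary using 1-byte offsets (a simplification:
/// larger dictionaries need wider offsets, which this sketch does not support).
fn encode_metadata(names: &[&str]) -> Option<Vec<u8>> {
    let total_len: usize = names.iter().map(|n| n.len()).sum();
    if names.len() > 255 || total_len > 255 {
        return None;
    }
    // Header byte: version = 1 (bits 0-3), sorted_strings = 0 (bit 4),
    // offset_size_minus_one = 0 (bits 6-7), i.e. 1-byte offsets.
    let mut out = vec![0x01];
    // dictionary_size, stored in offset_size (= 1) bytes
    out.push(names.len() as u8);
    // dictionary_size + 1 offsets into the name bytes, starting at 0
    let mut offset = 0u8;
    out.push(offset);
    for name in names {
        offset += name.len() as u8;
        out.push(offset);
    }
    // The concatenated UTF-8 bytes of the field names
    for name in names {
        out.extend_from_slice(name.as_bytes());
    }
    Some(out)
}

fn main() {
    println!("{:?}", encode_metadata(&[]));
    println!("{:?}", encode_metadata(&["name", "age"]));
}
```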
Traits§
- `BuilderSpecificState` - A trait for managing state specific to different builder types.
- `MetadataBuilder` - A trait for building variant metadata dictionaries, to be used in conjunction with a `ValueBuilder`. The trait provides methods for managing field names and their IDs, as well as for rolling back a failed builder operation that might have created new field IDs.
- `VariantBuilderExt` - Extends `VariantBuilder` to help build nested `Variant`s.
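The `MetadataBuilder` trait, together with the `WritableMetadataBuilder` and `ReadOnlyMetadataBuilder` structs listed above, separates "resolve a field name to an id" from how the dictionary is stored. The std-only sketch below is a simplified, hypothetical model of that split: `FieldIds`, `Writable`, `ReadOnly`, `checkpoint`, and `rollback_to` are invented names, not the crate's trait or methods. The writable side assigns ids in insertion order and can forget names added by a failed operation; the read-only side only resolves names already present in a fixed dictionary.

```rust
use std::collections::HashMap;

/// Hypothetical, pared-down stand-in for a metadata builder trait.
trait FieldIds {
    /// Returns the field id for `name`, or `None` if it cannot be provided.
    fn field_id(&mut self, name: &str) -> Option<u32>;
}

/// Assigns a new id to every unseen name, in insertion order.
#[derive(Default)]
struct Writable {
    names: Vec<String>,
    ids: HashMap<String, u32>,
}

impl Writable {
    /// Number of names registered so far; usable as a rollback point.
    fn checkpoint(&self) -> usize {
        self.names.len()
    }

    /// Forgets every name registered after `checkpoint`, e.g. after a failed object build.
    fn rollback_to(&mut self, checkpoint: usize) {
        for name in self.names.drain(checkpoint..) {
            self.ids.remove(&name);
        }
    }
}

impl FieldIds for Writable {
    fn field_id(&mut self, name: &str) -> Option<u32> {
        if let Some(&id) = self.ids.get(name) {
            return Some(id);
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        Some(id)
    }
}

/// Only looks names up in a fixed dictionary; never registers new ones.
struct ReadOnly<'a> {
    names: &'a [&'a str],
}

impl FieldIds for ReadOnly<'_> {
    fn field_id(&mut self, name: &str) -> Option<u32> {
        self.names.iter().position(|n| *n == name).map(|i| i as u32)
    }
}

fn main() {
    let mut writable = Writable::default();
    let ids: Vec<_> = ["name", "age", "name"].iter().map(|n| writable.field_id(n)).collect();
    println!("{:?}", ids);
    let mut fixed = ReadOnly { names: &["name", "age"] };
    println!("{:?}", fixed.field_id("city"));
}
```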
Functions§
- `cast_to_variant` - Converts an array to a `VariantArray` with strict mode enabled (returns errors on conversion failures).
- `cast_to_variant_with_options` - Casts a typed Arrow `Array` to a `VariantArray`. This is useful when you need to convert a specific data type
- `json_to_variant` - Parses a batch of JSON strings into a batch of Variants represented as `STRUCT<metadata: BINARY, value: BINARY>`, where nulls are preserved. The JSON strings in the input must be valid.
- `shred_variant` - Shreds the input binary variant using a target shredding schema derived from the requested data type.
- `unshred_variant` - Removes all (nested) `typed_value` columns from a `VariantArray` by converting them back to binary variant and merging the resulting values back into the `value` column.
- `variant_get` - Returns an array with the specified path extracted from the variant values.
- `variant_to_json` - Transforms a batch of Variants represented as `STRUCT<metadata: BINARY, value: BINARY>` into a batch of JSON strings, where nulls are preserved.