Trait ExtensionType

pub trait ExtensionType: Sized {
    type Metadata;

    const NAME: &'static str;

    // Required methods
    fn metadata(&self) -> &Self::Metadata;
    fn serialize_metadata(&self) -> Option<String>;
    fn deserialize_metadata(
        metadata: Option<&str>,
    ) -> Result<Self::Metadata, ArrowError>;
    fn supports_data_type(&self, data_type: &DataType) -> Result<(), ArrowError>;
    fn try_new(
        data_type: &DataType,
        metadata: Self::Metadata,
    ) -> Result<Self, ArrowError>;
}

Extension types.

User-defined “extension” types can be defined by setting certain key-value pairs in the Field metadata structure. These extension keys are:

  • EXTENSION_TYPE_NAME_KEY
  • EXTENSION_TYPE_METADATA_KEY

Canonical extension type support in this crate requires the canonical_extension_types feature.

Extension types may or may not use the EXTENSION_TYPE_METADATA_KEY field.
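
As a rough illustration of how the two keys end up in a field's metadata, the sketch below models that metadata as a plain `HashMap<String, String>`. The constants repeat the key strings from the Arrow columnar format specification. The `set_extension` helper is hypothetical and only approximates what a Field is assumed to do when an extension type is attached; it is not part of this crate's API.

```rust
use std::collections::HashMap;

// Key strings as defined by the Arrow columnar format specification.
// Redeclared locally so the sketch needs only the standard library.
pub const EXTENSION_TYPE_NAME_KEY: &str = "ARROW:extension:name";
pub const EXTENSION_TYPE_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Hypothetical helper: records an extension type in a metadata map.
/// The name key is always set. The metadata key is set only when the
/// extension type has serialized metadata, and cleared otherwise.
pub fn set_extension(
    field_metadata: &mut HashMap<String, String>,
    name: &str,
    serialized_metadata: Option<String>,
) {
    field_metadata.insert(EXTENSION_TYPE_NAME_KEY.to_owned(), name.to_owned());
    match serialized_metadata {
        Some(value) => {
            field_metadata.insert(EXTENSION_TYPE_METADATA_KEY.to_owned(), value);
        }
        None => {
            field_metadata.remove(EXTENSION_TYPE_METADATA_KEY);
        }
    }
}
```

An extension type without metadata leaves only the name key behind, which is why readers should not assume the metadata key is present.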

§Example

The example below demonstrates how to implement this trait for a Uuid type. Note this is not the canonical extension type for Uuid, which does not include information about the Uuid version.

use arrow_schema::{extension::ExtensionType, ArrowError, DataType, Field};
use std::{fmt, str::FromStr};

/// The different Uuid versions.
#[derive(Clone, Copy, Debug, PartialEq)]
enum UuidVersion {
    V1,
    V2,
    V3,
    V4,
    V5,
    V6,
    V7,
    V8,
}

// We'll use `Display` to serialize.
impl fmt::Display for UuidVersion {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{}",
            match self {
                Self::V1 => "V1",
                Self::V2 => "V2",
                Self::V3 => "V3",
                Self::V4 => "V4",
                Self::V5 => "V5",
                Self::V6 => "V6",
                Self::V7 => "V7",
                Self::V8 => "V8",
            }
        )
    }
}

// And `FromStr` to deserialize.
impl FromStr for UuidVersion {
    type Err = ArrowError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "V1" => Ok(Self::V1),
            "V2" => Ok(Self::V2),
            "V3" => Ok(Self::V3),
            "V4" => Ok(Self::V4),
            "V5" => Ok(Self::V5),
            "V6" => Ok(Self::V6),
            "V7" => Ok(Self::V7),
            "V8" => Ok(Self::V8),
            _ => Err(ArrowError::ParseError("Invalid UuidVersion".to_owned())),
        }
    }
}

/// This is the extension type, not the container for Uuid values. It
/// stores the Uuid version (this is the metadata of this extension type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Uuid(UuidVersion);

impl ExtensionType for Uuid {
    // We use a namespace as suggested by the specification.
    const NAME: &'static str = "myorg.example.uuid";

    // The metadata type is the Uuid version.
    type Metadata = UuidVersion;

    // We just return a reference to the Uuid version.
    fn metadata(&self) -> &Self::Metadata {
        &self.0
    }

    // We use the `Display` implementation to serialize the Uuid
    // version.
    fn serialize_metadata(&self) -> Option<String> {
        Some(self.0.to_string())
    }

    // We use the `FromStr` implementation to deserialize the Uuid
    // version.
    fn deserialize_metadata(metadata: Option<&str>) -> Result<Self::Metadata, ArrowError> {
        metadata.map_or_else(
            || {
                Err(ArrowError::InvalidArgumentError(
                    "Uuid extension type metadata missing".to_owned(),
                ))
            },
            str::parse,
        )
    }

    // The only supported data type is `FixedSizeBinary(16)`.
    fn supports_data_type(&self, data_type: &DataType) -> Result<(), ArrowError> {
        match data_type {
            DataType::FixedSizeBinary(16) => Ok(()),
            data_type => Err(ArrowError::InvalidArgumentError(format!(
                "Uuid data type mismatch, expected FixedSizeBinary(16), found {data_type}"
            ))),
        }
    }

    // We should always check if the data type is supported before
    // constructing the extension type.
    fn try_new(data_type: &DataType, metadata: Self::Metadata) -> Result<Self, ArrowError> {
        let uuid = Self(metadata);
        uuid.supports_data_type(data_type)?;
        Ok(uuid)
    }
}

// The `?` operator below needs a function that returns a `Result`.
fn main() -> Result<(), ArrowError> {
    // We can now construct the extension type.
    let uuid_v1 = Uuid(UuidVersion::V1);

    // And add it to a field.
    let field =
        Field::new("", DataType::FixedSizeBinary(16), false).with_extension_type(uuid_v1);

    // And extract it from this field.
    assert_eq!(field.try_extension_type::<Uuid>()?, uuid_v1);

    // When we try to add this to a field with an unsupported data type we
    // get an error.
    let result = Field::new("", DataType::Null, false).try_with_extension_type(uuid_v1);
    assert!(result.is_err());

    Ok(())
}

https://arrow.apache.org/docs/format/Columnar.html#extension-types

Required Associated Constants§

const NAME: &'static str

The name identifying this extension type.

This is the string value that is used for the EXTENSION_TYPE_NAME_KEY in the Field::metadata of a Field to identify this extension type.

We recommend that you use a “namespace”-style prefix for extension type names to minimize the possibility of conflicts with multiple Arrow readers and writers in the same application. For example, use myorg.name_of_type instead of simply name_of_type.

Extension names beginning with arrow. are reserved for canonical extension types; they should not be used for third-party extension types.

Extension names are case-sensitive.
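
The naming rules above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named `is_reserved_extension_name` (not part of this crate), that tests whether a name falls in the reserved `arrow.` namespace.

```rust
/// Hypothetical helper: returns `true` when `name` starts with the
/// `arrow.` prefix reserved for canonical extension types. The check
/// is case-sensitive, like extension names themselves.
pub fn is_reserved_extension_name(name: &str) -> bool {
    name.starts_with("arrow.")
}
```

A third-party name such as `myorg.example.uuid` does not match the prefix, while `arrow.uuid` does.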

Required Associated Types§

type Metadata

The metadata type of this extension type.

Implementations can use strongly or loosely typed data structures here depending on the complexity of the metadata.

Implementations can also use Self here if the extension type can be constructed directly from its metadata.

If an extension type defines no metadata it should use () to indicate this.

Required Methods§

fn metadata(&self) -> &Self::Metadata

Returns a reference to the metadata of this extension type, or &() if this extension type defines no metadata (Self::Metadata=()).

fn serialize_metadata(&self) -> Option<String>

Returns the serialized representation of the metadata of this extension type, or None if this extension type defines no metadata (Self::Metadata=()).

This is the string value that is used for the EXTENSION_TYPE_METADATA_KEY in the Field::metadata of a Field.

fn deserialize_metadata(metadata: Option<&str>) -> Result<Self::Metadata, ArrowError>

Deserialize the metadata of this extension type from the serialized representation of the metadata. An extension type that defines no metadata should expect None for the serialized metadata and return Ok(()).

This function should return an error when

  • expected metadata is missing (for extension types with non-optional metadata)
  • unexpected metadata is set (for extension types without metadata)
  • deserialization of metadata fails
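
For an extension type without metadata, these rules reduce to a single match. The sketch below is a standalone approximation: the function name `deserialize_unit_metadata` is hypothetical, and a `String` stands in for `ArrowError` so that the example compiles with the standard library alone.

```rust
/// Sketch of `deserialize_metadata` for an extension type whose
/// `Metadata` is `()`. A `String` stands in for `ArrowError` here.
pub fn deserialize_unit_metadata(metadata: Option<&str>) -> Result<(), String> {
    match metadata {
        // No metadata was serialized, which is what this type expects.
        None => Ok(()),
        // Metadata is present although this type defines none.
        Some(unexpected) => Err(format!(
            "extension type expects no metadata, found {:?}",
            unexpected
        )),
    }
}
```

In a real implementation the error branch would presumably return an `ArrowError` variant such as `InvalidArgumentError`, as the Uuid example above does for its missing-metadata case.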

fn supports_data_type(&self, data_type: &DataType) -> Result<(), ArrowError>

Returns Ok(()) if and only if the given data type is supported by this extension type.

fn try_new(data_type: &DataType, metadata: Self::Metadata) -> Result<Self, ArrowError>

Construct this extension type for a field with the given data type and metadata.

This should return an error if the given data type is not supported by this extension type.

Dyn Compatibility§

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.
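
In practice this means extension types are selected through generics rather than trait objects. The sketch below uses a reduced stand-in trait, `NamedExtension`, which is hypothetical and keeps only two of the properties that rule out `dyn` usage: the `Sized` supertrait and an associated constant. The type `MyUuid` and the function `extension_name` are likewise illustrative names.

```rust
/// Reduced stand-in for `ExtensionType`, keeping only a `Sized`
/// supertrait and an associated constant. Either one is enough to
/// make a trait not dyn compatible.
pub trait NamedExtension: Sized {
    const NAME: &'static str;
}

/// Illustrative extension type without any state.
pub struct MyUuid;

impl NamedExtension for MyUuid {
    const NAME: &'static str = "myorg.example.uuid";
}

/// Generic code picks the extension type at compile time.
pub fn extension_name<E: NamedExtension>() -> &'static str {
    E::NAME
}

// By contrast, a type such as `Box<dyn NamedExtension>` is rejected
// by the compiler, because the trait is not dyn compatible.
```

This matches how the example above extracts the type with `field.try_extension_type::<Uuid>()`, where the caller names the concrete type.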

Implementors§

impl ExtensionType for Bool8

const NAME: &'static str = "arrow.bool8"

type Metadata = &'static str

impl ExtensionType for FixedShapeTensor

const NAME: &'static str = "arrow.fixed_shape_tensor"

type Metadata = FixedShapeTensorMetadata

impl ExtensionType for Json

const NAME: &'static str = "arrow.json"

type Metadata = JsonMetadata

impl ExtensionType for Opaque

const NAME: &'static str = "arrow.opaque"

type Metadata = OpaqueMetadata

impl ExtensionType for Uuid

const NAME: &'static str = "arrow.uuid"

type Metadata = ()

impl ExtensionType for VariableShapeTensor

const NAME: &'static str = "arrow.variable_shape_tensor"

type Metadata = VariableShapeTensorMetadata