pyarrow.ExtensionType
- class pyarrow.ExtensionType(DataType storage_type, extension_name)
Bases: BaseExtensionType
Concrete base class for Python-defined extension types.
Examples
Define a UuidType extension type subclassing ExtensionType:
>>> import pyarrow as pa
>>> class UuidType(pa.ExtensionType):
...     def __init__(self):
...         pa.ExtensionType.__init__(self, pa.binary(16), "my_package.uuid")
...     def __arrow_ext_serialize__(self):
...         # since we don't have a parameterized type, we don't need extra
...         # metadata to be deserialized
...         return b''
...     @classmethod
...     def __arrow_ext_deserialize__(cls, storage_type, serialized):
...         # return an instance of this subclass given the serialized
...         # metadata.
...         return UuidType()
...
Register the extension type:
>>> pa.register_extension_type(UuidType())
Create an instance of UuidType extension type:
>>> uuid_type = UuidType()
Inspect the extension type:
>>> uuid_type.extension_name
'my_package.uuid'
>>> uuid_type.storage_type
FixedSizeBinaryType(fixed_size_binary[16])
Wrap an array as an extension array:
>>> import uuid
>>> storage_array = pa.array([uuid.uuid4().bytes for _ in range(4)], pa.binary(16))
>>> uuid_type.wrap_array(storage_array)
<pyarrow.lib.ExtensionArray object at ...>
[
  ...
]
Or do the same by creating an ExtensionArray directly from the storage array:
>>> pa.ExtensionArray.from_storage(uuid_type, storage_array)
<pyarrow.lib.ExtensionArray object at ...>
[
  ...
]
Unregister the extension type:
>>> pa.unregister_extension_type("my_package.uuid")
- __init__()
Initialize an extension type instance.
This should be called at the end of the subclass’ __init__ method.
Methods
__init__(): Initialize an extension type instance.
equals(self, other, *[, check_metadata]): Return true if type is equivalent to passed value.
field(self, i)
to_pandas_dtype(self): Return the equivalent NumPy / Pandas dtype.
wrap_array(self, storage): Wrap the given storage array as an extension array.
Attributes
bit_width: Bit width for fixed width type.
extension_name: The extension type name.
num_buffers: Number of data buffers required to construct Array type excluding children.
num_fields: The number of child fields.
storage_type: The underlying storage type.
- bit_width
Bit width for fixed width type.
Examples
>>> import pyarrow as pa
>>> pa.int64()
DataType(int64)
>>> pa.int64().bit_width
64
- equals(self, other, *, check_metadata=False)
Return true if type is equivalent to passed value.
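- Parameters:
- other : DataType or str convertible to DataType
- check_metadata : bool, default False
Whether nested Field metadata equality should be checked as well.
- Returns:
- is_equal : bool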
Examples
>>> import pyarrow as pa
>>> pa.int64().equals(pa.string())
False
>>> pa.int64().equals(pa.int64())
True
- extension_name
The extension type name.
- id
- num_buffers
Number of data buffers required to construct Array type excluding children.
Examples
>>> import pyarrow as pa
>>> pa.int64().num_buffers
2
>>> pa.string().num_buffers
3
- num_fields
The number of child fields.
Examples
>>> import pyarrow as pa
>>> pa.int64()
DataType(int64)
>>> pa.int64().num_fields
0
>>> pa.list_(pa.string())
ListType(list<item: string>)
>>> pa.list_(pa.string()).num_fields
1
>>> struct = pa.struct({'x': pa.int32(), 'y': pa.string()})
>>> struct.num_fields
2
- storage_type
The underlying storage type.
- to_pandas_dtype(self)
Return the equivalent NumPy / Pandas dtype.
Examples
>>> import pyarrow as pa
>>> pa.int64().to_pandas_dtype()
<class 'numpy.int64'>
- wrap_array(self, storage)
Wrap the given storage array as an extension array.
- Parameters:
- storage : Array or ChunkedArray
- Returns:
- array : Array or ChunkedArray
Extension array wrapping the storage array