pyarrow.ExtensionType
- class pyarrow.ExtensionType(DataType storage_type, extension_name)
Bases: BaseExtensionType
Concrete base class for Python-defined extension types.
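- Parameters:
  - storage_type : DataType
    The underlying storage type for the extension type.
  - extension_name : str
    A unique name distinguishing this extension type. The name will be used when deserializing IPC data.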
Examples
Define a RationalType extension type subclassing ExtensionType:
>>> import pyarrow as pa
>>> class RationalType(pa.ExtensionType):
...     def __init__(self, data_type: pa.DataType):
...         if not pa.types.is_integer(data_type):
...             raise TypeError(f"data_type must be an integer type not {data_type}")
...         super().__init__(
...             pa.struct(
...                 [
...                     ("numer", data_type),
...                     ("denom", data_type),
...                 ],
...             ),
...             # N.B. This name does _not_ reference `data_type` so deserialization
...             # will work for _any_ integer `data_type` after registration
...             "my_package.rational",
...         )
...     def __arrow_ext_serialize__(self) -> bytes:
...         # No parameters are necessary
...         return b""
...     @classmethod
...     def __arrow_ext_deserialize__(cls, storage_type, serialized):
...         # return an instance of this subclass
...         return RationalType(storage_type[0].type)
Register the extension type:
>>> pa.register_extension_type(RationalType(pa.int64()))
Create an instance of RationalType extension type:
>>> rational_type = RationalType(pa.int32())
Inspect the extension type:
>>> rational_type.extension_name
'my_package.rational'
>>> rational_type.storage_type
StructType(struct<numer: int32, denom: int32>)
Wrap an array as an extension array:
>>> storage_array = pa.array(
...     [
...         {"numer": 10, "denom": 17},
...         {"numer": 20, "denom": 13},
...     ],
...     type=rational_type.storage_type
... )
>>> rational_array = rational_type.wrap_array(storage_array)
>>> rational_array
<pyarrow.lib.ExtensionArray object at ...>
-- is_valid: all not null
-- child 0 type: int32
  [
    10,
    20
  ]
-- child 1 type: int32
  [
    17,
    13
  ]
Or do the same by creating an ExtensionArray directly from the storage array:
>>> rational_array = pa.ExtensionArray.from_storage(rational_type, storage_array)
>>> rational_array
<pyarrow.lib.ExtensionArray object at ...>
-- is_valid: all not null
-- child 0 type: int32
  [
    10,
    20
  ]
-- child 1 type: int32
  [
    17,
    13
  ]
Unregister the extension type:
>>> pa.unregister_extension_type("my_package.rational")
Note that even though we registered the concrete type RationalType(pa.int64()), PyArrow will be able to deserialize RationalType(integer_type) for any integer_type, as the deserializer will reference the name my_package.rational and the @classmethod __arrow_ext_deserialize__.
- __init__()
Initialize an extension type instance.
This should be called at the end of the subclass' __init__ method.
Methods
__init__(): Initialize an extension type instance.
equals(self, other, *[, check_metadata]): Return true if type is equivalent to passed value.
field(self, i)
to_pandas_dtype(self): Return the equivalent NumPy / Pandas dtype.
wrap_array(self, storage): Wrap the given storage array as an extension array.
Attributes
bit_width: The bit width of the extension type.
byte_width: The byte width of the extension type.
extension_name: The extension type name.
has_variadic_buffers: If True, the number of expected buffers is only lower-bounded by num_buffers.
id
num_buffers: Number of data buffers required to construct Array type excluding children.
num_fields: The number of child fields.
storage_type: The underlying storage type.
- bit_width
The bit width of the extension type.
- byte_width
The byte width of the extension type.
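- equals(self, other, *, check_metadata=False)
Return true if type is equivalent to passed value.
- Parameters:
  - other : DataType or string convertible to DataType
  - check_metadata : bool, default False
    Whether nested Field metadata equality should be checked as well.
- Returns:
  - is_equal : bool
Examples
>>> import pyarrow as pa
>>> pa.int64().equals(pa.string())
False
>>> pa.int64().equals(pa.int64())
True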
- extension_name
The extension type name.
- has_variadic_buffers
If True, the number of expected buffers is only lower-bounded by num_buffers.
Examples
>>> import pyarrow as pa
>>> pa.int64().has_variadic_buffers
False
>>> pa.string_view().has_variadic_buffers
True
- id
- num_buffers
Number of data buffers required to construct Array type excluding children.
Examples
>>> import pyarrow as pa
>>> pa.int64().num_buffers
2
>>> pa.string().num_buffers
3
- num_fields
The number of child fields.
Examples
>>> import pyarrow as pa
>>> pa.int64()
DataType(int64)
>>> pa.int64().num_fields
0
>>> pa.list_(pa.string())
ListType(list<item: string>)
>>> pa.list_(pa.string()).num_fields
1
>>> struct = pa.struct({'x': pa.int32(), 'y': pa.string()})
>>> struct.num_fields
2
- storage_type
The underlying storage type.
- to_pandas_dtype(self)
Return the equivalent NumPy / Pandas dtype.
Examples
>>> import pyarrow as pa
>>> pa.int64().to_pandas_dtype()
<class 'numpy.int64'>
- wrap_array(self, storage)
Wrap the given storage array as an extension array.
- Parameters:
  - storage : Array or ChunkedArray
- Returns:
  - array : Array or ChunkedArray
    Extension array wrapping the storage array