pyarrow.ExtensionType

class pyarrow.ExtensionType(DataType storage_type, extension_name)

Bases: BaseExtensionType

Concrete base class for Python-defined extension types.

Parameters:
storage_type : DataType

The underlying storage type for the extension type.

extension_name : str

A unique name distinguishing this extension type. The name will be used when deserializing IPC data.

Examples

Define a RationalType extension type subclassing ExtensionType:

>>> import pyarrow as pa
>>> class RationalType(pa.ExtensionType):
...     def __init__(self, data_type: pa.DataType):
...         if not pa.types.is_integer(data_type):
...             raise TypeError(f"data_type must be an integer type not {data_type}")
...         super().__init__(
...             pa.struct(
...                 [
...                     ("numer", data_type),
...                     ("denom", data_type),
...                 ],
...             ),
...             # N.B. This name does _not_ reference `data_type` so deserialization
...             # will work for _any_ integer `data_type` after registration
...             "my_package.rational",
...         )
...     def __arrow_ext_serialize__(self) -> bytes:
...         # No parameters are necessary
...         return b""
...     @classmethod
...     def __arrow_ext_deserialize__(cls, storage_type, serialized):
...         # return an instance of this subclass
...         return RationalType(storage_type[0].type)

Register the extension type:

>>> pa.register_extension_type(RationalType(pa.int64()))

Create an instance of RationalType extension type:

>>> rational_type = RationalType(pa.int32())

Inspect the extension type:

>>> rational_type.extension_name
'my_package.rational'
>>> rational_type.storage_type
StructType(struct<numer: int32, denom: int32>)

Wrap an array as an extension array:

>>> storage_array = pa.array(
...     [
...         {"numer": 10, "denom": 17},
...         {"numer": 20, "denom": 13},
...     ],
...     type=rational_type.storage_type
... )
>>> rational_array = rational_type.wrap_array(storage_array)
>>> rational_array
<pyarrow.lib.ExtensionArray object at ...>
-- is_valid: all not null
-- child 0 type: int32
  [
    10,
    20
  ]
-- child 1 type: int32
  [
    17,
    13
  ]

Alternatively, create the ExtensionArray directly from the storage array:

>>> rational_array = pa.ExtensionArray.from_storage(rational_type, storage_array)
>>> rational_array
<pyarrow.lib.ExtensionArray object at ...>
-- is_valid: all not null
-- child 0 type: int32
  [
    10,
    20
  ]
-- child 1 type: int32
  [
    17,
    13
  ]

Unregister the extension type:

>>> pa.unregister_extension_type("my_package.rational")

Note that although only the concrete type RationalType(pa.int64()) was registered, PyArrow can deserialize RationalType(integer_type) for any integer_type while the registration is in place, because the deserializer looks up the name my_package.rational and calls the class method __arrow_ext_deserialize__.

__init__()

Initialize an extension type instance.

This should be called at the end of the subclass’ __init__ method.

Methods

__init__

Initialize an extension type instance.

equals(self, other, *[, check_metadata])

Return true if type is equivalent to passed value.

field(self, i)

Return the child field at index i.

to_pandas_dtype(self)

Return the equivalent NumPy / Pandas dtype.

wrap_array(self, storage)

Wrap the given storage array as an extension array.

Attributes

bit_width

The bit width of the extension type.

byte_width

The byte width of the extension type.

extension_name

The extension type name.

has_variadic_buffers

If True, the number of expected buffers is only lower-bounded by num_buffers.

id

num_buffers

Number of data buffers required to construct Array type excluding children.

num_fields

The number of child fields.

storage_type

The underlying storage type.

bit_width

The bit width of the extension type.

byte_width

The byte width of the extension type.

equals(self, other, *, check_metadata=False)

Return true if type is equivalent to passed value.

Parameters:
other : DataType or str convertible to DataType
check_metadata : bool

Whether nested Field metadata equality should be checked as well.

Returns:
is_equal : bool

Examples

>>> import pyarrow as pa
>>> pa.int64().equals(pa.string())
False
>>> pa.int64().equals(pa.int64())
True

extension_name

The extension type name.

field(self, i) → Field

Parameters:
i : int
Returns:
pyarrow.Field

has_variadic_buffers

If True, the number of expected buffers is only lower-bounded by num_buffers.

Examples

>>> import pyarrow as pa
>>> pa.int64().has_variadic_buffers
False
>>> pa.string_view().has_variadic_buffers
True

id

num_buffers

Number of data buffers required to construct Array type excluding children.

Examples

>>> import pyarrow as pa
>>> pa.int64().num_buffers
2
>>> pa.string().num_buffers
3

num_fields

The number of child fields.

Examples

>>> import pyarrow as pa
>>> pa.int64()
DataType(int64)
>>> pa.int64().num_fields
0
>>> pa.list_(pa.string())
ListType(list<item: string>)
>>> pa.list_(pa.string()).num_fields
1
>>> struct = pa.struct({'x': pa.int32(), 'y': pa.string()})
>>> struct.num_fields
2

storage_type

The underlying storage type.

to_pandas_dtype(self)

Return the equivalent NumPy / Pandas dtype.

Examples

>>> import pyarrow as pa
>>> pa.int64().to_pandas_dtype()
<class 'numpy.int64'>

wrap_array(self, storage)

Wrap the given storage array as an extension array.

Parameters:
storage : Array or ChunkedArray
Returns:
array : Array or ChunkedArray

Extension array wrapping the storage array