pyarrow.PyExtensionType

class pyarrow.PyExtensionType(DataType storage_type)

Bases: ExtensionType

Concrete base class for Python-defined extension types based on pickle for (de)serialization.

Parameters:
storage_type : DataType

The storage type for which the extension is built.

Examples

Define a UuidType extension type subclassing PyExtensionType:

>>> import pyarrow as pa
>>> class UuidType(pa.PyExtensionType):
...     def __init__(self):
...         pa.PyExtensionType.__init__(self, pa.binary(16))
...     def __reduce__(self):
...         return UuidType, ()
...
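
The `__reduce__` method above is what ties the type to pickle: it returns a callable plus the arguments needed to rebuild the object. The following is a minimal pure-Python sketch of that protocol, not pyarrow's own implementation. The `Token` class and its `kind` attribute are hypothetical stand-ins for a parameterless extension type.

```python
class Token:
    """Hypothetical stand-in for an extension type with no parameters."""

    def __init__(self):
        self.kind = "uuid"

    def __reduce__(self):
        # Same shape as UuidType.__reduce__: (callable, args).
        # pickle stores a reference to the callable together with the args
        # and calls callable(*args) when the object is loaded again.
        return Token, ()


original = Token()

# Reproduce by hand what unpickling does with the reduce value.
factory, args = original.__reduce__()
rebuilt = factory(*args)
```

A type that takes constructor parameters would return them in the second tuple element so that the rebuilt object matches the original.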

Create an instance of UuidType extension type:

>>> uuid_type = UuidType() 
>>> uuid_type 
UuidType(FixedSizeBinaryType(fixed_size_binary[16]))

Inspect the extension type:

>>> uuid_type.extension_name 
'arrow.py_extension_type'
>>> uuid_type.storage_type 
FixedSizeBinaryType(fixed_size_binary[16])

Wrap an array as an extension array:

>>> import uuid
>>> storage_array = pa.array([uuid.uuid4().bytes for _ in range(4)],
...                          pa.binary(16)) 
>>> uuid_type.wrap_array(storage_array) 
<pyarrow.lib.ExtensionArray object at ...>
[
  ...
]

Or do the same by creating an ExtensionArray directly from the storage array:

>>> pa.ExtensionArray.from_storage(uuid_type,
...                                storage_array) 
<pyarrow.lib.ExtensionArray object at ...>
[
  ...
]

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

equals(self, other, *[, check_metadata])

Return True if the type is equivalent to the passed value.

field(self, i)

to_pandas_dtype(self)

Return the equivalent NumPy / Pandas dtype.

wrap_array(self, storage)

Wrap the given storage array as an extension array.

Attributes

bit_width

Bit width for fixed width type.

extension_name

The extension type name.

id

num_buffers

Number of data buffers required to construct Array type excluding children.

num_fields

The number of child fields.

storage_type

The underlying storage type.

bit_width

Bit width for fixed width type.

Examples

>>> import pyarrow as pa
>>> pa.int64()
DataType(int64)
>>> pa.int64().bit_width
64

equals(self, other, *, check_metadata=False)

Return True if the type is equivalent to the passed value.

Parameters:
other : DataType or str convertible to DataType
check_metadata : bool, default False

Whether nested Field metadata equality should be checked as well.

Returns:
is_equal : bool

Examples

>>> import pyarrow as pa
>>> pa.int64().equals(pa.string())
False
>>> pa.int64().equals(pa.int64())
True

extension_name

The extension type name.

field(self, i) → Field
id

num_buffers

Number of data buffers required to construct Array type excluding children.

Examples

>>> import pyarrow as pa
>>> pa.int64().num_buffers
2
>>> pa.string().num_buffers
3

num_fields

The number of child fields.

Examples

>>> import pyarrow as pa
>>> pa.int64()
DataType(int64)
>>> pa.int64().num_fields
0
>>> pa.list_(pa.string())
ListType(list<item: string>)
>>> pa.list_(pa.string()).num_fields
1
>>> struct = pa.struct({'x': pa.int32(), 'y': pa.string()})
>>> struct.num_fields
2

storage_type

The underlying storage type.

to_pandas_dtype(self)

Return the equivalent NumPy / Pandas dtype.

Examples

>>> import pyarrow as pa
>>> pa.int64().to_pandas_dtype()
<class 'numpy.int64'>

wrap_array(self, storage)

Wrap the given storage array as an extension array.

Parameters:
storage : Array or ChunkedArray
Returns:
array : Array or ChunkedArray

Extension array wrapping the storage array.