The DLPack Protocol
The DLPack Protocol defines a stable in-memory data structure that allows data exchange between major frameworks working with multidimensional arrays or tensors. It is designed for cross-hardware support, meaning it allows exchange of data on devices other than the CPU (e.g. GPU).
The DLPack protocol has been adopted by the Python array API standard, developed by the Consortium for Python Data API Standards, to enable device-aware data interchange between array/tensor libraries in the Python ecosystem. See more about the standard in the protocol documentation and more about DLPack in the Python Specification for DLPack.
Implementation of DLPack in PyArrow
The producing side of the DLPack Protocol is implemented for pa.Array
and can be used to interchange data between PyArrow and other tensor
libraries. Supported data types are integer, unsigned integer and float. The
protocol has no support for missing data, so PyArrow arrays with
missing values cannot be transferred through the DLPack
protocol. Currently, the Arrow implementation of the protocol only supports
data on a CPU device.
Data interchange syntax of the protocol includes:
1. from_dlpack(x): consuming an array object that implements a __dlpack__ method and creating a new array while sharing the memory.
2. __dlpack__(self, stream=None) and __dlpack_device__: producing a PyCapsule with the DLPack struct, which is called from within from_dlpack(x).
PyArrow implements the second part of the protocol
(__dlpack__(self, stream=None) and __dlpack_device__) and can
thus be consumed by libraries implementing from_dlpack.
Example
Convert a PyArrow CPU array to a NumPy array:
>>> import pyarrow as pa
>>> array = pa.array([2, 0, 2, 4])
>>> array
<pyarrow.lib.Int64Array object at 0x121fd4880>
[
2,
0,
2,
4
]
>>> import numpy as np
>>> np.from_dlpack(array)
array([2, 0, 2, 4])
Convert a PyArrow CPU array to a PyTorch tensor:
>>> import torch
>>> torch.from_dlpack(array)
tensor([2, 0, 2, 4])