The DLPack Protocol

The DLPack Protocol defines a stable in-memory data structure that allows data exchange between major frameworks working with multidimensional arrays or tensors. It is designed for cross-hardware support, meaning it allows exchange of data on devices other than the CPU (e.g. GPU).

The DLPack protocol has been selected for the Python array API standard by the Consortium for Python Data API Standards in order to enable device-aware data interchange between array/tensor libraries in the Python ecosystem. See more about the standard in the protocol documentation and more about DLPack in the Python Specification for DLPack.

Implementation of DLPack in PyArrow

The producing side of the DLPack Protocol is implemented for pa.Array and can be used to interchange data between PyArrow and other tensor libraries. Supported data types are integer, unsigned integer and float. The protocol has no support for missing data, meaning that PyArrow arrays with missing values cannot be transferred through the DLPack protocol. Currently, the Arrow implementation of the protocol only supports data on a CPU device.

The data interchange syntax of the protocol includes:

  1. from_dlpack(x): consuming an array object that implements a __dlpack__ method and creating a new array while sharing the memory.

  2. __dlpack__(self, stream=None) and __dlpack_device__: methods called from within from_dlpack(x). __dlpack__ produces a PyCapsule containing the DLPack struct, and __dlpack_device__ reports the device on which the data resides.

PyArrow implements the second part of the protocol (__dlpack__(self, stream=None) and __dlpack_device__) and can thus be consumed by libraries implementing from_dlpack.

Example

Convert a PyArrow CPU array to a NumPy array:

>>> import pyarrow as pa
>>> array = pa.array([2, 0, 2, 4])
>>> array
<pyarrow.lib.Int64Array object at 0x121fd4880>
[
2,
0,
2,
4
]

>>> import numpy as np
>>> np.from_dlpack(array)
array([2, 0, 2, 4])

Convert a PyArrow CPU array to a PyTorch tensor:

>>> import torch
>>> torch.from_dlpack(array)
tensor([2, 0, 2, 4])