Other Data Structures

Our Flatbuffers protocol definition files also define metadata for some other data structures, so that other kinds of applications can take advantage of the common interprocess communication machinery. These data structures are not considered to be part of the columnar format.

An Arrow columnar implementation is not required to implement these types.

Tensor (Multi-dimensional Array)

The Tensor message type provides a way to write a multidimensional array of fixed-size values (such as a NumPy ndarray).

When writing a standalone encapsulated tensor message, we use the encapsulated IPC format defined in the Columnar Specification, but additionally align the starting offset of the tensor body to be a multiple of 64 bytes:

<metadata prefix and metadata>
<PADDING>
<tensor body>
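
The alignment rule above comes down to a small piece of offset arithmetic. The following is a minimal sketch, not taken from any Arrow implementation: the names `padding_needed` and `layout_tensor_message` are hypothetical, and `metadata_size` is assumed to cover both the metadata prefix and the metadata itself.

```python
ALIGNMENT = 64  # the tensor body must start at a multiple of 64 bytes


def padding_needed(offset: int, alignment: int = ALIGNMENT) -> int:
    """Return the number of bytes to add so that offset becomes a multiple of alignment."""
    return (-offset) % alignment


def layout_tensor_message(start_offset: int, metadata_size: int) -> dict:
    """Compute where each region of an encapsulated tensor message would start.

    Hypothetical helper: metadata_size is assumed to include the metadata
    prefix as well as the metadata.
    """
    metadata_end = start_offset + metadata_size
    pad = padding_needed(metadata_end)
    return {
        "metadata_offset": start_offset,
        "padding": pad,
        "body_offset": metadata_end + pad,
    }


# A 200-byte metadata block written at offset 0 needs 56 bytes of padding,
# so the tensor body starts at offset 256.
layout = layout_tensor_message(start_offset=0, metadata_size=200)
```

If the metadata already ends on a 64-byte boundary, the computed padding is zero and the body follows immediately.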

Sparse Tensor

SparseTensor represents a multidimensional array whose elements are mostly zeros.

When writing a standalone encapsulated sparse tensor message, we use the encapsulated IPC format defined in the Columnar Specification, but additionally align the starting offsets of the sparse index and the sparse tensor body (if writing to a shared memory region) to be multiples of 64 bytes:

<metadata prefix and metadata>
<PADDING>
<sparse index>
<PADDING>
<sparse tensor body>
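
To make the two padding regions concrete, here is a rough sketch that assembles such a message as a byte string. It is illustrative only: `assemble_sparse_message` and `pad_to_alignment` are hypothetical names, the inputs are opaque placeholder bytes rather than real Flatbuffers metadata, the message is assumed to start at offset 0 of an aligned region, and filling the padding with zero bytes is an assumption of this sketch.

```python
ALIGNMENT = 64  # sparse index and sparse tensor body start at multiples of 64 bytes


def pad_to_alignment(buf: bytearray, alignment: int = ALIGNMENT) -> None:
    """Append zero bytes (an assumption of this sketch) until len(buf) is aligned."""
    buf.extend(b"\x00" * ((-len(buf)) % alignment))


def assemble_sparse_message(metadata: bytes, sparse_index: bytes, body: bytes):
    """Concatenate the three regions, padding before the index and before the body.

    Returns the assembled bytes plus the offsets of the sparse index and the body.
    """
    buf = bytearray(metadata)   # <metadata prefix and metadata>
    pad_to_alignment(buf)       # <PADDING>
    index_offset = len(buf)
    buf.extend(sparse_index)    # <sparse index>
    pad_to_alignment(buf)       # <PADDING>
    body_offset = len(buf)
    buf.extend(body)            # <sparse tensor body>
    return bytes(buf), index_offset, body_offset


# Placeholder payloads: 100 bytes of metadata, a 24-byte index, a 16-byte body.
message, index_offset, body_offset = assemble_sparse_message(
    b"m" * 100, b"i" * 24, b"b" * 16
)
```

With these placeholder sizes the sparse index lands at offset 128 and the body at offset 192, both multiples of 64.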

The contents of the sparse tensor index depend on what kind of sparse format is used.
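
As an illustration of how the index relates to the body, consider a coordinate (COO) style format, used here only as one example of a sparse format. In that style the index holds the coordinates of the non-zero elements and the body holds the corresponding values. The sketch below uses plain Python lists and a hypothetical `to_coo` helper; it does not reflect the actual binary encoding of the index.

```python
def to_coo(dense):
    """Split a 2-D list into (indices, values) for its non-zero elements.

    indices plays the role of the sparse index and values the role of the
    sparse tensor body in a COO-style representation (illustrative only).
    """
    indices = []
    values = []
    for i, row in enumerate(dense):
        for j, x in enumerate(row):
            if x != 0:
                indices.append((i, j))
                values.append(x)
    return indices, values


dense = [
    [0, 0, 5],
    [0, 0, 0],
    [7, 0, 0],
]
indices, values = to_coo(dense)
# indices == [(0, 2), (2, 0)] and values == [5, 7]
```

A different sparse format would store different index data for the same values, which is why the index contents cannot be described independently of the format.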