Low-level Helpers#

C Schema Utilities#

Arrow and nanoarrow C structure wrappers

These classes and their constructors wrap Arrow C Data/Stream interface structures (i.e., ArrowArray, ArrowSchema, and ArrowArrayStream) and the nanoarrow C library structures that help deserialize their content (i.e., the ArrowSchemaView and ArrowArrayView). These wrappers are currently implemented in Cython and their scope is limited to lifecycle management and member access as Python objects.

allocate_c_schema() → CSchema#

Allocate an uninitialized ArrowSchema wrapper

Examples#

>>> import pyarrow as pa
>>> from nanoarrow.c_schema import allocate_c_schema
>>> schema = allocate_c_schema()
>>> pa.int32()._export_to_c(schema._addr())

c_schema(obj=None) → CSchema#

ArrowSchema wrapper

The CSchema class provides a Python-friendly interface to access the fields of an ArrowSchema as defined in the Arrow C Data interface. These objects are created using nanoarrow.c_schema(), which accepts any schema or data type-like object according to the Arrow PyCapsule interface.

This Python wrapper allows access to schema struct members but does not automatically deserialize their content: use c_schema_view() to validate and deserialize the content into a more easily inspectable object.

Note that the CSchema objects returned by .child() hold strong references to the original ArrowSchema to avoid copies while inspecting an imported structure.
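For orientation, the struct that CSchema wraps can be sketched with the standard-library ctypes module. The field order below follows the ArrowSchema definition in the Arrow C Data interface specification; the ctypes class is purely illustrative, is not part of nanoarrow, and is not how the Cython wrapper is implemented.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """Illustrative ctypes mirror of the C struct (not a nanoarrow class)."""


# Fields are assigned after the class statement so the struct can refer to itself.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))),
    ("private_data", ctypes.c_void_p),
]

ARROW_FLAG_NULLABLE = 2  # flag value from the C Data interface specification

# A hand-filled, non-owning instance describing a nullable int32 field.
# No release callback is set, so it must never be handed to an Arrow consumer.
schema = ArrowSchema(format=b"i", name=b"col1", flags=ARROW_FLAG_NULLABLE)

print(schema.format, schema.name, schema.flags, schema.n_children)
```

The members read here (format, name, flags, n_children) are the same ones the wrapper exposes as Python attributes.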

Examples#

>>> import pyarrow as pa
>>> import nanoarrow as na
>>> schema = na.c_schema(pa.int32())
>>> schema.is_valid()
True
>>> schema.format
'i'
>>> schema.name
''

c_schema_view(obj) → CSchemaView#

ArrowSchemaView wrapper

The ArrowSchemaView is a nanoarrow C library structure that facilitates access to the deserialized content of an ArrowSchema (e.g., parameter values for parameterized types). This wrapper extends that facility to Python.
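As a rough illustration of what "deserialized content" means for a parameterized type, the sketch below parses a decimal format string by hand. The `d:precision,scale[,bitwidth]` layout comes from the Arrow C Data interface specification; the function name `parse_decimal_format` is hypothetical, and this is not how the nanoarrow C library implements it.

```python
def parse_decimal_format(fmt):
    """Split a decimal format string into (precision, scale, bitwidth)."""
    prefix, _, params = fmt.partition(":")
    if prefix != "d" or not params:
        raise ValueError(f"not a decimal format string: {fmt!r}")

    parts = [int(part) for part in params.split(",")]
    if len(parts) == 2:
        precision, scale = parts
        bitwidth = 128  # the bit width is 128 when the third field is omitted
    elif len(parts) == 3:
        precision, scale, bitwidth = parts
    else:
        raise ValueError(f"unexpected number of decimal parameters: {fmt!r}")

    return precision, scale, bitwidth


print(parse_decimal_format("d:10,3"))  # (10, 3, 128)
```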

Examples#

>>> import pyarrow as pa
>>> import nanoarrow as na
>>> from nanoarrow.c_schema import c_schema_view
>>> schema = na.c_schema(pa.decimal128(10, 3))
>>> schema_view = c_schema_view(schema)
>>> schema_view.type
'decimal128'
>>> schema_view.decimal_bitwidth
128
>>> schema_view.decimal_precision
10
>>> schema_view.decimal_scale
3

C Array Utilities#

class ArrayBuilder(schema)#

Internal utility to build CArrays from various types of input

This class and its subclasses are designed to help separate the code that actually builds a CArray from the code that chooses the strategy used to do the building.

classmethod infer_schema(obj) → Tuple[CSchema, Any]#

Infer the Arrow data type from a target object

Returns the inferred type as a CSchema along with an object that append() can consume in place of obj. The returned object differs from obj only when obj had to be modified to infer its type (e.g., inferring the type of an iterator requires consuming its first element).
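The reason a replacement object is returned can be sketched with plain iterators: peeking at the first element consumes it, so it has to be chained back in front of the remaining items. The names `peek_first` and `infer_type_name` are hypothetical, and the type name is only a stand-in for a real CSchema.

```python
import itertools

_EMPTY = object()  # sentinel so that a leading None is not mistaken for "no items"


def peek_first(iterable):
    """Return the first element and an iterator that still yields every element."""
    iterator = iter(iterable)
    first = next(iterator, _EMPTY)
    if first is _EMPTY:
        return _EMPTY, iter(())
    # Put the consumed element back in front of the remaining ones
    return first, itertools.chain([first], iterator)


def infer_type_name(obj):
    """Hypothetical stand-in for schema inference: (type name, replacement object)."""
    first, replacement = peek_first(obj)
    type_name = "empty" if first is _EMPTY else type(first).__name__
    return type_name, replacement


generator = (x * 10 for x in range(3))
type_name, items = infer_type_name(generator)
print(type_name, list(items))  # int [0, 10, 20]
```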

class ArrayFromIterableBuilder(schema)#

Build a CArray from an iterable of scalar objects

This builder converts an iterable to a CArray using some heuristics to pick the fastest available method for converting to a particular type of array. Briefly, the methods are (1) ArrowArrayAppendXXX() functions from the C library (string, binary), (2) array.array() (integer/float except float16), and (3) CBufferBuilder.write_elements() (everything else).
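Method (2) works because array.array() packs Python scalars into a contiguous typed buffer in a single call. The sketch below shows only that packing step using the standard library; the `TYPE_CODES` mapping and `pack_iterable` are hypothetical and do not reflect the mapping nanoarrow actually uses.

```python
import array

# Hypothetical mapping from a type name to an array.array type code.
# Item sizes of type codes are platform-dependent in general, so real code
# would need to check them (e.g., via array.array(code).itemsize).
TYPE_CODES = {"int8": "b", "uint8": "B", "float": "f", "double": "d"}


def pack_iterable(type_name, iterable):
    """Pack an iterable of Python scalars into a contiguous typed buffer."""
    return array.array(TYPE_CODES[type_name], iterable)


packed = pack_iterable("double", (x / 2 for x in range(4)))

# The result supports the buffer protocol, so it can be viewed without copying
view = memoryview(packed)
print(view.format, view.itemsize, view.nbytes, view.tolist())
```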

classmethod infer_schema(obj) → Tuple[CBuffer, CSchema]#

Infer the Arrow data type from a target object

Returns the inferred type as a CSchema along with an object that append() can consume in place of obj. The returned object differs from obj only when obj had to be modified to infer its type (e.g., inferring the type of an iterator requires consuming its first element).

class ArrayFromPyBufferBuilder(schema)#

Build a CArray from a Python Buffer

This builder converts a Python buffer (e.g., numpy array, bytes, array.array) to a CArray (without copying the contents of the buffer).
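One practical consequence of not copying is that the result shares memory with its source. The standard-library sketch below contrasts a zero-copy memoryview with an explicit copy; it illustrates the buffer protocol only and makes no claim about how nanoarrow keeps the source object alive.

```python
source = bytearray(b"12345")

# memoryview() wraps the existing memory rather than copying it
view = memoryview(source)

# bytes() makes an independent copy for comparison
copied = bytes(source)

# Mutating the source is visible through the view but not through the copy
source[0] = ord("9")

print(bytes(view))  # b'92345'
print(copied)  # b'12345'
```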

classmethod infer_schema(obj) → Tuple[CBuffer, CSchema]#

Infer the Arrow data type from a target object

Returns the inferred type as a CSchema along with an object that append() can consume in place of obj. The returned object differs from obj only when obj had to be modified to infer its type (e.g., inferring the type of an iterator requires consuming its first element).

class EmptyArrayBuilder(schema)#

Build an empty CArray of any type

This builder accepts any empty input and produces a valid zero-length array as output.

classmethod infer_schema(obj) → Tuple[Any, CSchema]#

Infer the Arrow data type from a target object

Returns the inferred type as a CSchema along with an object that append() can consume in place of obj. The returned object differs from obj only when obj had to be modified to infer its type (e.g., inferring the type of an iterator requires consuming its first element).

allocate_c_array(schema=None) → CArray#

Allocate an uninitialized ArrowArray

Examples#

>>> import pyarrow as pa
>>> from nanoarrow.c_array import allocate_c_array
>>> array = allocate_c_array()
>>> pa.array([1, 2, 3])._export_to_c(array._addr())

c_array(obj, schema=None) → CArray#

ArrowArray wrapper

This class provides a user-facing interface to access the fields of an ArrowArray as defined in the Arrow C Data interface, holding an optional reference to a CSchema that can be used to safely deserialize the content.

These objects are created using c_array(), which accepts any array-like object: an object supporting the Arrow PyCapsule interface or the Python buffer protocol, or an iterable of Python objects.

This Python wrapper allows access to array fields but does not automatically deserialize their content: use c_array_view() to validate and deserialize the content into a more easily inspectable object.

Note that the CArray objects returned by .child() hold strong references to the original ArrowArray to avoid copies while inspecting an imported structure.

Parameters#

obj: array-like

An object supporting the Arrow PyCapsule interface, the Python buffer protocol, or an iterable of Python objects.

schema: schema-like or None

A schema-like object as sanitized by c_schema() or None. This value will be used to request a data type from obj; however, the conversion is best-effort (i.e., the data type of the returned CArray may differ from the one requested by schema).

Examples#

>>> import nanoarrow as na
>>> # Create from iterable
>>> array = na.c_array([1, 2, 3], na.int32())
>>> # Create from Python buffer (e.g., numpy array)
>>> import numpy as np
>>> array = na.c_array(np.array([1, 2, 3]))
>>> # Create from Arrow PyCapsule (e.g., pyarrow array)
>>> import pyarrow as pa
>>> array = na.c_array(pa.array([1, 2, 3]))
>>> # Access array fields
>>> array.length
3
>>> array.null_count
0

c_array_from_buffers(schema, length: int, buffers: Iterable[Any], null_count: int = -1, offset: int = 0, children: Iterable[Any] = (), validation_level: Literal[None, 'full', 'default', 'minimal', 'none'] = None, move: bool = False, device: Device | None = None) → CArray#

Create an ArrowArray wrapper from components

Given a schema, build an ArrowArray buffer-wise. This allows almost any array to be assembled; however, it requires some knowledge of the Arrow Columnar specification. This function will do its best to validate the sizes and content of buffers according to validation_level; however, not all types of arrays can currently be validated when constructed in this way.
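To give a sense of the knowledge required, the sketch below assembles the three buffers of a (non-large) string array by hand: a validity bitmap with least-significant-bit numbering, int32 offsets in native byte order, and the concatenated UTF-8 data. The helper `build_string_buffers` is hypothetical and uses only the standard library; it does not call nanoarrow.

```python
import struct


def build_string_buffers(values):
    """Return (validity, offsets, data) buffers for a list of str or None."""
    validity = bytearray((len(values) + 7) // 8)
    offsets = [0]
    data = bytearray()

    for i, value in enumerate(values):
        if value is not None:
            # Set bit i, counting from the least-significant bit of each byte
            validity[i // 8] |= 1 << (i % 8)
            data += value.encode("utf-8")
        # A null element repeats the previous offset (zero-length slot)
        offsets.append(len(data))

    # length + 1 int32 offsets, native byte order with standard sizes
    packed_offsets = struct.pack(f"={len(offsets)}i", *offsets)
    return bytes(validity), packed_offsets, bytes(data)


validity, offsets, data = build_string_buffers(["one", "two", "three", None])
print(validity, struct.unpack("=5i", offsets), data)
```

The three returned objects correspond, in order, to the validity, offsets, and data buffers of a string array of length 4.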

Parameters#

schema: schema-like

The data type of the desired array as sanitized by c_schema().

length: int

The length of the output array.

buffers: Iterable of buffer-like or None

An iterable of buffers as sanitized by c_buffer(). Any object supporting the Python buffer protocol is accepted. Buffer data types are not checked. A buffer value of None will skip setting a buffer (i.e., that buffer will be of length zero and its pointer will be NULL).

null_count: int, optional

The number of null values, if known in advance. If -1 (the default), the null count will be calculated based on the validity bitmap. If the validity bitmap was set to None, the calculated null count will be zero.

offset: int, optional

The logical offset from the start of the array.

children: Iterable of array-like

An iterable of arrays used to set child fields of the array. Can contain any object accepted by c_array(). Must contain the exact number of required children as specified by schema.

validation_level: None or str, optional

One of “none” (no check), “minimal” (check buffer sizes that do not require dereferencing buffer content), “default” (check all buffer sizes), or “full” (check all buffer sizes and all buffer content). The default, None, will validate at the “default” level where possible.

move: bool, optional

Use True to move ownership of any input buffers or children to the output array.

device: Device, optional

An explicit device to use when constructing this array. If specified, this function will construct a CDeviceArray; if unspecified, this function will construct a CArray on the CPU device.
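The null_count behavior described in the parameter list can be sketched in pure Python: count the unset bits of the validity bitmap over the logical range, and treat a missing bitmap as "no nulls". The function `count_nulls` is hypothetical and only mirrors the documented outcome, not the C implementation.

```python
def count_nulls(validity, length, offset=0):
    """Count null elements given a validity bitmap (or None) and a logical range."""
    if validity is None:
        # No validity bitmap means every element is valid
        return 0

    # Bits are numbered from the least-significant bit of each byte
    valid = sum(
        (validity[i // 8] >> (i % 8)) & 1 for i in range(offset, offset + length)
    )
    return length - valid


print(count_nulls(bytes([0b00000111]), 4))  # 1
print(count_nulls(None, 4))  # 0
```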

Examples#

>>> import nanoarrow as na
>>> c_array = na.c_array_from_buffers(na.uint8(), 5, [None, b"12345"])
>>> na.Array(c_array).inspect()
<ArrowArray uint8>
- length: 5
- offset: 0
- null_count: 0
- buffers[2]:
  - validity <bool[0 b] >
  - data <uint8[5 b] 49 50 51 52 53>
- dictionary: NULL
- children[0]:

c_array_view(obj, schema=None) → CArrayView#

ArrowArrayView wrapper

The ArrowArrayView is a nanoarrow C library structure that provides structured access to buffer addresses, buffer sizes, and buffer data types. The buffer data is usually propagated from an ArrowArray but can also be propagated from other types of objects (e.g., serialized IPC). The offset and length of this view are independent of its parent (i.e., this object can also represent a slice of its parent).
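To show what can be done with the buffers such a view exposes, the sketch below decodes a string array from raw validity, offsets, and data buffers using only the standard library. The helper `decode_strings` is hypothetical, assumes a logical offset of zero, and operates on plain bytes objects rather than on a CArrayView.

```python
import struct


def decode_strings(validity, offsets, data, length):
    """Decode a (non-large) string array from its three buffers, offset zero."""
    n_offsets = length + 1
    # Offsets are int32 values in native byte order; there are length + 1 of them
    starts = struct.unpack(f"={n_offsets}i", bytes(offsets[: 4 * n_offsets]))
    # A missing or empty validity buffer means every element is valid
    has_bitmap = validity is not None and len(validity) > 0

    out = []
    for i in range(length):
        is_valid = not has_bitmap or (validity[i // 8] >> (i % 8)) & 1
        if is_valid:
            out.append(bytes(data[starts[i] : starts[i + 1]]).decode("utf-8"))
        else:
            out.append(None)
    return out


raw_offsets = struct.pack("=5i", 0, 3, 6, 11, 11)
print(decode_strings(b"\x07", raw_offsets, b"onetwothree", 4))
# ['one', 'two', 'three', None]
```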

Examples#

>>> import pyarrow as pa
>>> import numpy as np
>>> import nanoarrow as na
>>> from nanoarrow.c_array import c_array_view
>>>
>>> array = na.c_array(pa.array(["one", "two", "three", None]))
>>> array_view = c_array_view(array)
>>> np.array(array_view.buffer(1))
array([ 0,  3,  6, 11, 11], dtype=int32)
>>> np.array(array_view.buffer(2))
array([b'o', b'n', b'e', b't', b'w', b'o', b't', b'h', b'r', b'e', b'e'],
      dtype='|S1')

C ArrayStream Utilities#

allocate_c_array_stream() → CArrayStream#

Allocate an uninitialized ArrowArrayStream wrapper

Examples#

>>> import pyarrow as pa
>>> from nanoarrow.c_array_stream import allocate_c_array_stream
>>> pa_column = pa.array([1, 2, 3], pa.int32())
>>> pa_batch = pa.record_batch([pa_column], names=["col1"])
>>> pa_reader = pa.RecordBatchReader.from_batches(pa_batch.schema, [pa_batch])
>>> array_stream = allocate_c_array_stream()
>>> pa_reader._export_to_c(array_stream._addr())

c_array_stream(obj=None, schema=None) → CArrayStream#

ArrowArrayStream wrapper

This class provides a user-facing interface to access the fields of an ArrowArrayStream as defined in the Arrow C Stream interface. These objects are usually created using nanoarrow.c_array_stream().
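As the example in this section shows, get_next() signals the end of the stream by raising StopIteration rather than by returning None. A loop that drains a stream therefore needs to catch that exception. The sketch below demonstrates the pattern against `FakeStream`, a hypothetical stand-in that mimics only this one behavior; it is not a nanoarrow class.

```python
class FakeStream:
    """Hypothetical stand-in that mimics only the get_next() behavior."""

    def __init__(self, batches):
        self._batches = iter(batches)

    def get_next(self):
        # next() raises StopIteration once the batches are exhausted
        return next(self._batches)


def drain(stream):
    """Collect every remaining item from an object with a get_next() method."""
    items = []
    while True:
        try:
            items.append(stream.get_next())
        except StopIteration:
            return items


print(drain(FakeStream([[1, 2, 3], [4, 5]])))  # [[1, 2, 3], [4, 5]]
```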

Examples#

>>> import pyarrow as pa
>>> import nanoarrow as na
>>> pa_column = pa.array([1, 2, 3], pa.int32())
>>> pa_batch = pa.record_batch([pa_column], names=["col1"])
>>> pa_reader = pa.RecordBatchReader.from_batches(pa_batch.schema, [pa_batch])
>>> array_stream = na.c_array_stream(pa_reader)
>>> array_stream.get_schema()
<nanoarrow.c_schema.CSchema struct>
- format: '+s'
- name: ''
- flags: 0
- metadata: NULL
- dictionary: NULL
- children[1]:
  'col1': <nanoarrow.c_schema.CSchema int32>
    - format: 'i'
    - name: 'col1'
    - flags: 2
    - metadata: NULL
    - dictionary: NULL
    - children[0]:
>>> array_stream.get_next().length
3
>>> array_stream.get_next() is None
Traceback (most recent call last):
  ...
StopIteration