Low-level Helpers#
C Schema Utilities#
Arrow and nanoarrow C structure wrappers
These classes and their constructors wrap the Arrow C Data/Stream interface structures (i.e., ArrowArray, ArrowSchema, and ArrowArrayStream) and the nanoarrow C library structures that help deserialize their content (i.e., the ArrowSchemaView and ArrowArrayView). These wrappers are currently implemented in Cython and their scope is limited to lifecycle management and member access as Python objects.
- allocate_c_schema() CSchema #
Allocate an uninitialized ArrowSchema wrapper
Examples#
>>> import pyarrow as pa
>>> from nanoarrow.c_schema import allocate_c_schema
>>> schema = allocate_c_schema()
>>> pa.int32()._export_to_c(schema._addr())
- c_schema(obj=None) CSchema #
ArrowSchema wrapper
The CSchema class provides a Python-friendly interface to access the fields of an ArrowSchema as defined in the Arrow C Data interface. These objects are created using nanoarrow.c_schema(), which accepts any schema or data type-like object according to the Arrow PyCapsule interface.
This Python wrapper allows access to schema struct members but does not automatically deserialize their content: use c_schema_view() to validate and deserialize the content into a more easily inspectable object.
Note that the CSchema objects returned by .child() hold strong references to the original ArrowSchema to avoid copies while inspecting an imported structure.
Examples#
>>> import pyarrow as pa
>>> import nanoarrow as na
>>> schema = na.c_schema(pa.int32())
>>> schema.is_valid()
True
>>> schema.format
'i'
>>> schema.name
''
- c_schema_view(obj) CSchemaView #
ArrowSchemaView wrapper
The ArrowSchemaView is a nanoarrow C library structure that facilitates access to the deserialized content of an ArrowSchema (e.g., parameter values for parameterized types). This wrapper extends that facility to Python.
Examples#
>>> import pyarrow as pa
>>> import nanoarrow as na
>>> from nanoarrow.c_schema import c_schema_view
>>> schema = na.c_schema(pa.decimal128(10, 3))
>>> schema_view = c_schema_view(schema)
>>> schema_view.type
'decimal128'
>>> schema_view.decimal_bitwidth
128
>>> schema_view.decimal_precision
10
>>> schema_view.decimal_scale
3
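To make "deserialized content" concrete: in the Arrow C Data interface, a decimal128(10, 3) type is carried as the format string 'd:10,3', and the precision, scale, and bit width shown above are parsed out of it. The following pure-Python sketch mimics that parsing step. The helper name parse_decimal_format is hypothetical and is not part of nanoarrow, which performs this work in C.

```python
def parse_decimal_format(fmt):
    """Parse an Arrow C Data interface decimal format string.

    Hypothetical helper for illustration only (not a nanoarrow function).
    The specification encodes decimals as "d:precision,scale" (a bit width
    of 128 is implied) or "d:precision,scale,bitwidth".
    """
    if not fmt.startswith("d:"):
        raise ValueError(f"not a decimal format string: {fmt!r}")

    parts = [int(piece) for piece in fmt[2:].split(",")]
    if len(parts) == 2:
        precision, scale = parts
        bitwidth = 128  # implied when the third field is omitted
    elif len(parts) == 3:
        precision, scale, bitwidth = parts
    else:
        raise ValueError(f"expected 2 or 3 decimal parameters in {fmt!r}")

    return {"precision": precision, "scale": scale, "bitwidth": bitwidth}


parsed = parse_decimal_format("d:10,3")
```

Here parsed holds the same three values that the schema view exposes as decimal_precision, decimal_scale, and decimal_bitwidth.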
C Array Utilities#
- class ArrayBuilder(schema)#
Internal utility to build CArrays from various types of input
This class and its subclasses are designed to help separate the code that actually builds a CArray from the code that chooses the strategy used to do the building.
- classmethod infer_schema(obj) Tuple[CSchema, Any] #
Infer the Arrow data type from a target object
Returns the type as a CSchema and an object that can be consumed in the same way by append() in the event it had to be modified to infer its type (e.g., for an iterable, it would be necessary to consume the first element from the original iterator).
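The reason a second object is returned can be seen with a plain Python generator: looking at the first element consumes it, so an equivalent iterable has to be rebuilt before the data can be appended. The sketch below shows that pattern with itertools.chain. The function peek_first is a hypothetical name used only for illustration and is not how nanoarrow necessarily implements infer_schema().

```python
import itertools


def peek_first(iterable):
    """Return (first_element, equivalent_iterable).

    Hypothetical helper: inspects the first element (e.g., to guess a type)
    and hands back an iterable that still yields every original element.
    """
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        # Empty input: nothing to inspect, nothing to restore
        return None, iter(())
    # Put the consumed element back in front of the remaining ones
    return first, itertools.chain([first], iterator)


squares = (x * x for x in range(4))  # a one-shot generator
first, restored = peek_first(squares)
values = list(restored)
```

After the call, first can be used to choose a data type while restored still yields all four elements.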
- class ArrayFromIterableBuilder(schema)#
Build a CArray from an iterable of scalar objects
This builder converts an iterable to a CArray using some heuristics to pick the fastest available method for converting to a particular type of array. Briefly, the methods are (1) ArrowArrayAppendXXX() functions from the C library (string, binary), (2) array.array() (integer/float except float16), (3) CBufferBuilder.write_elements() (everything else).
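As a rough illustration of method (2), the standard library's array.array() can pack an iterable of Python numbers into a contiguous typed buffer in a single call, and the result exposes the Python buffer protocol. This sketch uses only the standard library and makes no claim about how nanoarrow wires the result into a CArray.

```python
import array

# Any iterable of Python ints can seed the array. Typecode "q" is a signed
# 64-bit integer (long long).
values = (n * 10 for n in range(4))
packed = array.array("q", values)

# The packed data is exposed through the buffer protocol without copying
view = memoryview(packed)
summary = (view.format, view.nbytes, view.tolist())
```

The memoryview reports the element format and total size, which is the information a builder needs to treat the packed data as a fixed-width Arrow data buffer.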
- classmethod infer_schema(obj) Tuple[CBuffer, CSchema] #
Infer the Arrow data type from a target object
Returns the type as a CSchema and an object that can be consumed in the same way by append() in the event it had to be modified to infer its type (e.g., for an iterable, it would be necessary to consume the first element from the original iterator).
- class ArrayFromPyBufferBuilder(schema)#
Build a CArray from a Python Buffer
This builder converts a Python buffer (e.g., numpy array, bytes, array.array) to a CArray (without copying the contents of the buffer).
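The "without copying" behaviour comes from the Python buffer protocol itself: a memoryview over a buffer shares memory with its source and reports a struct-style format code that can be mapped to an Arrow data type. The sketch below shows both points with the standard library only. The STRUCT_TO_ARROW table is a hypothetical illustration (assuming a platform where a C int is 32 bits) and is not nanoarrow's internal mapping.

```python
import array

source = array.array("d", [1.5, 2.5, 3.5])
view = memoryview(source)  # shares memory with source; nothing is copied

# Hypothetical lookup from struct-style format codes to Arrow C Data
# interface format strings (assumes "i"/"I" are 32 bits wide).
STRUCT_TO_ARROW = {
    "b": "c", "B": "C",  # int8 / uint8
    "h": "s", "H": "S",  # int16 / uint16
    "i": "i", "I": "I",  # int32 / uint32
    "q": "l", "Q": "L",  # int64 / uint64
    "e": "e", "f": "f", "d": "g",  # float16 / float32 / float64
}
arrow_format = STRUCT_TO_ARROW[view.format]

# Because the memory is shared, a write through the source is visible
# through the view.
source[0] = 9.0
shared_value = view[0]
```

One consequence of sharing memory is that the source object must stay alive, and should not be mutated unexpectedly, for as long as anything built on top of it is in use.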
- classmethod infer_schema(obj) Tuple[CBuffer, CSchema] #
Infer the Arrow data type from a target object
Returns the type as a CSchema and an object that can be consumed in the same way by append() in the event it had to be modified to infer its type (e.g., for an iterable, it would be necessary to consume the first element from the original iterator).
- class EmptyArrayBuilder(schema)#
Build an empty CArray of any type
This builder accepts any empty input and produces a valid length zero array as output.
- classmethod infer_schema(obj) Tuple[Any, CSchema] #
Infer the Arrow data type from a target object
Returns the type as a CSchema and an object that can be consumed in the same way by append() in the event it had to be modified to infer its type (e.g., for an iterable, it would be necessary to consume the first element from the original iterator).
- allocate_c_array(schema=None) CArray #
Allocate an uninitialized ArrowArray
Examples#
>>> import pyarrow as pa
>>> from nanoarrow.c_array import allocate_c_array
>>> array = allocate_c_array()
>>> pa.array([1, 2, 3])._export_to_c(array._addr())
- c_array(obj, schema=None) CArray #
ArrowArray wrapper
This class provides a user-facing interface to access the fields of an ArrowArray as defined in the Arrow C Data interface, holding an optional reference to a CSchema that can be used to safely deserialize the content.
These objects are created using c_array(), which accepts any array-like object according to the Arrow PyCapsule interface, Python buffer protocol, or iterable of Python objects.
This Python wrapper allows access to array fields but does not automatically deserialize their content: use c_array_view() to validate and deserialize the content into a more easily inspectable object.
Note that the CArray objects returned by .child() hold strong references to the original ArrowArray to avoid copies while inspecting an imported structure.
Parameters#
- obj: array-like
An object supporting the Arrow PyCapsule interface, the Python buffer protocol, or an iterable of Python objects.
- schema: schema-like or None
A schema-like object as sanitized by c_schema() or None. This value will be used to request a data type from obj; however, the conversion is best-effort (i.e., the data type of the returned CArray may be different than schema).
Examples#
>>> import nanoarrow as na
>>> # Create from iterable
>>> array = na.c_array([1, 2, 3], na.int32())
>>> # Create from Python buffer (e.g., numpy array)
>>> import numpy as np
>>> array = na.c_array(np.array([1, 2, 3]))
>>> # Create from Arrow PyCapsule (e.g., pyarrow array)
>>> import pyarrow as pa
>>> array = na.c_array(pa.array([1, 2, 3]))
>>> # Access array fields
>>> array.length
3
>>> array.null_count
0
- c_array_from_buffers(schema, length: int, buffers: Iterable[Any], null_count: int = -1, offset: int = 0, children: Iterable[Any] = (), validation_level: Literal[None, 'full', 'default', 'minimal', 'none'] = None, move: bool = False, device: Device | None = None) CArray #
Create an ArrowArray wrapper from components
Given a schema, build an ArrowArray buffer-wise. This allows almost any array to be assembled; however, it requires some knowledge of the Arrow Columnar specification. This function will do its best to validate the sizes and content of buffers according to validation_level; however, not all types of arrays can currently be validated when constructed in this way.
Parameters#
- schema: schema-like
The data type of the desired array as sanitized by c_schema().
- length: int
The length of the output array.
- buffers: Iterable of buffer-like or None
An iterable of buffers as sanitized by c_buffer(). Any object supporting the Python Buffer protocol is accepted. Buffer data types are not checked. A buffer value of None will skip setting a buffer (i.e., that buffer will be of length zero and its pointer will be NULL).
- null_count: int, optional
The number of null values, if known in advance. If -1 (the default), the null count will be calculated based on the validity bitmap. If the validity bitmap was set to None, the calculated null count will be zero.
- offset: int, optional
The logical offset from the start of the array.
- children: Iterable of array-like
An iterable of arrays used to set child fields of the array. Can contain any object accepted by c_array(). Must contain the exact number of required children as specified by schema.
- validation_level: None or str, optional
One of “none” (no check), “minimal” (check buffer sizes that do not require dereferencing buffer content), “default” (check all buffer sizes), or “full” (check all buffer sizes and all buffer content). The default, None, will validate at the “default” level where possible.
- move: bool, optional
Use True to move ownership of any input buffers or children to the output array.
- device: Device, optional
An explicit device to use when constructing this array. If specified, this function will construct a CDeviceArray; if unspecified, this function will construct a CArray on the CPU device.
Examples#
>>> import nanoarrow as na
>>> c_array = na.c_array_from_buffers(na.uint8(), 5, [None, b"12345"])
>>> na.Array(c_array).inspect()
<ArrowArray uint8>
- length: 5
- offset: 0
- null_count: 0
- buffers[2]:
  - validity <bool[0 b] >
  - data <uint8[5 b] 49 50 51 52 53>
- dictionary: NULL
- children[0]:
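The null_count rule described in the parameters (count from the validity bitmap, or zero when the bitmap is None) can be sketched in pure Python. In the Arrow columnar format a validity bitmap stores one bit per element, least-significant bit first, where a set bit means the element is valid. The function count_nulls below is a hypothetical illustration of that rule, not the routine nanoarrow uses.

```python
def count_nulls(validity, length, offset=0):
    """Count nulls in an Arrow-style validity bitmap.

    Hypothetical helper for illustration. `validity` is a bytes-like object
    or None; bit i of the bitmap (least-significant bit first within each
    byte) is 1 when element i is valid.
    """
    if validity is None:
        # No bitmap means every element is considered valid
        return 0

    valid = 0
    for i in range(offset, offset + length):
        byte = validity[i // 8]
        valid += (byte >> (i % 8)) & 1
    return length - valid


# Five elements where only element 3 is null: bits 0, 1, 2, and 4 are set
bitmap = bytes([0b00010111])
nulls = count_nulls(bitmap, 5)
nulls_without_bitmap = count_nulls(None, 5)
```

With this bitmap, nulls is 1, and nulls_without_bitmap is 0, matching the uint8 example above where the validity buffer was passed as None.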
- c_array_view(obj, schema=None) CArrayView #
ArrowArrayView wrapper
The ArrowArrayView is a nanoarrow C library structure that provides structured access to buffer addresses, buffer sizes, and buffer data types. The buffer data is usually propagated from an ArrowArray but can also be propagated from other types of objects (e.g., serialized IPC). The offset and length of this view are independent of its parent (i.e., this object can also represent a slice of its parent).
Examples#
>>> import pyarrow as pa
>>> import numpy as np
>>> import nanoarrow as na
>>> from nanoarrow.c_array import c_array_view
>>>
>>> array = na.c_array(pa.array(["one", "two", "three", None]))
>>> array_view = c_array_view(array)
>>> np.array(array_view.buffer(1))
array([ 0,  3,  6, 11, 11], dtype=int32)
>>> np.array(array_view.buffer(2))
array([b'o', b'n', b'e', b't', b'w', b'o', b't', b'h', b'r', b'e', b'e'],
      dtype='|S1')
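To show how the two buffers in the example above fit together, the following pure-Python sketch rebuilds the original strings from an offsets buffer and a data buffer with the same contents. The validity bitmap value is an assumption (bits 0 to 2 set, bit 3 clear), since the example does not print buffer(0). No nanoarrow calls are used.

```python
import struct

# Buffers with the same contents as in the example above
offsets_buffer = struct.pack("<5i", 0, 3, 6, 11, 11)  # int32, little-endian
data_buffer = b"onetwothree"
validity_buffer = bytes([0b00000111])  # assumed: elements 0-2 valid, 3 null
length = 4

# A string array of length n has n + 1 offsets; element i occupies
# data_buffer[offsets[i]:offsets[i + 1]]
offsets = struct.unpack(f"<{length + 1}i", offsets_buffer)

values = []
for i in range(length):
    is_valid = (validity_buffer[i // 8] >> (i % 8)) & 1
    if is_valid:
        values.append(data_buffer[offsets[i]:offsets[i + 1]].decode("utf-8"))
    else:
        values.append(None)
```

The null element contributes a zero-length range (11 to 11) in the offsets, which is why the final two offsets are equal.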
C ArrayStream Utilities#
- allocate_c_array_stream() CArrayStream #
Allocate an uninitialized ArrowArrayStream wrapper
Examples#
>>> import pyarrow as pa
>>> from nanoarrow.c_array_stream import allocate_c_array_stream
>>> pa_column = pa.array([1, 2, 3], pa.int32())
>>> pa_batch = pa.record_batch([pa_column], names=["col1"])
>>> pa_reader = pa.RecordBatchReader.from_batches(pa_batch.schema, [pa_batch])
>>> array_stream = allocate_c_array_stream()
>>> pa_reader._export_to_c(array_stream._addr())
- c_array_stream(obj=None, schema=None) CArrayStream #
ArrowArrayStream wrapper
This class provides a user-facing interface to access the fields of an ArrowArrayStream as defined in the Arrow C Stream interface. These objects are usually created using nanoarrow.c_array_stream().
Examples#
>>> import pyarrow as pa
>>> import nanoarrow as na
>>> pa_column = pa.array([1, 2, 3], pa.int32())
>>> pa_batch = pa.record_batch([pa_column], names=["col1"])
>>> pa_reader = pa.RecordBatchReader.from_batches(pa_batch.schema, [pa_batch])
>>> array_stream = na.c_array_stream(pa_reader)
>>> array_stream.get_schema()
<nanoarrow.c_schema.CSchema struct>
- format: '+s'
- name: ''
- flags: 0
- metadata: NULL
- dictionary: NULL
- children[1]:
  'col1': <nanoarrow.c_schema.CSchema int32>
    - format: 'i'
    - name: 'col1'
    - flags: 2
    - metadata: NULL
    - dictionary: NULL
    - children[0]:
>>> array_stream.get_next().length
3
>>> array_stream.get_next() is None
Traceback (most recent call last):
  ...
StopIteration
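As the example shows, get_next() signals the end of the stream by raising StopIteration rather than by returning None. A consumer that wants every batch therefore needs a loop like the one sketched below. FakeStream is a hypothetical stand-in that only mimics the get_next() behaviour shown above, so that the sketch runs without nanoarrow or pyarrow installed; drain is likewise an illustrative name.

```python
class FakeStream:
    """Hypothetical stand-in for a stream object with a get_next() method."""

    def __init__(self, batches):
        self._batches = iter(batches)

    def get_next(self):
        # next() raises StopIteration once the batches are exhausted
        return next(self._batches)


def drain(stream):
    """Collect every remaining batch from an object exposing get_next()."""
    collected = []
    while True:
        try:
            collected.append(stream.get_next())
        except StopIteration:
            return collected


batches = drain(FakeStream([[1, 2, 3], [4, 5]]))
lengths = [len(batch) for batch in batches]
```

Like the stream in the example above, the stand-in is single-pass: once it has been drained, a further get_next() call raises StopIteration immediately.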