IPC Extension Reference#

C API#

group nanoarrow_ipc

Except where noted, objects are not thread-safe and clients should take care to serialize accesses to methods.

Because this library is intended to be vendored, it provides full type definitions and encourages clients to stack or statically allocate where convenient.

Defines

NANOARROW_IPC_FEATURE_DICTIONARY_REPLACEMENT#

Feature flag for a stream that uses dictionary replacement.

NANOARROW_IPC_FEATURE_COMPRESSED_BODY#

Feature flag for a stream that uses compression.
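
A minimal sketch of testing such flags against a bitmask like decoder.feature_flags. The SKETCH_* names and their numeric values are assumptions made for illustration (they mirror the Feature enum of the Arrow IPC format); real code would use the NANOARROW_IPC_FEATURE_* macros from nanoarrow_ipc.h.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the NANOARROW_IPC_FEATURE_* macros.
   The values are assumed, not taken from this reference. */
#define SKETCH_FEATURE_DICTIONARY_REPLACEMENT 1
#define SKETCH_FEATURE_COMPRESSED_BODY 2

/* Returns 1 if every bit of `feature` is set in `feature_flags`, 0 otherwise. */
static int sketch_has_feature(int32_t feature_flags, int32_t feature) {
  return (feature_flags & feature) == feature;
}
```

A reader that cannot decompress bodies could, for example, reject a stream early when sketch_has_feature(flags, SKETCH_FEATURE_COMPRESSED_BODY) is non-zero.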

Enums

enum ArrowIpcMetadataVersion#

Metadata version enumerator.

Values:

enumerator NANOARROW_IPC_METADATA_VERSION_V1#
enumerator NANOARROW_IPC_METADATA_VERSION_V2#
enumerator NANOARROW_IPC_METADATA_VERSION_V3#
enumerator NANOARROW_IPC_METADATA_VERSION_V4#
enumerator NANOARROW_IPC_METADATA_VERSION_V5#
enum ArrowIpcMessageType#

Message type enumerator.

Values:

enumerator NANOARROW_IPC_MESSAGE_TYPE_UNINITIALIZED#
enumerator NANOARROW_IPC_MESSAGE_TYPE_SCHEMA#
enumerator NANOARROW_IPC_MESSAGE_TYPE_DICTIONARY_BATCH#
enumerator NANOARROW_IPC_MESSAGE_TYPE_RECORD_BATCH#
enumerator NANOARROW_IPC_MESSAGE_TYPE_TENSOR#
enumerator NANOARROW_IPC_MESSAGE_TYPE_SPARSE_TENSOR#
enum ArrowIpcEndianness#

Endianness enumerator.

Values:

enumerator NANOARROW_IPC_ENDIANNESS_UNINITIALIZED#
enumerator NANOARROW_IPC_ENDIANNESS_LITTLE#
enumerator NANOARROW_IPC_ENDIANNESS_BIG#
enum ArrowIpcCompressionType#

Compression type enumerator.

Values:

enumerator NANOARROW_IPC_COMPRESSION_TYPE_NONE#
enumerator NANOARROW_IPC_COMPRESSION_TYPE_LZ4_FRAME#
enumerator NANOARROW_IPC_COMPRESSION_TYPE_ZSTD#

Functions

ArrowErrorCode ArrowIpcCheckRuntime(struct ArrowError *error)#

Checks the nanoarrow runtime to make sure the run/build versions match.

static inline enum ArrowIpcEndianness ArrowIpcSystemEndianness(void)#

Get the endianness of the current runtime.
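
One common way to detect endianness at runtime is to inspect the first byte of a known integer. The sketch below illustrates that technique only; it is not necessarily how ArrowIpcSystemEndianness() is implemented, and the sketch_* names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for enum ArrowIpcEndianness. */
enum sketch_endianness { SKETCH_ENDIANNESS_LITTLE = 1, SKETCH_ENDIANNESS_BIG = 2 };

/* Stores the value 1 in a 32-bit integer and looks at its lowest-addressed
   byte: on a little-endian system that byte holds the 1. */
static enum sketch_endianness sketch_system_endianness(void) {
  const uint32_t probe = 1;
  uint8_t first_byte;
  memcpy(&first_byte, &probe, 1);
  return first_byte == 1 ? SKETCH_ENDIANNESS_LITTLE : SKETCH_ENDIANNESS_BIG;
}
```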

ArrowErrorCode ArrowIpcSharedBufferInit(struct ArrowIpcSharedBuffer *shared, struct ArrowBuffer *src)#

Initialize the contents of an ArrowIpcSharedBuffer struct.

If NANOARROW_OK is returned, the ArrowIpcSharedBuffer takes ownership of src.

void ArrowIpcSharedBufferReset(struct ArrowIpcSharedBuffer *shared)#

Release the caller’s copy of the shared buffer.

When finished, the caller must relinquish its own copy of the shared data using this function. The original buffer will continue to exist until all ArrowArray objects that refer to it have also been released.

int ArrowIpcSharedBufferIsThreadSafe(void)#

Check for shared buffer thread safety.

Thread-safe shared buffers require C11 and the stdatomic.h header. If either is unavailable, shared buffers are still possible but the resulting arrays must not be passed to other threads to be released.
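
The dependency on C11 atomics can be illustrated with a generic reference-counting sketch. Everything named sketch_* or SKETCH_* below is hypothetical and is not part of nanoarrow; the point is only that the counter is atomic when stdatomic.h is available and plain otherwise, and that the data is freed when the last reference is released.

```c
#include <stdint.h>
#include <stdlib.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
typedef atomic_long sketch_counter_t;
#define SKETCH_IS_THREAD_SAFE 1
#define SKETCH_FETCH_ADD(p, n) atomic_fetch_add((p), (n))
#else
typedef long sketch_counter_t;
#define SKETCH_IS_THREAD_SAFE 0
/* Non-atomic fallback: adds n and returns the previous value. */
static long sketch_fetch_add(long* p, long n) {
  long old = *p;
  *p += n;
  return old;
}
#define SKETCH_FETCH_ADD(p, n) sketch_fetch_add((p), (n))
#endif

struct sketch_shared {
  sketch_counter_t ref_count;
  uint8_t* data;
  int* freed_flag; /* set to 1 when data is freed; for demonstration only */
};

/* Allocates a buffer holding one reference, owned by the caller. */
static struct sketch_shared* sketch_shared_new(size_t size, int* freed_flag) {
  struct sketch_shared* shared = malloc(sizeof(*shared));
  if (shared == NULL) return NULL;
  shared->data = malloc(size);
  if (shared->data == NULL) {
    free(shared);
    return NULL;
  }
  shared->ref_count = 1;
  shared->freed_flag = freed_flag;
  return shared;
}

/* Adds a reference, as a decoded array pointing into the data would. */
static void sketch_shared_retain(struct sketch_shared* shared) {
  SKETCH_FETCH_ADD(&shared->ref_count, 1);
}

/* Drops a reference; the last release frees the data. */
static void sketch_shared_release(struct sketch_shared* shared) {
  if (SKETCH_FETCH_ADD(&shared->ref_count, -1) == 1) {
    *shared->freed_flag = 1;
    free(shared->data);
    free(shared);
  }
}
```

With the non-atomic fallback, two threads releasing at the same time could both read the same counter value, which is why releasing from other threads is unsafe in that configuration.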

ArrowErrorCode ArrowIpcDecoderInit(struct ArrowIpcDecoder *decoder)#

Initialize a decoder.

void ArrowIpcDecoderReset(struct ArrowIpcDecoder *decoder)#

Release all resources attached to a decoder.

ArrowErrorCode ArrowIpcDecoderPeekHeader(struct ArrowIpcDecoder *decoder, struct ArrowBufferView data, struct ArrowError *error)#

Peek at a message header.

The first 8 bytes of an Arrow IPC message are the continuation marker 0xFFFFFFFF followed by the size of the header as a little-endian 32-bit integer. ArrowIpcDecoderPeekHeader() reads these bytes and returns ESPIPE if there are not enough remaining bytes in data to read the entire header message, EINVAL if the first 8 bytes are not valid, ENODATA if the Arrow end-of-stream indicator has been reached, or NANOARROW_OK otherwise.
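
The 8-byte prefix can be parsed by hand, which may help when debugging a stream. The sketch below is a simplified stand-in written for illustration, not the library's implementation: sketch_peek_prefix() is a hypothetical name, it assumes the four-byte continuation marker of the current Arrow IPC format (every byte 0xFF), and it treats a size of zero as the end-of-stream indicator.

```c
#include <errno.h>
#include <stdint.h>

/* Inspects the 8-byte message prefix at the start of `data`.
   On success returns 0 and stores the header size that follows the prefix. */
static int sketch_peek_prefix(const uint8_t* data, int64_t size_bytes,
                              int32_t* header_size_out) {
  if (size_bytes < 8) return ESPIPE; /* prefix itself is incomplete */

  /* Continuation marker: four bytes of 0xFF. */
  for (int i = 0; i < 4; i++) {
    if (data[i] != 0xFF) return EINVAL;
  }

  /* Header size: little-endian 32-bit integer, assembled byte by byte so
     the result does not depend on the host byte order. */
  uint32_t raw = (uint32_t)data[4] | ((uint32_t)data[5] << 8) |
                 ((uint32_t)data[6] << 16) | ((uint32_t)data[7] << 24);
  if (raw > (uint32_t)INT32_MAX) return EINVAL; /* negative size */
  if (raw == 0) return ENODATA;                 /* end of stream */

  /* The entire header must be present after the prefix. */
  if (size_bytes - 8 < (int64_t)raw) return ESPIPE;

  *header_size_out = (int32_t)raw;
  return 0;
}
```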

ArrowErrorCode ArrowIpcDecoderVerifyHeader(struct ArrowIpcDecoder *decoder, struct ArrowBufferView data, struct ArrowError *error)#

Verify a message header.

Runs ArrowIpcDecoderPeekHeader() to ensure data is sufficiently large but additionally runs flatbuffer verification to ensure that decoding the data will not access memory outside of the buffer specified by data. ArrowIpcDecoderVerifyHeader() will also set decoder.header_size_bytes, decoder.body_size_bytes, decoder.metadata_version, and decoder.message_type.

Returns the same codes as ArrowIpcDecoderPeekHeader() and additionally returns EINVAL if flatbuffer verification fails.

ArrowErrorCode ArrowIpcDecoderDecodeHeader(struct ArrowIpcDecoder *decoder, struct ArrowBufferView data, struct ArrowError *error)#

Decode a message header.

Runs ArrowIpcDecoderPeekHeader() to ensure data is sufficiently large and decodes the content of the message header. If data contains a schema message, decoder.endianness and decoder.feature_flags are set and ArrowIpcDecoderDecodeSchema() can be used to obtain the decoded schema. If data contains a record batch message, decoder.codec is set and a successful call can be followed by a call to ArrowIpcDecoderDecodeArray().

In almost all cases this should be preceded by a call to ArrowIpcDecoderVerifyHeader() to ensure decoding does not access data outside of the specified buffer.

Returns EINVAL if the content of the message cannot be decoded or ENOTSUP if the content of the message uses features not supported by this library.

ArrowErrorCode ArrowIpcDecoderDecodeSchema(struct ArrowIpcDecoder *decoder, struct ArrowSchema *out, struct ArrowError *error)#

Decode an ArrowSchema.

After a successful call to ArrowIpcDecoderDecodeHeader(), retrieve an ArrowSchema. The caller is responsible for releasing the schema if NANOARROW_OK is returned.

Returns EINVAL if the decoder did not just decode a schema message or NANOARROW_OK otherwise.

ArrowErrorCode ArrowIpcDecoderSetSchema(struct ArrowIpcDecoder *decoder, struct ArrowSchema *schema, struct ArrowError *error)#

Set the ArrowSchema used to decode future record batch messages.

Prepares the decoder for future record batch messages of this type. The decoder takes ownership of schema if NANOARROW_OK is returned. Note that you must call this explicitly after decoding a Schema message (i.e., the decoder does not assume that the last-decoded schema message applies to future record batch messages).

Returns EINVAL if schema validation fails or NANOARROW_OK otherwise.

ArrowErrorCode ArrowIpcDecoderSetEndianness(struct ArrowIpcDecoder *decoder, enum ArrowIpcEndianness endianness)#

Set the endianness used to decode future record batch messages.

Prepares the decoder for future record batch messages with the specified endianness. Note that you must call this explicitly after decoding a Schema message (i.e., the decoder does not assume that the last-decoded schema message applies to future record batch messages).

Returns NANOARROW_OK on success.

ArrowErrorCode ArrowIpcDecoderDecodeArrayView(struct ArrowIpcDecoder *decoder, struct ArrowBufferView body, int64_t i, struct ArrowArrayView **out, struct ArrowError *error)#

Decode an ArrowArrayView.

After a successful call to ArrowIpcDecoderDecodeHeader(), deserialize the content of body into an internally-managed ArrowArrayView and return it. Note that the field index i does not equate to a column index if any columns contain nested types. Use a value of -1 to decode the entire array into a struct. The pointed-to ArrowArrayView is owned by the ArrowIpcDecoder and must not be released.

For streams that match system endianness and do not use compression, this operation will not perform any heap allocations; however, the buffers referred to by the returned ArrowArrayView are only valid as long as the buffer referred to by body stays valid.

ArrowErrorCode ArrowIpcDecoderDecodeArray(struct ArrowIpcDecoder *decoder, struct ArrowBufferView body, int64_t i, struct ArrowArray *out, enum ArrowValidationLevel validation_level, struct ArrowError *error)#

Decode an ArrowArray.

After a successful call to ArrowIpcDecoderDecodeHeader(), assemble an ArrowArray given a message body and a field index i. Note that the field index does not equate to a column index if any columns contain nested types. Use a value of -1 to decode the entire array into a struct. The caller is responsible for releasing the array if NANOARROW_OK is returned.

Returns EINVAL if the decoder did not just decode a record batch message, ENOTSUP if the message uses features not supported by this library, or NANOARROW_OK otherwise.
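
To illustrate why a field index differs from a column index once nested types are involved, the sketch below maps a top-level column number to its position in a flattened field tree. It assumes a depth-first, parent-before-children ordering in which the first column has index 0; that ordering and all sketch_* names are assumptions made for illustration, so confirm against the library before relying on specific indices.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal schema node: only the tree shape matters here. */
struct sketch_field {
  int64_t n_children;
  const struct sketch_field** children;
};

/* Number of flattened positions a field occupies: itself plus all descendants. */
static int64_t sketch_count_fields(const struct sketch_field* field) {
  int64_t n = 1;
  for (int64_t j = 0; j < field->n_children; j++) {
    n += sketch_count_fields(field->children[j]);
  }
  return n;
}

/* Flattened index of top-level column `column` of `root`, or -1 if the
   column is out of range. Every earlier column contributes its whole subtree. */
static int64_t sketch_column_to_field_index(const struct sketch_field* root,
                                            int64_t column) {
  if (column < 0 || column >= root->n_children) return -1;
  int64_t index = 0;
  for (int64_t j = 0; j < column; j++) {
    index += sketch_count_fields(root->children[j]);
  }
  return index;
}
```

Under these assumptions, a schema with columns (int32, struct<a, b>, int32) places the columns at indices 0, 1, and 4, with indices 2 and 3 taken by the struct's children.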

ArrowErrorCode ArrowIpcDecoderDecodeArrayFromShared(struct ArrowIpcDecoder *decoder, struct ArrowIpcSharedBuffer *shared, int64_t i, struct ArrowArray *out, enum ArrowValidationLevel validation_level, struct ArrowError *error)#

Decode an ArrowArray from an owned buffer.

This implementation takes advantage of the fact that it can avoid copying individual buffers. In all cases the caller must call ArrowIpcSharedBufferReset() on shared after one or more calls to ArrowIpcDecoderDecodeArrayFromShared(). If ArrowIpcSharedBufferIsThreadSafe() returns 0, out must not be released by another thread.

void ArrowIpcInputStreamMove(struct ArrowIpcInputStream *src, struct ArrowIpcInputStream *dst)#

Transfer ownership of an ArrowIpcInputStream.

ArrowErrorCode ArrowIpcInputStreamInitBuffer(struct ArrowIpcInputStream *stream, struct ArrowBuffer *input)#

Create an input stream from an ArrowBuffer.

ArrowErrorCode ArrowIpcInputStreamInitFile(struct ArrowIpcInputStream *stream, void *file_ptr, int close_on_release)#

Create an input stream from a C FILE* pointer.

Note that the ArrowIpcInputStream has no mechanism to communicate an error if file_ptr fails to close. If this behaviour is needed, pass 0 (false) to close_on_release and handle closing the file independently from stream.

ArrowErrorCode ArrowIpcArrayStreamReaderInit(struct ArrowArrayStream *out, struct ArrowIpcInputStream *input_stream, struct ArrowIpcArrayStreamReaderOptions *options)#

Initialize an ArrowArrayStream from an input stream of bytes.

The stream of bytes must begin with a Schema message and be followed by zero or more RecordBatch messages as described in the Arrow IPC stream format specification. Returns NANOARROW_OK on success. If NANOARROW_OK is returned, the ArrowArrayStream takes ownership of input_stream and the caller is responsible for releasing out.

ArrowErrorCode ArrowIpcEncoderInit(struct ArrowIpcEncoder *encoder)#

Initialize an encoder.

If NANOARROW_OK is returned, the caller must call ArrowIpcEncoderReset() to release resources allocated by this function.

void ArrowIpcEncoderReset(struct ArrowIpcEncoder *encoder)#

Release all resources attached to an encoder.

ArrowErrorCode ArrowIpcEncoderFinalizeBuffer(struct ArrowIpcEncoder *encoder, struct ArrowBuffer *out)#

Finalize the most recently encoded message to a buffer.

The bytes of the encoded message will be appended to the provided buffer.

struct ArrowIpcSharedBuffer#
#include <nanoarrow_ipc.h>

A structure representing a reference-counted buffer that may be passed to ArrowIpcDecoderDecodeArrayFromShared().

struct ArrowIpcDecoder#
#include <nanoarrow_ipc.h>

Decoder for Arrow IPC messages.

This structure is intended to be allocated by the caller, initialized using ArrowIpcDecoderInit(), and released with ArrowIpcDecoderReset(). These fields should not be modified by the caller but can be read following a call to ArrowIpcDecoderPeekHeader(), ArrowIpcDecoderVerifyHeader(), or ArrowIpcDecoderDecodeHeader().

Public Members

enum ArrowIpcMessageType message_type#

The last verified or decoded message type.

enum ArrowIpcMetadataVersion metadata_version#

The metadata version as indicated by the current schema message.

enum ArrowIpcEndianness endianness#

Buffer endianness as indicated by the current schema message.

int32_t feature_flags#

Arrow IPC Features used as indicated by the current Schema message.

enum ArrowIpcCompressionType codec#

Compression used by the current RecordBatch message.

int32_t header_size_bytes#

The number of bytes in the current header message.

This value includes the 8 bytes before the start of the header message content and any padding bytes required to make the header message size be a multiple of 8 bytes.
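
As a worked example of this arithmetic, the sketch below computes the total from an unpadded header length. sketch_header_size_bytes() is a hypothetical helper for illustration; a decoder would normally read the size from the message prefix rather than recompute the padding.

```c
#include <stdint.h>

/* Total header bytes for a header message whose unpadded content is
   `content_size` bytes: the 8-byte prefix plus the content rounded up
   to the next multiple of 8. */
static int32_t sketch_header_size_bytes(int32_t content_size) {
  int32_t padded = (content_size + 7) & ~7;
  return 8 + padded;
}
```

For instance, 250 bytes of header content round up to 256, giving 264 bytes in total; 256 bytes of content need no padding and also give 264.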

int64_t body_size_bytes#

The number of bytes in the forthcoming body message.

void *private_data#

Private resources managed by this library.

struct ArrowIpcInputStream#
#include <nanoarrow_ipc.h>

A user-extensible input data source.

Public Members

ArrowErrorCode (*read)(struct ArrowIpcInputStream *stream, uint8_t *buf, int64_t buf_size_bytes, int64_t *size_read_out, struct ArrowError *error)#

Read up to buf_size_bytes from stream into buf.

The actual number of bytes read is placed in the value pointed to by size_read_out. Returns NANOARROW_OK on success.

void (*release)(struct ArrowIpcInputStream *stream)#

Release the stream and any resources it may be holding.

Release callback implementations must set the release member to NULL. Callers must check that the release callback is not NULL before calling read() or release().

void *private_data#

Private implementation-defined data.
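
The read/release contract above can be sketched with a memory-backed source. The struct below is a simplified, hypothetical mirror of ArrowIpcInputStream (the struct ArrowError argument is omitted so the example compiles without nanoarrow); it is meant to show the shape of an implementation, including a release callback that sets the release member to NULL.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified mirror of struct ArrowIpcInputStream. */
struct sketch_input_stream {
  int (*read)(struct sketch_input_stream* stream, uint8_t* buf,
              int64_t buf_size_bytes, int64_t* size_read_out);
  void (*release)(struct sketch_input_stream* stream);
  void* private_data;
};

/* State for a stream that reads from borrowed memory. */
struct sketch_memory_source {
  const uint8_t* data;
  int64_t size_bytes;
  int64_t cursor;
};

/* Copies up to buf_size_bytes into buf; reports 0 bytes at end of input. */
static int sketch_memory_read(struct sketch_input_stream* stream, uint8_t* buf,
                              int64_t buf_size_bytes, int64_t* size_read_out) {
  struct sketch_memory_source* src =
      (struct sketch_memory_source*)stream->private_data;
  int64_t remaining = src->size_bytes - src->cursor;
  int64_t n = buf_size_bytes < remaining ? buf_size_bytes : remaining;
  if (n > 0) {
    memcpy(buf, src->data + src->cursor, (size_t)n);
    src->cursor += n;
  } else {
    n = 0;
  }
  *size_read_out = n;
  return 0;
}

/* Frees the private state and marks the stream as released. */
static void sketch_memory_release(struct sketch_input_stream* stream) {
  free(stream->private_data);
  stream->private_data = NULL;
  stream->release = NULL;
}

/* Initializes stream to read from data, which must outlive the stream. */
static int sketch_memory_stream_init(struct sketch_input_stream* stream,
                                     const uint8_t* data, int64_t size_bytes) {
  struct sketch_memory_source* src = malloc(sizeof(*src));
  if (src == NULL) return ENOMEM;
  src->data = data;
  src->size_bytes = size_bytes;
  src->cursor = 0;
  stream->read = &sketch_memory_read;
  stream->release = &sketch_memory_release;
  stream->private_data = src;
  return 0;
}
```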

struct ArrowIpcArrayStreamReaderOptions#
#include <nanoarrow_ipc.h>

Options for ArrowIpcArrayStreamReaderInit().

Public Members

int64_t field_index#

The field index to extract.

Defaults to -1 (i.e., read all fields). Note that this field index refers to the flattened tree of children and not necessarily the column index.

int use_shared_buffers#

Set to a non-zero value to share the message body buffer among decoded arrays.

Sharing buffers is a good choice when (1) using memory-mapped IO (since unreferenced portions of the file are often not loaded into memory) or (2) if all data from all columns are about to be referenced anyway. When loading a single field there is probably no advantage to using shared buffers. Defaults to the value of ArrowIpcSharedBufferIsThreadSafe().

struct ArrowIpcEncoder#
#include <nanoarrow_ipc.h>

Encoder for Arrow IPC messages.

This structure is intended to be allocated by the caller, initialized using ArrowIpcEncoderInit(), and released with ArrowIpcEncoderReset().

Public Members

enum ArrowIpcCompressionType codec#

Compression to encode in the next RecordBatch message.

ArrowErrorCode (*encode_buffer)(struct ArrowBufferView buffer_view, struct ArrowIpcEncoder *encoder, int64_t *offset, int64_t *length, struct ArrowError *error)#

Callback invoked against each buffer to be encoded.

Encoding of buffers is left as a callback to accommodate dissociated data storage. One implementation of this callback might copy all buffers into a contiguous body for use in an Arrow IPC stream; another might store offsets and lengths relative to a known arena.

void *encode_buffer_state#

Pointer to arbitrary data used by encode_buffer().

int64_t body_length#

Finalized body length of the most recently encoded RecordBatch message.

This is initially 0 and encode_buffer() is expected to update it. After all buffers are encoded, this value is written to the RecordBatch's bodyLength field.

void *private_data#

Private resources managed by this library.
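
The "copy into a contiguous body" strategy for encode_buffer can be sketched as follows. The types are hypothetical simplified stand-ins for ArrowBufferView and ArrowIpcEncoder (the struct ArrowError argument is omitted), and the choices to pad each buffer to 8 bytes and to report the unpadded length are assumptions of this sketch rather than documented library behaviour.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct ArrowBufferView and struct ArrowIpcEncoder. */
struct sketch_buffer_view {
  const uint8_t* data;
  int64_t size_bytes;
};

struct sketch_encoder {
  void* encode_buffer_state;
  int64_t body_length;
};

/* Growable contiguous body; used as encode_buffer_state. */
struct sketch_body {
  uint8_t* data;
  int64_t size_bytes;
  int64_t capacity_bytes;
};

/* Appends view to the body, zero-pads to a multiple of 8 bytes, reports
   where the bytes landed, and keeps encoder->body_length up to date. */
static int sketch_encode_buffer_copy(struct sketch_buffer_view view,
                                     struct sketch_encoder* encoder,
                                     int64_t* offset, int64_t* length) {
  struct sketch_body* body = (struct sketch_body*)encoder->encode_buffer_state;
  int64_t padded = (view.size_bytes + 7) & ~(int64_t)7;
  int64_t needed = body->size_bytes + padded;

  if (needed > body->capacity_bytes) {
    int64_t new_capacity = needed * 2;
    uint8_t* grown = realloc(body->data, (size_t)new_capacity);
    if (grown == NULL) return ENOMEM;
    body->data = grown;
    body->capacity_bytes = new_capacity;
  }

  *offset = body->size_bytes;
  *length = view.size_bytes;

  if (view.size_bytes > 0) {
    memcpy(body->data + body->size_bytes, view.data, (size_t)view.size_bytes);
  }
  if (padded > view.size_bytes) {
    memset(body->data + body->size_bytes + view.size_bytes, 0,
           (size_t)(padded - view.size_bytes));
  }

  body->size_bytes = needed;
  encoder->body_length = body->size_bytes;
  return 0;
}
```

An arena-based variant would skip the copy and instead compute offset as the distance between view.data and the start of the arena.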

C++ Helpers#

group nanoarrow_ipc_hpp-unique

Extends the unique object wrappers in nanoarrow.hpp to include C structs defined in the nanoarrow_ipc.h header.

Typedefs

using UniqueDecoder = internal::Unique<struct ArrowIpcDecoder>#

Class wrapping a unique struct ArrowIpcDecoder.

using UniqueEncoder = internal::Unique<struct ArrowIpcEncoder>#

Class wrapping a unique struct ArrowIpcEncoder.

using UniqueInputStream = internal::Unique<struct ArrowIpcInputStream>#

Class wrapping a unique struct ArrowIpcInputStream.