IPC Extension Reference
C API
- group nanoarrow_ipc
Except where noted, objects are not thread-safe and clients should take care to serialize accesses to methods.
Because this library is intended to be vendored, it provides full type definitions and encourages clients to stack or statically allocate where convenient.
Defines
-
NANOARROW_IPC_FEATURE_DICTIONARY_REPLACEMENT#
Feature flag for a stream that uses dictionary replacement.
-
NANOARROW_IPC_FEATURE_COMPRESSED_BODY#
Feature flag for a stream that uses compression.
Typedefs
-
typedef ArrowErrorCode (*ArrowIpcDecompressFunction)(struct ArrowBufferView src, uint8_t *dst, int64_t dst_size, struct ArrowError *error)#
A self-contained decompression function.
For the most common compression type, ZSTD, this function is sufficient to capture the type of decompression that Arrow IPC requires (i.e., decompression where the uncompressed size was recorded). For other compression types, it may be more efficient to implement a full ArrowIpcDecompressor, which allows for persistent state/allocations between decodes.
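As a rough illustration of what "the uncompressed size was recorded" means, the sketch below reads the length prefix of a compressed body buffer. It assumes the Arrow IPC layout in which each compressed buffer starts with an 8-byte little-endian signed integer giving the uncompressed length, with -1 indicating that the remaining bytes are stored uncompressed. The name sketch_read_uncompressed_size is hypothetical and is not part of nanoarrow.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of nanoarrow): read the 8-byte little-endian
 * uncompressed-length prefix assumed to precede each compressed buffer.
 * Returns 0 on success or -1 if fewer than 8 bytes are available. */
int sketch_read_uncompressed_size(const uint8_t* buf, int64_t buf_size_bytes,
                                  int64_t* uncompressed_size) {
  if (buf_size_bytes < 8) {
    return -1;
  }

  /* Assemble the value byte by byte so the result does not depend on the
   * endianness of the host. */
  uint64_t raw = 0;
  for (int i = 7; i >= 0; i--) {
    raw = (raw << 8) | buf[i];
  }

  /* Reinterpret the bits as a signed value (-1 means "not compressed"). */
  memcpy(uncompressed_size, &raw, sizeof(raw));
  return 0;
}
```

A decompression function with the ArrowIpcDecompressFunction shape can then be handed a dst of exactly this size, which is why no streaming state is needed for this case.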
Enums
-
enum ArrowIpcMetadataVersion#
Metadata version enumerator.
Values:
-
enumerator NANOARROW_IPC_METADATA_VERSION_V1#
-
enumerator NANOARROW_IPC_METADATA_VERSION_V2#
-
enumerator NANOARROW_IPC_METADATA_VERSION_V3#
-
enumerator NANOARROW_IPC_METADATA_VERSION_V4#
-
enumerator NANOARROW_IPC_METADATA_VERSION_V5#
-
enum ArrowIpcMessageType#
Message type enumerator.
Values:
-
enumerator NANOARROW_IPC_MESSAGE_TYPE_UNINITIALIZED#
-
enumerator NANOARROW_IPC_MESSAGE_TYPE_SCHEMA#
-
enumerator NANOARROW_IPC_MESSAGE_TYPE_DICTIONARY_BATCH#
-
enumerator NANOARROW_IPC_MESSAGE_TYPE_RECORD_BATCH#
-
enumerator NANOARROW_IPC_MESSAGE_TYPE_TENSOR#
-
enumerator NANOARROW_IPC_MESSAGE_TYPE_SPARSE_TENSOR#
Functions
-
ArrowErrorCode ArrowIpcCheckRuntime(struct ArrowError *error)#
Checks the nanoarrow runtime to make sure the run/build versions match.
-
static inline enum ArrowIpcEndianness ArrowIpcSystemEndianness(void)#
Get the endianness of the current runtime.
Initialize the contents of an ArrowIpcSharedBuffer struct.
If NANOARROW_OK is returned, the ArrowIpcSharedBuffer takes ownership of src.
Release the caller’s copy of the shared buffer.
When finished, the caller must relinquish its own copy of the shared data using this function. The original buffer will continue to exist until all ArrowArray objects that refer to it have also been released.
Check for shared buffer thread safety.
Thread-safe shared buffers require C11 and the stdatomic.h header. If either is unavailable, shared buffers are still possible but the resulting arrays must not be passed to other threads to be released.
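The dependency on C11 and stdatomic.h can be pictured with a small reference counter that falls back to a plain integer when atomics are unavailable. This is only a sketch of the general technique: the SKETCH_HAVE_ATOMICS macro, struct SketchRefCount, and the sketch_refcount_* functions are hypothetical names, and the check that nanoarrow itself performs may differ.

```c
/* Hypothetical compile-time check: C11 or later, and the implementation does
 * not opt out of atomics via __STDC_NO_ATOMICS__. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
#define SKETCH_HAVE_ATOMICS 1
#else
#define SKETCH_HAVE_ATOMICS 0
#endif

#include <stdint.h>

/* A reference count that is atomic only when the toolchain supports it. */
struct SketchRefCount {
#if SKETCH_HAVE_ATOMICS
  atomic_int_fast64_t count;
#else
  int64_t count;
#endif
};

/* Returns 1 if the counter may be shared between threads, 0 otherwise. */
int sketch_refcount_is_thread_safe(void) { return SKETCH_HAVE_ATOMICS; }

/* Start with a single owner. */
void sketch_refcount_init(struct SketchRefCount* rc) {
#if SKETCH_HAVE_ATOMICS
  atomic_init(&rc->count, 1);
#else
  rc->count = 1;
#endif
}

/* Register one more owner. */
void sketch_refcount_increment(struct SketchRefCount* rc) {
#if SKETCH_HAVE_ATOMICS
  atomic_fetch_add(&rc->count, 1);
#else
  rc->count++;
#endif
}

/* Drop one owner and return the number of owners that remain. */
int64_t sketch_refcount_decrement(struct SketchRefCount* rc) {
#if SKETCH_HAVE_ATOMICS
  return (int64_t)atomic_fetch_sub(&rc->count, 1) - 1;
#else
  return --rc->count;
#endif
}
```

When the non-atomic branch is compiled, concurrent decrements from different threads could race, which is the reason arrays backed by such a buffer must be released on the thread that created them.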
-
ArrowIpcDecompressFunction ArrowIpcGetZstdDecompressionFunction(void)#
Get the decompression function for ZSTD.
The result will be NULL if nanoarrow was not built with NANOARROW_IPC_WITH_ZSTD.
-
ArrowErrorCode ArrowIpcSerialDecompressor(struct ArrowIpcDecompressor *decompressor)#
An ArrowIpcDecompressor implementation that performs decompression in serial.
-
ArrowErrorCode ArrowIpcSerialDecompressorSetFunction(struct ArrowIpcDecompressor *decompressor, enum ArrowIpcCompressionType compression_type, ArrowIpcDecompressFunction decompress_function)#
Override the ArrowIpcDecompressFunction used for a specific compression type.
This may be used to inject support for a particular type of decompression if used with a version of nanoarrow with unknown or minimal capabilities.
-
ArrowErrorCode ArrowIpcDecoderInit(struct ArrowIpcDecoder *decoder)#
Initialize a decoder.
-
void ArrowIpcDecoderReset(struct ArrowIpcDecoder *decoder)#
Release all resources attached to a decoder.
-
ArrowErrorCode ArrowIpcDecoderSetDecompressor(struct ArrowIpcDecoder *decoder, struct ArrowIpcDecompressor *decompressor)#
Set the decompressor implementation used by this decoder.
-
ArrowErrorCode ArrowIpcDecoderPeekHeader(struct ArrowIpcDecoder *decoder, struct ArrowBufferView data, int32_t *prefix_size_bytes, struct ArrowError *error)#
Peek at a message header.
The first 8 bytes of an Arrow IPC message are 0xFFFFFFFF followed by the size of the header as a little-endian 32-bit integer. ArrowIpcDecoderPeekHeader() reads these bytes and returns ESPIPE if there are not enough remaining bytes in data to read the entire header message, EINVAL if the first 8 bytes are not valid, ENODATA if the Arrow end-of-stream indicator has been reached, or NANOARROW_OK otherwise.
Pre-1.0 messages were not prefixed with 0xFFFFFFFF. For these messages, a value of 4 will be placed into prefix_size_bytes; otherwise a value of 8 will be placed into prefix_size_bytes.
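The prefix handling described above can be sketched in plain C. This is not the library's implementation: sketch_peek_prefix and sketch_read_le_int32 are hypothetical names, the return codes simply mirror the ones listed in the description, and treating a header size of zero as the end-of-stream indicator is an assumption based on the Arrow IPC stream format.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#ifndef ENODATA
#define ENODATA 120 /* placeholder value for platforms that lack ENODATA */
#endif

/* Read a little-endian 32-bit signed integer regardless of host endianness. */
static int32_t sketch_read_le_int32(const uint8_t* p) {
  uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
               ((uint32_t)p[3] << 24);
  int32_t v;
  memcpy(&v, &u, sizeof(v));
  return v;
}

/* Hypothetical sketch of the prefix check (not part of nanoarrow).
 * Returns 0 on success, ESPIPE if more bytes are needed, EINVAL if the
 * prefix is invalid, or ENODATA at the end-of-stream indicator. */
int sketch_peek_prefix(const uint8_t* data, int64_t size_bytes,
                       int32_t* prefix_size_bytes, int32_t* header_size_bytes) {
  if (size_bytes < 4) {
    return ESPIPE;
  }

  int32_t first = sketch_read_le_int32(data);
  int32_t header_size;
  if (first == -1) {
    /* 0xFFFFFFFF continuation marker: the size follows in the next 4 bytes. */
    if (size_bytes < 8) {
      return ESPIPE;
    }
    *prefix_size_bytes = 8;
    header_size = sketch_read_le_int32(data + 4);
  } else {
    /* Pre-1.0 message: the first 4 bytes are already the header size. */
    *prefix_size_bytes = 4;
    header_size = first;
  }

  if (header_size < 0) {
    return EINVAL;
  }
  if (header_size == 0) {
    return ENODATA; /* assumed end-of-stream indicator */
  }
  if (size_bytes - *prefix_size_bytes < header_size) {
    return ESPIPE; /* the header message is not fully available yet */
  }

  *header_size_bytes = header_size;
  return 0;
}
```

A reader would typically call such a check with whatever bytes it has buffered, read more input on ESPIPE, and stop iterating on ENODATA.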
-
ArrowErrorCode ArrowIpcDecoderVerifyHeader(struct ArrowIpcDecoder *decoder, struct ArrowBufferView data, struct ArrowError *error)#
Verify a message header.
Runs ArrowIpcDecoderPeekHeader() to ensure data is sufficiently large but additionally runs flatbuffer verification to ensure that decoding the data will not access memory outside of the buffer specified by data. ArrowIpcDecoderVerifyHeader() will also set decoder.header_size_bytes, decoder.body_size_bytes, decoder.metadata_version, and decoder.message_type.
Returns the same codes as ArrowIpcDecoderPeekHeader() and additionally returns EINVAL if flatbuffer verification fails.
-
ArrowErrorCode ArrowIpcDecoderDecodeHeader(struct ArrowIpcDecoder *decoder, struct ArrowBufferView data, struct ArrowError *error)#
Decode a message header.
Runs ArrowIpcDecoderPeekHeader() to ensure data is sufficiently large and decodes the content of the message header. If data contains a schema message, decoder.endianness and decoder.feature_flags are set and ArrowIpcDecoderDecodeSchema() can be used to obtain the decoded schema. If data contains a record batch message, decoder.codec is set and a successful call can be followed by a call to ArrowIpcDecoderDecodeArray().
In almost all cases this should be preceded by a call to ArrowIpcDecoderVerifyHeader() to ensure decoding does not access data outside of the specified buffer.
Returns EINVAL if the content of the message cannot be decoded or ENOTSUP if the content of the message uses features not supported by this library.
-
ArrowErrorCode ArrowIpcDecoderDecodeSchema(struct ArrowIpcDecoder *decoder, struct ArrowSchema *out, struct ArrowError *error)#
Decode an ArrowSchema.
After a successful call to ArrowIpcDecoderDecodeHeader(), retrieve an ArrowSchema. The caller is responsible for releasing the schema if NANOARROW_OK is returned.
Returns EINVAL if the decoder did not just decode a schema message or NANOARROW_OK otherwise.
-
ArrowErrorCode ArrowIpcDecoderSetSchema(struct ArrowIpcDecoder *decoder, struct ArrowSchema *schema, struct ArrowError *error)#
Set the ArrowSchema used to decode future record batch messages.
Prepares the decoder for future record batch messages of this type. The decoder takes ownership of schema if NANOARROW_OK is returned. Note that you must call this explicitly after decoding a Schema message (i.e., the decoder does not assume that the last-decoded schema message applies to future record batch messages).
Returns EINVAL if schema validation fails or NANOARROW_OK otherwise.
-
ArrowErrorCode ArrowIpcDecoderSetEndianness(struct ArrowIpcDecoder *decoder, enum ArrowIpcEndianness endianness)#
Set the endianness used to decode future record batch messages.
Prepares the decoder for future record batch messages with the specified endianness. Note that you must call this explicitly after decoding a Schema message (i.e., the decoder does not assume that the last-decoded schema message applies to future record batch messages).
Returns NANOARROW_OK on success.
-
ArrowErrorCode ArrowIpcDecoderDecodeArrayView(struct ArrowIpcDecoder *decoder, struct ArrowBufferView body, int64_t i, struct ArrowArrayView **out, struct ArrowError *error)#
Decode an ArrowArrayView.
After a successful call to ArrowIpcDecoderDecodeHeader(), deserialize the content of body into an internally-managed ArrowArrayView and return it. Note that field index does not equate to column index if any columns contain nested types. Use a value of -1 to decode the entire array into a struct. The pointed-to ArrowArrayView is owned by the ArrowIpcDecoder and must not be released.
For streams that match system endianness and do not use compression, this operation will not perform any heap allocations; however, the buffers referred to by the returned ArrowArrayView are only valid as long as the buffer referred to by body stays valid.
-
ArrowErrorCode ArrowIpcDecoderDecodeArray(struct ArrowIpcDecoder *decoder, struct ArrowBufferView body, int64_t i, struct ArrowArray *out, enum ArrowValidationLevel validation_level, struct ArrowError *error)#
Decode an ArrowArray.
After a successful call to ArrowIpcDecoderDecodeHeader(), assemble an ArrowArray given a message body and a field index. Note that field index does not equate to column index if any columns contain nested types. Use a value of -1 to decode the entire array into a struct. The caller is responsible for releasing the array if NANOARROW_OK is returned.
Returns EINVAL if the decoder did not just decode a record batch message, ENOTSUP if the message uses features not supported by this library, or NANOARROW_OK otherwise.
Decode an ArrowArray from an owned buffer.
This implementation takes advantage of the fact that it can avoid copying individual buffers. In all cases the caller must ArrowIpcSharedBufferReset() body after one or more calls to ArrowIpcDecoderDecodeArrayFromShared(). If ArrowIpcSharedBufferIsThreadSafe() returns 0, out must not be released by another thread.
-
void ArrowIpcInputStreamMove(struct ArrowIpcInputStream *src, struct ArrowIpcInputStream *dst)#
Transfer ownership of an ArrowIpcInputStream.
-
ArrowErrorCode ArrowIpcInputStreamInitBuffer(struct ArrowIpcInputStream *stream, struct ArrowBuffer *input)#
Create an input stream from an ArrowBuffer.
The stream takes ownership of the buffer and reads bytes from it.
-
ArrowErrorCode ArrowIpcInputStreamInitFile(struct ArrowIpcInputStream *stream, void *file_ptr, int close_on_release)#
Create an input stream from a C FILE* pointer.
Note that the ArrowIpcInputStream has no mechanism to communicate an error if file_ptr fails to close. If this behaviour is needed, pass false to close_on_release and handle closing the file independently from stream.
-
ArrowErrorCode ArrowIpcArrayStreamReaderInit(struct ArrowArrayStream *out, struct ArrowIpcInputStream *input_stream, struct ArrowIpcArrayStreamReaderOptions *options)#
Initialize an ArrowArrayStream from an input stream of bytes.
The stream of bytes must begin with a Schema message and be followed by zero or more RecordBatch messages as described in the Arrow IPC stream format specification. Returns NANOARROW_OK on success. If NANOARROW_OK is returned, the ArrowArrayStream takes ownership of input_stream and the caller is responsible for releasing out.
-
ArrowErrorCode ArrowIpcEncoderInit(struct ArrowIpcEncoder *encoder)#
Initialize an encoder.
If NANOARROW_OK is returned, the caller must call ArrowIpcEncoderReset() to release resources allocated by this function.
-
void ArrowIpcEncoderReset(struct ArrowIpcEncoder *encoder)#
Release all resources attached to an encoder.
-
ArrowErrorCode ArrowIpcEncoderFinalizeBuffer(struct ArrowIpcEncoder *encoder, char encapsulate, struct ArrowBuffer *out)#
Finalize the most recently encoded message into a buffer.
If specified, the message will be encapsulated (prefixed with the continuation marker and the header size and 0-padded to a multiple of 8 bytes).
The bytes of the encoded message will be appended to the provided buffer.
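The encapsulation step can be sketched independently of the encoder. The function below is a hypothetical illustration (sketch_encapsulate is not a nanoarrow function) and assumes that the size written after the continuation marker is the padded message size, which is how the Arrow IPC format is usually described.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch (not part of nanoarrow): wrap an already-encoded
 * message with the 0xFFFFFFFF continuation marker and a little-endian
 * 32-bit size, then zero-pad to a multiple of 8 bytes.
 * Returns the total number of bytes written, or -1 if out is too small
 * or the sizes are out of range. */
int64_t sketch_encapsulate(const uint8_t* message, int32_t message_size,
                           uint8_t* out, int64_t out_capacity) {
  if (message_size < 0) {
    return -1;
  }

  /* Round the message size up to the next multiple of 8. */
  int64_t padded = ((int64_t)message_size + 7) / 8 * 8;
  int64_t total = 8 + padded;
  if (padded > INT32_MAX || out_capacity < total) {
    return -1;
  }

  /* Continuation marker. */
  memset(out, 0xFF, 4);

  /* Padded size as a little-endian 32-bit integer (assumed to include the
   * padding bytes). */
  uint32_t size_le = (uint32_t)padded;
  for (int i = 0; i < 4; i++) {
    out[4 + i] = (uint8_t)(size_le >> (8 * i));
  }

  /* Message bytes followed by zero padding. */
  if (message_size > 0) {
    memcpy(out + 8, message, (size_t)message_size);
  }
  memset(out + 8 + message_size, 0, (size_t)(padded - message_size));
  return total;
}
```

Because the prefix is itself 8 bytes, padding the message to a multiple of 8 keeps the whole encapsulated message 8-byte aligned within a stream.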
-
ArrowErrorCode ArrowIpcEncoderEncodeSchema(struct ArrowIpcEncoder *encoder, const struct ArrowSchema *schema, struct ArrowError *error)#
Encode an ArrowSchema.
Returns ENOMEM if allocation fails, NANOARROW_OK otherwise.
-
ArrowErrorCode ArrowIpcEncoderEncodeSimpleRecordBatch(struct ArrowIpcEncoder *encoder, const struct ArrowArrayView *array_view, struct ArrowBuffer *body_buffer, struct ArrowError *error)#
Encode a struct typed ArrayView to a flatbuffer RecordBatch, embedded in a Message.
Body buffers are concatenated into a contiguous, padded body_buffer.
Returns ENOMEM if allocation fails, NANOARROW_OK otherwise.
-
void ArrowIpcOutputStreamMove(struct ArrowIpcOutputStream *src, struct ArrowIpcOutputStream *dst)#
Transfer ownership of an ArrowIpcOutputStream.
-
ArrowErrorCode ArrowIpcOutputStreamInitBuffer(struct ArrowIpcOutputStream *stream, struct ArrowBuffer *output)#
Create an output stream from an ArrowBuffer.
All bytes written to the stream will be appended to the buffer. The stream does not take ownership of the buffer.
-
ArrowErrorCode ArrowIpcOutputStreamInitFile(struct ArrowIpcOutputStream *stream, void *file_ptr, int close_on_release)#
Create an output stream from a C FILE* pointer.
Note that the ArrowIpcOutputStream has no mechanism to communicate an error if file_ptr fails to close. If this behaviour is needed, pass false to close_on_release and handle closing the file independently from stream.
-
ArrowErrorCode ArrowIpcOutputStreamWrite(struct ArrowIpcOutputStream *stream, struct ArrowBufferView data, struct ArrowError *error)#
Write to a stream, retrying until all bytes are written or the stream errors.
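The retry behaviour can be sketched with a stand-in sink whose write callback may accept fewer bytes than requested. Everything here is hypothetical: struct SketchSink only mimics the shape of the write callback (without the ArrowError argument), and treating a successful zero-byte write as EIO is a choice made for this sketch to avoid looping forever, not documented library behaviour.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an output stream with a partial-write callback. */
struct SketchSink {
  int (*write)(struct SketchSink* sink, const void* buf, int64_t buf_size_bytes,
               int64_t* size_written_out);
  void* private_data;
};

/* Keep calling write() until every byte is consumed or an error occurs. */
int sketch_write_all(struct SketchSink* sink, const void* data,
                     int64_t size_bytes) {
  const uint8_t* cursor = (const uint8_t*)data;
  int64_t remaining = size_bytes;

  while (remaining > 0) {
    int64_t written = 0;
    int status = sink->write(sink, cursor, remaining, &written);
    if (status != 0) {
      return status;
    }
    /* Sketch-specific guard: no progress (or a bogus count) is an error. */
    if (written <= 0 || written > remaining) {
      return EIO;
    }
    cursor += written;
    remaining -= written;
  }

  return 0;
}

/* Demo sink: appends to a fixed array but accepts at most 3 bytes per call. */
struct SketchMemory {
  uint8_t bytes[64];
  int64_t size;
};

int sketch_chunked_write(struct SketchSink* sink, const void* buf,
                         int64_t buf_size_bytes, int64_t* size_written_out) {
  struct SketchMemory* mem = (struct SketchMemory*)sink->private_data;
  int64_t n = buf_size_bytes < 3 ? buf_size_bytes : 3;
  if (mem->size + n > (int64_t)sizeof(mem->bytes)) {
    return ENOMEM;
  }
  memcpy(mem->bytes + mem->size, buf, (size_t)n);
  mem->size += n;
  *size_written_out = n;
  return 0;
}
```

With the demo sink, an 11-byte payload is delivered in four calls (3, 3, 3, and 2 bytes), and the caller only sees a single success or failure.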
-
ArrowErrorCode ArrowIpcWriterInit(struct ArrowIpcWriter *writer, struct ArrowIpcOutputStream *output_stream)#
Initialize a writer that encodes Arrow data to an output stream of bytes.
Returns NANOARROW_OK on success. If NANOARROW_OK is returned, the writer takes ownership of the output byte stream, and the caller is responsible for releasing the writer by calling ArrowIpcWriterReset().
-
void ArrowIpcWriterReset(struct ArrowIpcWriter *writer)#
Release all resources attached to a writer.
-
ArrowErrorCode ArrowIpcWriterWriteSchema(struct ArrowIpcWriter *writer, const struct ArrowSchema *in, struct ArrowError *error)#
Write a schema to the output byte stream.
Errors are propagated from the underlying encoder and output byte stream.
-
ArrowErrorCode ArrowIpcWriterWriteArrayView(struct ArrowIpcWriter *writer, const struct ArrowArrayView *in, struct ArrowError *error)#
Write an array view to the output byte stream.
The array view may be NULL, in which case an EOS will be written. The writer does not check that a schema was already written.
Errors are propagated from the underlying encoder and output byte stream.
-
ArrowErrorCode ArrowIpcWriterWriteArrayStream(struct ArrowIpcWriter *writer, struct ArrowArrayStream *in, struct ArrowError *error)#
Write an entire stream (including EOS) to the output byte stream.
Errors are propagated from the underlying encoder, array stream, and output byte stream.
-
ArrowErrorCode ArrowIpcWriterStartFile(struct ArrowIpcWriter *writer, struct ArrowError *error)#
Start writing an IPC file.
Writes the Arrow IPC magic and sets the writer up to track written blocks.
-
ArrowErrorCode ArrowIpcWriterFinalizeFile(struct ArrowIpcWriter *writer, struct ArrowError *error)#
Finish writing an IPC file.
Writes the IPC file’s footer, footer size, and ending magic.
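To make the file framing concrete, the sketch below locates the footer in a complete IPC file held in memory. It assumes the layout given by the Arrow IPC file format: the magic bytes ARROW1 padded to 8 bytes at the start, and at the end the footer, a little-endian 32-bit footer size, and the 6-byte magic ARROW1. The name sketch_locate_footer is hypothetical, and the footer itself (a flatbuffer) is not parsed here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch (not part of nanoarrow): check the leading and trailing
 * magic of an in-memory IPC file and report where the footer lives.
 * Returns 0 on success or -1 if the framing looks invalid. */
int sketch_locate_footer(const uint8_t* file, int64_t file_size,
                         int64_t* footer_offset, int32_t* footer_size) {
  static const char kMagic[] = "ARROW1"; /* 6 significant bytes */

  /* Smallest possible framing: 8 (padded magic) + 4 (size) + 6 (magic). */
  if (file_size < 18) {
    return -1;
  }
  if (memcmp(file, kMagic, 6) != 0) {
    return -1;
  }
  if (memcmp(file + file_size - 6, kMagic, 6) != 0) {
    return -1;
  }

  /* The footer size sits immediately before the trailing magic. */
  const uint8_t* p = file + file_size - 10;
  uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
               ((uint32_t)p[3] << 24);
  int32_t size;
  memcpy(&size, &u, sizeof(size));

  /* The footer must fit between the leading magic and the size field. */
  if (size < 0 || size > file_size - 18) {
    return -1;
  }

  *footer_size = size;
  *footer_offset = file_size - 10 - size;
  return 0;
}
```

A file reader would typically start from the end of the file in this way, decode the footer, and then seek to the record batch blocks it lists.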
- #include <nanoarrow_ipc.h>
A structure representing a reference-counted buffer that may be passed to ArrowIpcDecoderDecodeArrayFromShared().
-
struct ArrowIpcDecompressor#
- #include <nanoarrow_ipc.h>
A user-extensible decompressor.
The ArrowIpcDecompressor is the underlying object that enables decompression in the ArrowIpcDecoder. Its structure allows it to be backed by a multithreaded implementation; however, this is not required and the default implementation does not implement this. An implementation of a decompressor may support more than one ArrowIpcCompressionType.
Public Members
-
ArrowErrorCode (*decompress_add)(struct ArrowIpcDecompressor *decompressor, enum ArrowIpcCompressionType compression_type, struct ArrowBufferView src, uint8_t *dst, int64_t dst_size, struct ArrowError *error)#
Queue a buffer for decompression.
The values pointed to by dst and dst_size after a call to decompress_add are undefined until the next call to decompress_wait returns NANOARROW_OK.
-
ArrowErrorCode (*decompress_wait)(struct ArrowIpcDecompressor *decompressor, int64_t timeout_ms, struct ArrowError *error)#
Wait for any unfinished calls to decompress_add to complete.
Returns NANOARROW_OK if all pending calls completed. Returns ETIMEDOUT if not all remaining calls completed.
-
void (*release)(struct ArrowIpcDecompressor *decompressor)#
Release the decompressor and any resources it may be holding.
Release callback implementations must set the release member to NULL. Callers must check that the release callback is not NULL before calling decompress_add(), decompress_wait(), or release().
-
void *private_data#
Implementation-specific opaque data.
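The release convention used by this struct (and by the stream structs in this group) can be sketched with a minimal stand-in. struct SketchResource and the sketch_resource_* functions are hypothetical and exist only to show the pattern: the release callback frees private_data and then sets release to NULL, and callers test release before using or releasing the object.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in with the same release/private_data shape. */
struct SketchResource {
  void (*release)(struct SketchResource* resource);
  void* private_data;
};

/* Release callback: free owned memory, then mark the object as released by
 * setting the release member to NULL. */
void sketch_resource_release(struct SketchResource* resource) {
  free(resource->private_data);
  resource->private_data = NULL;
  resource->release = NULL;
}

/* Initialize the resource. Returns 0 on success or -1 if allocation fails,
 * in which case release is left NULL so the object reads as released. */
int sketch_resource_init(struct SketchResource* resource) {
  resource->private_data = malloc(16);
  if (resource->private_data == NULL) {
    resource->release = NULL;
    return -1;
  }
  resource->release = &sketch_resource_release;
  return 0;
}

/* Caller-side guard: only invoke release on an object that is still valid. */
void sketch_resource_release_if_valid(struct SketchResource* resource) {
  if (resource->release != NULL) {
    resource->release(resource);
  }
}
```

Setting release to NULL is what makes a second release attempt harmless, since the guard sees that the object no longer owns anything.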
-
struct ArrowIpcDecoder#
- #include <nanoarrow_ipc.h>
Decoder for Arrow IPC messages.
This structure is intended to be allocated by the caller, initialized using ArrowIpcDecoderInit(), and released with ArrowIpcDecoderReset(). These fields should not be modified by the caller but can be read following a call to ArrowIpcDecoderPeekHeader(), ArrowIpcDecoderVerifyHeader(), or ArrowIpcDecoderDecodeHeader().
Public Members
-
enum ArrowIpcMessageType message_type#
The last verified or decoded message type.
-
enum ArrowIpcMetadataVersion metadata_version#
The metadata version as indicated by the current schema message.
-
enum ArrowIpcEndianness endianness#
Buffer endianness as indicated by the current schema message.
-
int32_t feature_flags#
Arrow IPC Features used as indicated by the current Schema message.
-
enum ArrowIpcCompressionType codec#
Compression used by the current RecordBatch message.
-
int32_t header_size_bytes#
The number of bytes in the current header message.
This value includes the 8 bytes before the start of the header message content and any padding bytes required to make the header message size be a multiple of 8 bytes.
-
int64_t body_size_bytes#
The number of bytes in the forthcoming body message.
The last decoded Footer.
Warning
This API is currently only public for use in integration testing; use at your own risk.
-
void *private_data#
Private resources managed by this library.
-
struct ArrowIpcInputStream#
- #include <nanoarrow_ipc.h>
A user-extensible input data source.
Public Members
-
ArrowErrorCode (*read)(struct ArrowIpcInputStream *stream, uint8_t *buf, int64_t buf_size_bytes, int64_t *size_read_out, struct ArrowError *error)#
Read up to buf_size_bytes from stream into buf.
The actual number of bytes read is placed in the value pointed to by size_read_out. Returns NANOARROW_OK on success.
-
void (*release)(struct ArrowIpcInputStream *stream)#
Release the stream and any resources it may be holding.
Release callback implementations must set the release member to NULL. Callers must check that the release callback is not NULL before calling read() or release().
-
void *private_data#
Private implementation-defined data.
-
struct ArrowIpcArrayStreamReaderOptions#
- #include <nanoarrow_ipc.h>
Options for ArrowIpcArrayStreamReaderInit().
Public Members
-
int64_t field_index#
The field index to extract.
Defaults to -1 (i.e., read all fields). Note that this field index refers to the flattened tree of children and not necessarily the column index.
Set to a non-zero value to share the message body buffer among decoded arrays.
Sharing buffers is a good choice when (1) using memory-mapped IO (since unreferenced portions of the file are often not loaded into memory) or (2) if all data from all columns are about to be referenced anyway. When loading a single field there is probably no advantage to using shared buffers. Defaults to the value of ArrowIpcSharedBufferIsThreadSafe().
-
struct ArrowIpcEncoder#
- #include <nanoarrow_ipc.h>
Encoder for Arrow IPC messages.
This structure is intended to be allocated by the caller, initialized using ArrowIpcEncoderInit(), and released with ArrowIpcEncoderReset().
Public Members
-
void *private_data#
Private resources managed by this library.
-
struct ArrowIpcOutputStream#
- #include <nanoarrow_ipc.h>
A user-extensible output data sink.
Public Members
-
ArrowErrorCode (*write)(struct ArrowIpcOutputStream *stream, const void *buf, int64_t buf_size_bytes, int64_t *size_written_out, struct ArrowError *error)#
Write up to buf_size_bytes from buf into stream.
The actual number of bytes written is placed in the value pointed to by size_written_out. Returns NANOARROW_OK on success.
-
void (*release)(struct ArrowIpcOutputStream *stream)#
Release the stream and any resources it may be holding.
Release callback implementations must set the release member to NULL. Callers must check that the release callback is not NULL before calling write() or release().
-
void *private_data#
Private implementation-defined data.
-
struct ArrowIpcWriter#
- #include <nanoarrow_ipc.h>
A stream writer which encodes Schemas and ArrowArrays into an IPC byte stream.
This structure is intended to be allocated by the caller, initialized using ArrowIpcWriterInit(), and released with ArrowIpcWriterReset().
Public Members
-
void *private_data#
Private resources managed by this library.
C++ Helpers
- group nanoarrow_ipc_hpp-unique
Extends the unique object wrappers in nanoarrow.hpp to include C structs defined in the nanoarrow_ipc.h header.
Typedefs
-
using UniqueDecoder = internal::Unique<struct ArrowIpcDecoder>#
Class wrapping a unique struct ArrowIpcDecoder.
Class wrapping a unique struct ArrowIpcFooter.
-
using UniqueEncoder = internal::Unique<struct ArrowIpcEncoder>#
Class wrapping a unique struct ArrowIpcEncoder.
-
using UniqueDecompressor = internal::Unique<struct ArrowIpcDecompressor>#
Class wrapping a unique struct ArrowIpcDecompressor.
-
using UniqueInputStream = internal::Unique<struct ArrowIpcInputStream>#
Class wrapping a unique struct ArrowIpcInputStream.
-
using UniqueOutputStream = internal::Unique<struct ArrowIpcOutputStream>#
Class wrapping a unique struct ArrowIpcOutputStream.
-
using UniqueWriter = internal::Unique<struct ArrowIpcWriter>#
Class wrapping a unique struct ArrowIpcWriter.