Apache Arrow (C++)
A columnar in-memory analytics layer designed to accelerate big data.
Namespaces | Classes | Typedefs | Enumerations | Functions | Variables
arrow::ipc Namespace Reference

Namespaces

 feather
 

Classes

class  DictionaryMemo
 Memoization data structure for handling shared dictionaries. More...
 
class  Message
 An IPC message including metadata and body. More...
 
class  MessageReader
 Abstract interface for a sequence of messages. More...
 
class  RecordBatchFileReader
 Reads the record batch file format. More...
 
class  RecordBatchFileWriter
 Creates the Arrow record batch file format. More...
 
class  RecordBatchStreamReader
 Synchronous batch stream reader that reads from io::InputStream. More...
 
class  RecordBatchStreamWriter
 Synchronous batch stream writer that writes the Arrow streaming format. More...
 
class  RecordBatchWriter
 Abstract interface for writing a stream of record batches. More...
 

Typedefs

using DictionaryMap = std::unordered_map< int64_t, std::shared_ptr< Array > >
 
using DictionaryTypeMap = std::unordered_map< int64_t, std::shared_ptr< Field > >
 
using RecordBatchReader = ::arrow::RecordBatchReader
 

Enumerations

enum  MetadataVersion : char { MetadataVersion::V1, MetadataVersion::V2, MetadataVersion::V3, MetadataVersion::V4 }
 

Functions

std::string FormatMessageType (Message::Type type)
 
Status ReadMessage (const int64_t offset, const int32_t metadata_length, io::RandomAccessFile *file, std::unique_ptr< Message > *message)
 Read encapsulated RPC message from position in file. More...
 
Status ReadMessage (io::InputStream *stream, bool aligned, std::unique_ptr< Message > *message)
 Read encapsulated RPC message (metadata and body) from InputStream. More...
 
Status ReadMessage (io::InputStream *stream, std::unique_ptr< Message > *message)
 Read encapsulated RPC message (metadata and body) from InputStream. More...
 
Status ReadSchema (io::InputStream *stream, std::shared_ptr< Schema > *out)
 Read Schema from stream serialized as a sequence of one or more IPC messages. More...
 
Status ReadRecordBatch (const std::shared_ptr< Schema > &schema, io::InputStream *stream, std::shared_ptr< RecordBatch > *out)
 Read record batch as encapsulated IPC message with metadata size prefix and header. More...
 
Status ReadRecordBatch (const Buffer &metadata, const std::shared_ptr< Schema > &schema, io::RandomAccessFile *file, std::shared_ptr< RecordBatch > *out)
 Read record batch from file given metadata and schema. More...
 
Status ReadRecordBatch (const Message &message, const std::shared_ptr< Schema > &schema, std::shared_ptr< RecordBatch > *out)
 Read record batch from encapsulated Message. More...
 
Status ReadRecordBatch (const Buffer &metadata, const std::shared_ptr< Schema > &schema, int max_recursion_depth, io::RandomAccessFile *file, std::shared_ptr< RecordBatch > *out)
 Read record batch from file given metadata and schema. More...
 
Status ReadTensor (int64_t offset, io::RandomAccessFile *file, std::shared_ptr< Tensor > *out)
 EXPERIMENTAL: Read arrow::Tensor as encapsulated IPC message in file. More...
 
Status ReadTensor (const Message &message, std::shared_ptr< Tensor > *out)
 EXPERIMENTAL: Read arrow::Tensor from IPC message. More...
 
Status WriteRecordBatch (const RecordBatch &batch, int64_t buffer_start_offset, io::OutputStream *dst, int32_t *metadata_length, int64_t *body_length, MemoryPool *pool, int max_recursion_depth=kMaxNestingDepth, bool allow_64bit=false)
 Low-level API for writing a record batch (without schema) to an OutputStream. More...
 
Status SerializeRecordBatch (const RecordBatch &batch, MemoryPool *pool, std::shared_ptr< Buffer > *out)
 Serialize record batch as encapsulated IPC message in a new buffer. More...
 
Status SerializeRecordBatch (const RecordBatch &batch, MemoryPool *pool, io::OutputStream *out)
 Write record batch to OutputStream. More...
 
Status SerializeSchema (const Schema &schema, MemoryPool *pool, std::shared_ptr< Buffer > *out)
 Serialize schema using stream writer as a sequence of one or more IPC messages. More...
 
Status WriteRecordBatchStream (const std::vector< std::shared_ptr< RecordBatch >> &batches, io::OutputStream *dst)
 Write multiple record batches to OutputStream, including schema. More...
 
Status GetRecordBatchSize (const RecordBatch &batch, int64_t *size)
 Compute the number of bytes needed to write a record batch including metadata. More...
 
Status GetTensorSize (const Tensor &tensor, int64_t *size)
 Compute the number of bytes needed to write a tensor including metadata. More...
 
Status GetTensorMessage (const Tensor &tensor, MemoryPool *pool, std::unique_ptr< Message > *out)
 EXPERIMENTAL: Convert arrow::Tensor to a Message with minimal memory allocation. More...
 
Status WriteTensor (const Tensor &tensor, io::OutputStream *dst, int32_t *metadata_length, int64_t *body_length)
 EXPERIMENTAL: Write arrow::Tensor as a contiguous message. More...
 

Variables

constexpr int kMaxNestingDepth = 64
 

Typedef Documentation

◆ DictionaryMap

using arrow::ipc::DictionaryMap = typedef std::unordered_map<int64_t, std::shared_ptr<Array> >

◆ DictionaryTypeMap

using arrow::ipc::DictionaryTypeMap = typedef std::unordered_map<int64_t, std::shared_ptr<Field> >

◆ RecordBatchReader

Enumeration Type Documentation

◆ MetadataVersion

enum arrow::ipc::MetadataVersion : char
strong
Enumerator
V1 

0.1.0

V2 

0.2.0

V3 

0.3.0 to 0.7.1

V4 

>= 0.8.0

Function Documentation

◆ FormatMessageType()

std::string arrow::ipc::FormatMessageType ( Message::Type  type)

◆ GetRecordBatchSize()

Status arrow::ipc::GetRecordBatchSize ( const RecordBatch batch,
int64_t *  size 
)

Compute the number of bytes needed to write a record batch including metadata.

Parameters
[in]batchthe record batch to write
[out]sizethe size of the complete encapsulated message
Returns
Status

◆ GetTensorMessage()

Status arrow::ipc::GetTensorMessage ( const Tensor tensor,
MemoryPool pool,
std::unique_ptr< Message > *  out 
)

EXPERIMENTAL: Convert arrow::Tensor to a Message with minimal memory allocation.

Parameters
[in]tensorthe Tensor to write
[in]poolMemoryPool to allocate space for metadata
[out]outthe resulting Message
Returns
Status

◆ GetTensorSize()

Status arrow::ipc::GetTensorSize ( const Tensor tensor,
int64_t *  size 
)

Compute the number of bytes needed to write a tensor including metadata.

Parameters
[in]tensorthe tenseor to write
[out]sizethe size of the complete encapsulated message
Returns
Status

◆ ReadMessage() [1/3]

Status arrow::ipc::ReadMessage ( const int64_t  offset,
const int32_t  metadata_length,
io::RandomAccessFile file,
std::unique_ptr< Message > *  message 
)

Read encapsulated RPC message from position in file.

Read a length-prefixed message flatbuffer starting at the indicated file offset. If the message has a body with non-zero length, it will also be read

The metadata_length includes at least the length prefix and the flatbuffer

Parameters
[in]offsetthe position in the file where the message starts. The first 4 bytes after the offset are the message length
[in]metadata_lengththe total number of bytes to read from file
[in]filethe seekable file interface to read from
[out]messagethe message read
Returns
Status success or failure

◆ ReadMessage() [2/3]

Status arrow::ipc::ReadMessage ( io::InputStream stream,
bool  aligned,
std::unique_ptr< Message > *  message 
)

Read encapsulated RPC message (metadata and body) from InputStream.

Read length-prefixed message with as-yet unknown length. Returns null if there are not enough bytes available or the message length is 0 (e.g. EOS in a stream)

◆ ReadMessage() [3/3]

Status arrow::ipc::ReadMessage ( io::InputStream stream,
std::unique_ptr< Message > *  message 
)

Read encapsulated RPC message (metadata and body) from InputStream.

This is a version of ReadMessage that does not have the aligned argument for backwards compatibility.

◆ ReadRecordBatch() [1/4]

Status arrow::ipc::ReadRecordBatch ( const std::shared_ptr< Schema > &  schema,
io::InputStream stream,
std::shared_ptr< RecordBatch > *  out 
)

Read record batch as encapsulated IPC message with metadata size prefix and header.

Parameters
[in]schemathe record batch schema
[in]streamthe file where the batch is located
[out]outthe read record batch
Returns
Status

◆ ReadRecordBatch() [2/4]

Status arrow::ipc::ReadRecordBatch ( const Buffer metadata,
const std::shared_ptr< Schema > &  schema,
io::RandomAccessFile file,
std::shared_ptr< RecordBatch > *  out 
)

Read record batch from file given metadata and schema.

Parameters
[in]metadataa Message containing the record batch metadata
[in]schemathe record batch schema
[in]filea random access file
[out]outthe read record batch
Returns
Status

◆ ReadRecordBatch() [3/4]

Status arrow::ipc::ReadRecordBatch ( const Message message,
const std::shared_ptr< Schema > &  schema,
std::shared_ptr< RecordBatch > *  out 
)

Read record batch from encapsulated Message.

Parameters
[in]messagea message instance containing metadata and body
[in]schemathe record batch schema
[out]outthe resulting RecordBatch
Returns
Status

◆ ReadRecordBatch() [4/4]

Status arrow::ipc::ReadRecordBatch ( const Buffer metadata,
const std::shared_ptr< Schema > &  schema,
int  max_recursion_depth,
io::RandomAccessFile file,
std::shared_ptr< RecordBatch > *  out 
)

Read record batch from file given metadata and schema.

Parameters
[in]metadataa Message containing the record batch metadata
[in]schemathe record batch schema
[in]filea random access file
[in]max_recursion_depththe maximum permitted nesting depth
[out]outthe read record batch
Returns
Status

◆ ReadSchema()

Status arrow::ipc::ReadSchema ( io::InputStream stream,
std::shared_ptr< Schema > *  out 
)

Read Schema from stream serialized as a sequence of one or more IPC messages.

Parameters
[in]streaman InputStream
[out]outthe output Schema
Returns
Status

If record batches follow the schema, it is better to use RecordBatchStreamReader

◆ ReadTensor() [1/2]

Status arrow::ipc::ReadTensor ( int64_t  offset,
io::RandomAccessFile file,
std::shared_ptr< Tensor > *  out 
)

EXPERIMENTAL: Read arrow::Tensor as encapsulated IPC message in file.

Parameters
[in]offsetthe file location of the start of the message
[in]filethe file where the batch is located
[out]outthe read tensor
Returns
Status

◆ ReadTensor() [2/2]

Status arrow::ipc::ReadTensor ( const Message message,
std::shared_ptr< Tensor > *  out 
)

EXPERIMENTAL: Read arrow::Tensor from IPC message.

Parameters
[in]messagea Message containing the tensor metadata and body
[out]outthe read tensor
Returns
Status

◆ SerializeRecordBatch() [1/2]

Status arrow::ipc::SerializeRecordBatch ( const RecordBatch batch,
MemoryPool pool,
std::shared_ptr< Buffer > *  out 
)

Serialize record batch as encapsulated IPC message in a new buffer.

Parameters
[in]batchthe record batch
[in]poola MemoryPool to allocate memory from
[out]outthe serialized message
Returns
Status

◆ SerializeRecordBatch() [2/2]

Status arrow::ipc::SerializeRecordBatch ( const RecordBatch batch,
MemoryPool pool,
io::OutputStream out 
)

Write record batch to OutputStream.

Parameters
[in]batchthe record batch to write
[in]poola MemoryPool to use for temporary allocations, if needed
[in]outthe OutputStream to write the output to
Returns
Status

If writing to pre-allocated memory, you can use arrow::ipc::GetRecordBatchSize to compute how much space is required

◆ SerializeSchema()

Status arrow::ipc::SerializeSchema ( const Schema schema,
MemoryPool pool,
std::shared_ptr< Buffer > *  out 
)

Serialize schema using stream writer as a sequence of one or more IPC messages.

Parameters
[in]schemathe schema to write
[in]poola MemoryPool to allocate memory from
[out]outthe serialized schema
Returns
Status

◆ WriteRecordBatch()

Status arrow::ipc::WriteRecordBatch ( const RecordBatch batch,
int64_t  buffer_start_offset,
io::OutputStream dst,
int32_t *  metadata_length,
int64_t *  body_length,
MemoryPool pool,
int  max_recursion_depth = kMaxNestingDepth,
bool  allow_64bit = false 
)

Low-level API for writing a record batch (without schema) to an OutputStream.

Parameters
[in]batchthe record batch to write
[in]buffer_start_offsetthe start offset to use in the buffer metadata, generally should be 0
[in]dstan OutputStream
[out]metadata_lengththe size of the length-prefixed flatbuffer including padding to a 64-byte boundary
[out]body_lengththe size of the contiguous buffer block plus
[in]max_recursion_depththe maximum permitted nesting schema depth
[in]allow_64bitpermit field lengths exceeding INT32_MAX. May not be readable by other Arrow implementations padding bytes
Returns
Status

Write the RecordBatch (collection of equal-length Arrow arrays) to the output stream in a contiguous block. The record batch metadata is written as a flatbuffer (see format/Message.fbs – the RecordBatch message type) prefixed by its size, followed by each of the memory buffers in the batch written end to end (with appropriate alignment and padding):

<int32: metadata size> <uint8*: metadata> <buffers>

Finally, the absolute offsets (relative to the start of the output stream) to the end of the body and end of the metadata / data header (suffixed by the header size) is returned in out-variables

◆ WriteRecordBatchStream()

Status arrow::ipc::WriteRecordBatchStream ( const std::vector< std::shared_ptr< RecordBatch >> &  batches,
io::OutputStream dst 
)

Write multiple record batches to OutputStream, including schema.

Parameters
[in]batchesa vector of batches. Must all have same schema
[out]dstan OutputStream
Returns
Status

◆ WriteTensor()

Status arrow::ipc::WriteTensor ( const Tensor tensor,
io::OutputStream dst,
int32_t *  metadata_length,
int64_t *  body_length 
)

EXPERIMENTAL: Write arrow::Tensor as a contiguous message.

Parameters
[in]tensorthe Tensor to write
[in]dstthe OutputStream to write to
[out]metadata_lengththe actual metadata length
[out]body_lengththe acutal message body length
Returns
Status

<metadata size>=""><metadata><tensor data>="">

Variable Documentation

◆ kMaxNestingDepth

constexpr int arrow::ipc::kMaxNestingDepth = 64