Apache Arrow (C++)
A columnar in-memory analytics layer designed to accelerate big data.
arrow::ipc Namespace Reference

Namespaces

 feather
 
 json
 

Classes

class  DictionaryMemo
 Memoization data structure for handling shared dictionaries.
 
class  InputStreamMessageReader
 Implementation of MessageReader that reads from InputStream.
 
class  JsonReader
 Read the JSON representation of an Arrow record batch file or stream.
 
class  JsonWriter
 Write the JSON representation of an Arrow record batch file or stream.
 
class  Message
 An IPC message including metadata and body.
 
class  MessageReader
 Abstract interface for a sequence of messages.
 
class  RecordBatchFileReader
 Reads the record batch file format.
 
class  RecordBatchFileWriter
 Creates the Arrow record batch file format.
 
class  RecordBatchStreamReader
 Synchronous batch stream reader that reads from io::InputStream.
 
class  RecordBatchStreamWriter
 Synchronous batch stream writer that writes the Arrow streaming format.
 
class  RecordBatchWriter
 Abstract interface for writing a stream of record batches.
 

Typedefs

using DictionaryMap = std::unordered_map< int64_t, std::shared_ptr< Array > >
 
using DictionaryTypeMap = std::unordered_map< int64_t, std::shared_ptr< Field > >
 
using RecordBatchReader = ::arrow::RecordBatchReader
 

Enumerations

enum  MetadataVersion : char { MetadataVersion::V1, MetadataVersion::V2, MetadataVersion::V3 }
 

Functions

std::string FormatMessageType (Message::Type type)
 
Status ReadMessage (const int64_t offset, const int32_t metadata_length, io::RandomAccessFile *file, std::unique_ptr< Message > *message)
 Read encapsulated IPC message from position in file.
 
Status ReadMessage (io::InputStream *stream, std::unique_ptr< Message > *message)
 Read encapsulated IPC message (metadata and body) from InputStream.
 
Status ReadSchema (io::InputStream *stream, std::shared_ptr< Schema > *out)
 Read Schema from stream serialized as a sequence of one or more IPC messages.
 
Status ReadRecordBatch (const std::shared_ptr< Schema > &schema, io::InputStream *stream, std::shared_ptr< RecordBatch > *out)
 Read record batch as encapsulated IPC message with metadata size prefix and header.
 
Status ReadRecordBatch (const Buffer &metadata, const std::shared_ptr< Schema > &schema, io::RandomAccessFile *file, std::shared_ptr< RecordBatch > *out)
 Read record batch from file given metadata and schema.
 
Status ReadRecordBatch (const Message &message, const std::shared_ptr< Schema > &schema, std::shared_ptr< RecordBatch > *out)
 Read record batch from encapsulated Message.
 
Status ReadRecordBatch (const Buffer &metadata, const std::shared_ptr< Schema > &schema, int max_recursion_depth, io::RandomAccessFile *file, std::shared_ptr< RecordBatch > *out)
 Read record batch from file given metadata and schema.
 
Status ReadTensor (int64_t offset, io::RandomAccessFile *file, std::shared_ptr< Tensor > *out)
 EXPERIMENTAL: Read arrow::Tensor as encapsulated IPC message in file.
 
Status WriteRecordBatch (const RecordBatch &batch, int64_t buffer_start_offset, io::OutputStream *dst, int32_t *metadata_length, int64_t *body_length, MemoryPool *pool, int max_recursion_depth=kMaxNestingDepth, bool allow_64bit=false)
 Low-level API for writing a record batch (without schema) to an OutputStream.
 
Status SerializeRecordBatch (const RecordBatch &batch, MemoryPool *pool, std::shared_ptr< Buffer > *out)
 Serialize record batch as encapsulated IPC message in a new buffer.
 
Status SerializeRecordBatch (const RecordBatch &batch, MemoryPool *pool, io::OutputStream *out)
 Write record batch to OutputStream.
 
Status SerializeSchema (const Schema &schema, MemoryPool *pool, std::shared_ptr< Buffer > *out)
 Serialize schema using stream writer as a sequence of one or more IPC messages.
 
Status WriteRecordBatchStream (const std::vector< std::shared_ptr< RecordBatch >> &batches, io::OutputStream *dst)
 Write multiple record batches to OutputStream, including schema.
 
Status GetRecordBatchSize (const RecordBatch &batch, int64_t *size)
 Compute the number of bytes needed to write a record batch including metadata.
 
Status GetTensorSize (const Tensor &tensor, int64_t *size)
 Compute the number of bytes needed to write a tensor including metadata.
 
Status WriteTensor (const Tensor &tensor, io::OutputStream *dst, int32_t *metadata_length, int64_t *body_length)
 EXPERIMENTAL: Write arrow::Tensor as a contiguous message.
 

Variables

constexpr int kMaxNestingDepth = 64
 

Typedef Documentation

◆ DictionaryMap

using arrow::ipc::DictionaryMap = std::unordered_map<int64_t, std::shared_ptr<Array>>

◆ DictionaryTypeMap

using arrow::ipc::DictionaryTypeMap = std::unordered_map<int64_t, std::shared_ptr<Field>>

◆ RecordBatchReader

using arrow::ipc::RecordBatchReader = ::arrow::RecordBatchReader

Enumeration Type Documentation

◆ MetadataVersion

enum class arrow::ipc::MetadataVersion : char

Enumerator
V1
V2
V3

Function Documentation

◆ FormatMessageType()

std::string arrow::ipc::FormatMessageType ( Message::Type  type)

◆ GetRecordBatchSize()

Status arrow::ipc::GetRecordBatchSize ( const RecordBatch & batch,
int64_t * size
)

Compute the number of bytes needed to write a record batch including metadata.

Parameters
[in] batch: the record batch to write
[out] size: the size of the complete encapsulated message
Returns
Status

◆ GetTensorSize()

Status arrow::ipc::GetTensorSize ( const Tensor & tensor,
int64_t * size
)

Compute the number of bytes needed to write a tensor including metadata.

Parameters
[in] tensor: the tensor to write
[out] size: the size of the complete encapsulated message
Returns
Status

◆ ReadMessage() [1/2]

Status arrow::ipc::ReadMessage ( const int64_t offset,
const int32_t metadata_length,
io::RandomAccessFile * file,
std::unique_ptr< Message > * message
)

Read encapsulated IPC message from position in file.

Read a length-prefixed message flatbuffer starting at the indicated file offset. If the message has a body with non-zero length, it will also be read.

The metadata_length includes at least the length prefix and the flatbuffer.

Parameters
[in] offset: the position in the file where the message starts. The first 4 bytes after the offset are the message length
[in] metadata_length: the total number of bytes to read from file
[in] file: the seekable file interface to read from
[out] message: the message read
Returns
Status success or failure

◆ ReadMessage() [2/2]

Status arrow::ipc::ReadMessage ( io::InputStream * stream,
std::unique_ptr< Message > * message
)

Read encapsulated IPC message (metadata and body) from InputStream.

Read a length-prefixed message whose length is not yet known. The output message is null if there are not enough bytes available or if the message length is 0 (e.g. EOS in a stream).
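
The null-on-EOS behaviour described above can be illustrated with a minimal, self-contained sketch that does not use Arrow at all. The names `SketchMessage` and `ReadLengthPrefixed` are hypothetical, the input is an in-memory byte vector rather than an io::InputStream, the 4-byte prefix is assumed to be in host byte order, and the message body is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-in for arrow::ipc::Message; holds only metadata bytes.
struct SketchMessage {
  std::vector<uint8_t> metadata;
};

// Reads one length-prefixed message from `data`, starting at *pos.
// Returns nullptr if the prefix or payload is incomplete, or if the prefix
// is 0 (treated here as end of stream). On success *pos moves past the message.
inline std::unique_ptr<SketchMessage> ReadLengthPrefixed(
    const std::vector<uint8_t>& data, std::size_t* pos) {
  assert(pos != nullptr);
  const std::size_t kPrefix = sizeof(int32_t);
  if (*pos > data.size() || data.size() - *pos < kPrefix) return nullptr;

  int32_t length = 0;
  std::memcpy(&length, data.data() + *pos, kPrefix);  // host byte order assumed
  if (length <= 0) return nullptr;

  const std::size_t start = *pos + kPrefix;
  const std::size_t n = static_cast<std::size_t>(length);
  if (data.size() - start < n) return nullptr;

  std::unique_ptr<SketchMessage> msg(new SketchMessage);
  msg->metadata.assign(data.data() + start, data.data() + start + n);
  *pos = start + n;
  return msg;
}
```

A caller would typically loop until the function returns null, then check whether any bytes remain to tell a clean end of stream from a truncated one.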

◆ ReadRecordBatch() [1/4]

Status arrow::ipc::ReadRecordBatch ( const std::shared_ptr< Schema > & schema,
io::InputStream * stream,
std::shared_ptr< RecordBatch > * out
)

Read record batch as encapsulated IPC message with metadata size prefix and header.

Parameters
[in] schema: the record batch schema
[in] stream: the file where the batch is located
[out] out: the read record batch
Returns
Status

◆ ReadRecordBatch() [2/4]

Status arrow::ipc::ReadRecordBatch ( const Buffer & metadata,
const std::shared_ptr< Schema > & schema,
io::RandomAccessFile * file,
std::shared_ptr< RecordBatch > * out
)

Read record batch from file given metadata and schema.

Parameters
[in] metadata: a Message containing the record batch metadata
[in] schema: the record batch schema
[in] file: a random access file
[out] out: the read record batch
Returns
Status

◆ ReadRecordBatch() [3/4]

Status arrow::ipc::ReadRecordBatch ( const Message & message,
const std::shared_ptr< Schema > & schema,
std::shared_ptr< RecordBatch > * out
)

Read record batch from encapsulated Message.

Parameters
[in] message: a message instance containing metadata and body
[in] schema: the record batch schema
[out] out: the resulting RecordBatch
Returns
Status

◆ ReadRecordBatch() [4/4]

Status arrow::ipc::ReadRecordBatch ( const Buffer & metadata,
const std::shared_ptr< Schema > & schema,
int max_recursion_depth,
io::RandomAccessFile * file,
std::shared_ptr< RecordBatch > * out
)

Read record batch from file given metadata and schema.

Parameters
[in] metadata: a Message containing the record batch metadata
[in] schema: the record batch schema
[in] file: a random access file
[in] max_recursion_depth: the maximum permitted nesting depth
[out] out: the read record batch
Returns
Status
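
To illustrate what a maximum permitted nesting depth guards against, here is a small sketch of a depth-limited walk over a nested type tree. `SketchType`, `WithinDepth`, and `kSketchMaxNestingDepth` are hypothetical names; the constant copies the value of kMaxNestingDepth from this page, and the exact way Arrow counts levels may differ from the convention assumed here (a type with no children has depth 1).

```cpp
#include <cassert>
#include <vector>

// Mirrors the value of arrow::ipc::kMaxNestingDepth for illustration only.
constexpr int kSketchMaxNestingDepth = 64;

// Hypothetical nested type: a node with zero or more child types.
struct SketchType {
  std::vector<SketchType> children;
};

// Returns true if the tree rooted at `type` has at most `max_depth` levels.
// The walk stops descending once the budget is used up, so a deeply nested
// (possibly malformed) input cannot drive unbounded recursion.
inline bool WithinDepth(const SketchType& type, int max_depth) {
  if (max_depth <= 0) return false;
  for (const SketchType& child : type.children) {
    if (!WithinDepth(child, max_depth - 1)) return false;
  }
  return true;
}
```

A reader built this way would reject the input with an error status when the check fails, instead of recursing further.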

◆ ReadSchema()

Status arrow::ipc::ReadSchema ( io::InputStream * stream,
std::shared_ptr< Schema > * out
)

Read Schema from stream serialized as a sequence of one or more IPC messages.

Parameters
[in] stream: an InputStream
[out] out: the output Schema
Returns
Status

If record batches follow the schema, it is better to use RecordBatchStreamReader.

◆ ReadTensor()

Status arrow::ipc::ReadTensor ( int64_t offset,
io::RandomAccessFile * file,
std::shared_ptr< Tensor > * out
)

EXPERIMENTAL: Read arrow::Tensor as encapsulated IPC message in file.

Parameters
[in] offset: the file location of the start of the message
[in] file: the file where the batch is located
[out] out: the read tensor
Returns
Status

◆ SerializeRecordBatch() [1/2]

Status arrow::ipc::SerializeRecordBatch ( const RecordBatch & batch,
MemoryPool * pool,
std::shared_ptr< Buffer > * out
)

Serialize record batch as encapsulated IPC message in a new buffer.

Parameters
[in] batch: the record batch
[in] pool: a MemoryPool to allocate memory from
[out] out: the serialized message
Returns
Status

◆ SerializeRecordBatch() [2/2]

Status arrow::ipc::SerializeRecordBatch ( const RecordBatch & batch,
MemoryPool * pool,
io::OutputStream * out
)

Write record batch to OutputStream.

Parameters
[in] batch: the record batch to write
[in] pool: a MemoryPool to use for temporary allocations, if needed
[in] out: the OutputStream to write the output to
Returns
Status

If writing to pre-allocated memory, you can use arrow::ipc::GetRecordBatchSize to compute how much space is required.
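
The size-then-write pattern mentioned above can be sketched without Arrow as two passes over the same serializer: one into a sink that only counts bytes, and one into caller-owned memory of exactly that size. Everything here is hypothetical (`Sink`, `CountingSink`, `FixedSink`, `SerializePayload`), and the stand-in serializer writes a 4-byte size prefix plus raw bytes rather than a real record batch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sink interface, loosely analogous to an output stream.
struct Sink {
  virtual ~Sink() {}
  virtual void Write(const uint8_t* data, int64_t nbytes) = 0;
};

// Sizing pass: counts bytes without storing them.
struct CountingSink : Sink {
  int64_t count = 0;
  void Write(const uint8_t*, int64_t nbytes) override { count += nbytes; }
};

// Writing pass: copies bytes into caller-owned, pre-allocated memory.
struct FixedSink : Sink {
  uint8_t* dst;
  int64_t capacity;
  int64_t position = 0;
  FixedSink(uint8_t* d, int64_t cap) : dst(d), capacity(cap) {}
  void Write(const uint8_t* data, int64_t nbytes) override {
    assert(position + nbytes <= capacity);  // the sizing pass must match
    if (nbytes > 0) {
      std::memcpy(dst + position, data, static_cast<std::size_t>(nbytes));
    }
    position += nbytes;
  }
};

// Stand-in serializer: a 4-byte size prefix followed by the payload bytes.
inline void SerializePayload(const std::vector<uint8_t>& payload, Sink* out) {
  const int32_t n = static_cast<int32_t>(payload.size());
  out->Write(reinterpret_cast<const uint8_t*>(&n), sizeof(n));
  out->Write(payload.data(), n);
}
```

Running the serializer against `CountingSink` first gives the exact capacity to allocate before the second pass writes into `FixedSink`.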

◆ SerializeSchema()

Status arrow::ipc::SerializeSchema ( const Schema & schema,
MemoryPool * pool,
std::shared_ptr< Buffer > * out
)

Serialize schema using stream writer as a sequence of one or more IPC messages.

Parameters
[in] schema: the schema to write
[in] pool: a MemoryPool to allocate memory from
[out] out: the serialized schema
Returns
Status

◆ WriteRecordBatch()

Status arrow::ipc::WriteRecordBatch ( const RecordBatch & batch,
int64_t buffer_start_offset,
io::OutputStream * dst,
int32_t * metadata_length,
int64_t * body_length,
MemoryPool * pool,
int max_recursion_depth = kMaxNestingDepth,
bool allow_64bit = false
)

Low-level API for writing a record batch (without schema) to an OutputStream.

Parameters
[in] batch: the record batch to write
[in] buffer_start_offset: the start offset to use in the buffer metadata, generally should be 0
[in] dst: an OutputStream
[out] metadata_length: the size of the length-prefixed flatbuffer including padding to a 64-byte boundary
[out] body_length: the size of the contiguous buffer block plus padding bytes
[in] max_recursion_depth: the maximum permitted nesting schema depth
[in] allow_64bit: permit field lengths exceeding INT32_MAX. May not be readable by other Arrow implementations
Returns
Status

Write the RecordBatch (collection of equal-length Arrow arrays) to the output stream in a contiguous block. The record batch metadata is written as a flatbuffer (see format/Message.fbs – the RecordBatch message type) prefixed by its size, followed by each of the memory buffers in the batch written end to end (with appropriate alignment and padding):

<int32: metadata size> <uint8*: metadata> <buffers>

Finally, the absolute offsets (relative to the start of the output stream) to the end of the body and to the end of the metadata / data header (suffixed by the header size) are returned in the out-variables.
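
The `<int32: metadata size> <uint8*: metadata> <buffers>` layout can be sketched in plain C++ to show how the two reported lengths relate to padding. This is not Arrow's implementation: `PaddedLength`, `AppendPadding`, and `WriteFramed` are hypothetical helpers, the metadata is an opaque byte vector instead of a flatbuffer, the same alignment is assumed for metadata and buffers, and the prefix is assumed to count the metadata plus its padding but not itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rounds n up to the next multiple of alignment.
inline int64_t PaddedLength(int64_t n, int64_t alignment) {
  assert(n >= 0 && alignment > 0);
  return ((n + alignment - 1) / alignment) * alignment;
}

// Appends `count` zero bytes to dst.
inline void AppendPadding(std::vector<uint8_t>* dst, int64_t count) {
  dst->insert(dst->end(), static_cast<std::size_t>(count), uint8_t{0});
}

// Appends <int32 size><metadata><padding><buffer><padding>... to dst and
// reports the padded metadata length and the padded body length.
inline void WriteFramed(const std::vector<uint8_t>& metadata,
                        const std::vector<std::vector<uint8_t>>& buffers,
                        int64_t alignment, std::vector<uint8_t>* dst,
                        int32_t* metadata_length, int64_t* body_length) {
  const int64_t kPrefixSize = sizeof(int32_t);
  const int64_t raw = kPrefixSize + static_cast<int64_t>(metadata.size());
  const int64_t padded = PaddedLength(raw, alignment);

  // Assumption: the prefix counts the metadata plus its padding, not itself.
  const int32_t prefix = static_cast<int32_t>(padded - kPrefixSize);
  const uint8_t* p = reinterpret_cast<const uint8_t*>(&prefix);
  dst->insert(dst->end(), p, p + sizeof(prefix));
  dst->insert(dst->end(), metadata.begin(), metadata.end());
  AppendPadding(dst, padded - raw);
  *metadata_length = static_cast<int32_t>(padded);

  // Each buffer is written end to end and padded to the same alignment.
  *body_length = 0;
  for (const std::vector<uint8_t>& buf : buffers) {
    const int64_t size = static_cast<int64_t>(buf.size());
    const int64_t padded_size = PaddedLength(size, alignment);
    dst->insert(dst->end(), buf.begin(), buf.end());
    AppendPadding(dst, padded_size - size);
    *body_length += padded_size;
  }
}
```

With 10 bytes of metadata, buffers of 5 and 64 bytes, and an alignment of 64, this sketch reports a metadata length of 64 and a body length of 128, for 192 bytes written in total.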

◆ WriteRecordBatchStream()

Status arrow::ipc::WriteRecordBatchStream ( const std::vector< std::shared_ptr< RecordBatch >> & batches,
io::OutputStream * dst
)

Write multiple record batches to OutputStream, including schema.

Parameters
[in] batches: a vector of batches. Must all have same schema
[out] dst: an OutputStream
Returns
Status

◆ WriteTensor()

Status arrow::ipc::WriteTensor ( const Tensor & tensor,
io::OutputStream * dst,
int32_t * metadata_length,
int64_t * body_length
)

EXPERIMENTAL: Write arrow::Tensor as a contiguous message.

Parameters
[in] tensor: the Tensor to write
[in] dst: the OutputStream to write to
[out] metadata_length: the actual metadata length
[out] body_length: the actual message body length
Returns
Status

<metadata size><metadata><tensor data>

Variable Documentation

◆ kMaxNestingDepth

constexpr int arrow::ipc::kMaxNestingDepth = 64