Reading and writing the Arrow IPC format

Arrow C++ provides readers and writers for the Arrow IPC format. They wrap lower-level input/output, which is handled through the IO interfaces. For reading, there is also an event-driven API that lets you feed arbitrary data into the IPC decoding layer asynchronously.

Reading IPC streams and files

Synchronous reading

In most cases, it is most convenient to use the RecordBatchStreamReader or RecordBatchFileReader class, depending on which variant of the IPC format you want to read. The former requires an InputStream source, while the latter requires a RandomAccessFile.

Reading Arrow IPC data is inherently zero-copy if the source allows it. For example, reading from a BufferReader or MemoryMappedFile can typically be zero-copy. The exception is data that must be transformed on the fly, e.g. when buffer compression has been enabled on the IPC stream or file.

Event-driven reading

When it is necessary to process the IPC format without blocking (for example to integrate Arrow with an event loop), or if data is coming from an unusual source, use the event-driven StreamDecoder. You will need to define a subclass of Listener and implement the virtual methods for the desired events (for example, implement Listener::OnRecordBatchDecoded() to be notified of each incoming RecordBatch).

Writing IPC streams and files

Use one of the factory functions, MakeStreamWriter() or MakeFileWriter(), to obtain a RecordBatchWriter instance for the given IPC format variant.

Configuring

Various aspects of reading and writing the IPC format can be configured using the IpcReadOptions and IpcWriteOptions classes, respectively.