Reading and writing the Arrow IPC format
Arrow C++ provides readers and writers for the Arrow IPC format. They wrap lower-level input/output, which is handled through the IO interfaces. For reading, there is also an event-driven API that lets you feed arbitrary data into the IPC decoding layer asynchronously.
Reading IPC streams and files
Synchronous reading
In most cases, it is most convenient to use the RecordBatchStreamReader or
RecordBatchFileReader class, depending on which variant of the IPC format
you want to read. The former requires an InputStream source, while the
latter requires a RandomAccessFile.
Reading Arrow IPC data is inherently zero-copy if the source allows it.
For example, a BufferReader or MemoryMappedFile can typically be zero-copy.
Exceptions are when the data must be transformed on the fly, e.g. when
buffer compression has been enabled on the IPC stream or file.
Event-driven reading
When it is necessary to process the IPC format without blocking (for example
to integrate Arrow with an event loop), or if data is coming from an unusual
source, use the event-driven StreamDecoder. You will need to define a
subclass of Listener and implement the virtual methods for the desired
events (for example, implement Listener::OnRecordBatchDecoded() to be
notified of each incoming RecordBatch).
Writing IPC streams and files
Use one of the factory functions, MakeStreamWriter() or MakeFileWriter(),
to obtain a RecordBatchWriter instance for the given IPC format variant.
Configuring
Various aspects of reading and writing the IPC format can be configured
using the IpcReadOptions and IpcWriteOptions classes, respectively.