Arrow IPC File and Stream Readers
Notes
The FileReader and StreamReader have similar interfaces; however, the FileReader expects a reader that supports Seeking.
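The difference in reader requirements can be illustrated with the standard library alone. The sketch below is not the arrow-ipc API: `read_tail` and `read_head` are hypothetical helpers, written on the assumption that a file-style reader must jump to the end of its input (where the IPC file keeps its footer), while a stream-style reader only consumes bytes in order.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper: reads the final `n` bytes of a source.
/// Requires `Seek` because it must jump to the end before reading,
/// which is the kind of access a file-style reader relies on.
fn read_tail<R: Read + Seek>(reader: &mut R, n: usize) -> io::Result<Vec<u8>> {
    reader.seek(SeekFrom::End(-(n as i64)))?;
    let mut buf = vec![0u8; n];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Hypothetical helper: reads the next `n` bytes of a source in order.
/// Only `Read` is required, so this also works on sockets and pipes,
/// which is the kind of access a stream-style reader relies on.
fn read_head<R: Read>(reader: &mut R, n: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; n];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

A `std::io::Cursor` or `std::fs::File` satisfies both bounds, whereas a `std::net::TcpStream` satisfies only the `Read` bound and so could be passed to `read_head` but not to `read_tail`.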
Modules
- stream 🔒
Structs
- FileDecoder - A low-level, push-based interface for reading an IPC file
- FileReader - Arrow File Reader
- FileReaderBuilder - Build an Arrow FileReader with custom options.
- MessageReader 🔒 - A low-level construct that reads Message::Message values from a reader while re-using a buffer for metadata. This is composed into StreamReader.
- RecordBatchDecoder - State for decoding Arrow arrays from an IPC RecordBatch structure to [RecordBatch]
- StreamDecoder - A low-level interface for reading [RecordBatch] data from a stream of bytes
- StreamReader - Arrow Stream Reader
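To make the "push-based" style of the low-level decoders concrete, here is a toy sketch using only the standard library. It is not the FileDecoder or StreamDecoder API and does not parse the Arrow IPC wire format: `ToyPushDecoder` and its framing (a 4-byte little-endian length followed by the payload) are assumptions made purely for illustration. The point is that the caller hands over bytes as they arrive, and the decoder returns whatever complete units it can produce so far.

```rust
/// Hypothetical toy decoder, NOT the arrow-ipc API.
/// Assumed framing: 4-byte little-endian length, then that many payload bytes.
#[derive(Default)]
struct ToyPushDecoder {
    /// Bytes received so far that do not yet form a complete frame.
    pending: Vec<u8>,
}

impl ToyPushDecoder {
    /// Accepts the next chunk of input and returns every frame that is
    /// now complete. Incomplete data stays buffered until more arrives.
    fn push(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        self.pending.extend_from_slice(chunk);
        let mut frames = Vec::new();
        loop {
            if self.pending.len() < 4 {
                break; // length prefix not fully received yet
            }
            let len = u32::from_le_bytes([
                self.pending[0],
                self.pending[1],
                self.pending[2],
                self.pending[3],
            ]) as usize;
            if self.pending.len() < 4 + len {
                break; // payload not fully received yet
            }
            frames.push(self.pending[4..4 + len].to_vec());
            self.pending.drain(..4 + len);
        }
        frames
    }
}
```

Because the decoder never pulls from a reader itself, the same code works whether the bytes come from a file, a socket, or an async runtime; the caller decides how and when to obtain the next chunk.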
Enums
- IpcMessage 🔒 - Representation of a fully parsed IpcMessage from the underlying stream. Parsing this kind of message is done by higher level constructs such as StreamReader, because fully interpreting the messages into a record batch or dictionary batch requires access to stream state such as schema and the full dictionary cache.
Functions
- get_dictionary_values 🔒 - Given a dictionary batch IPC message/body along with the full state of a stream including schema, dictionary cache, metadata, and other flags, this function will parse the buffer into an array of dictionary values.
- parse_message 🔒 - Parse an encapsulated message
- read_block 🔒 - Read the data for a given block
- read_buffer 🔒 - Read a buffer based on offset and length. From https://github.com/apache/arrow/blob/6a936c4ff5007045e86f65f1a6b6c3c955ad5103/format/Message.fbs#L58: Each constituent buffer is first compressed with the indicated compressor, and then written with the uncompressed length in the first 8 bytes as a 64-bit little-endian signed integer followed by the compressed buffer bytes (and then padding as required by the protocol). The uncompressed length may be set to -1 to indicate that the data that follows is not compressed, which can be useful for cases where compression does not yield appreciable savings.
- read_dictionary - Read the dictionary from the buffer and provided metadata, updating the dictionaries_by_id with the resulting dictionary
- read_dictionary_impl 🔒
- read_footer_length - Read the footer length from the last 10 bytes of an Arrow IPC file
- read_record_batch - Creates a record batch from binary data using the crate::RecordBatch indexes and the Schema.
- update_dictionaries 🔒 - Updates the dictionaries_by_id with the provided dictionary values and id.
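Two of the byte layouts mentioned in the list above are small enough to sketch with the standard library. The code below is an illustration, not the crate's implementation: `split_prefix`, `footer_length`, and `BufferPrefix` are hypothetical names. The first function follows the quoted Message.fbs text (an 8-byte little-endian signed length, with -1 meaning "not compressed"). The second assumes that the last 10 bytes of an IPC file are a 4-byte little-endian footer length followed by the 6-byte magic `ARROW1`, which is my reading of the Arrow file format rather than something stated on this page.

```rust
/// Magic bytes assumed to close an Arrow IPC file.
const ARROW_MAGIC: &[u8; 6] = b"ARROW1";

/// What the 8-byte prefix says about the bytes that follow it.
#[derive(Debug, PartialEq)]
enum BufferPrefix {
    /// Prefix was -1: the remaining bytes are stored without compression.
    Uncompressed,
    /// Prefix is the size the remaining bytes will have once decompressed.
    Compressed { uncompressed_len: usize },
}

/// Hypothetical helper: splits a buffer into its parsed prefix and the
/// bytes after it. Returns `None` if the buffer is shorter than 8 bytes
/// or the length is negative and not -1. No decompression is performed.
fn split_prefix(buf: &[u8]) -> Option<(BufferPrefix, &[u8])> {
    if buf.len() < 8 {
        return None;
    }
    let mut head = [0u8; 8];
    head.copy_from_slice(&buf[..8]);
    let rest = &buf[8..];
    match i64::from_le_bytes(head) {
        -1 => Some((BufferPrefix::Uncompressed, rest)),
        n if n >= 0 => Some((
            BufferPrefix::Compressed { uncompressed_len: n as usize },
            rest,
        )),
        _ => None,
    }
}

/// Hypothetical helper: parses the assumed 10-byte file trailer.
/// Returns `None` if the magic is missing or the length is negative.
fn footer_length(trailer: [u8; 10]) -> Option<usize> {
    if &trailer[4..] != &ARROW_MAGIC[..] {
        return None;
    }
    let len = i32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]);
    if len < 0 {
        None
    } else {
        Some(len as usize)
    }
}
```

Checking the magic before trusting the length lets a caller reject a truncated or non-Arrow file early, before attempting to seek back by the footer length.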