pyarrow.ipc.RecordBatchFileReader

class pyarrow.ipc.RecordBatchFileReader(source, footer_offset=None, *, options=None, memory_pool=None)

Bases: _RecordBatchFileReader

Class for reading Arrow record batch data from the Arrow binary file format.

Parameters:
source : bytes/buffer-like, pyarrow.NativeFile, or file-like Python object

Either an in-memory buffer or a readable file object. To use memory mapping, pass a MemoryMappedFile as source.

footer_offset : int, default None

If the file is embedded in some larger file, this is the byte offset to the very end of the file data.

options : pyarrow.ipc.IpcReadOptions

Options for IPC deserialization. If None, default values will be used.

memory_pool : MemoryPool, default None

If None, the default memory pool is used.

__init__(source, footer_offset=None, *, options=None, memory_pool=None)

Methods

__init__(source[, footer_offset, options, ...])

get_batch(self, int i)

Read the record batch with the given index.

get_batch_with_custom_metadata(self, int i)

Read the record batch with the given index along with its custom metadata.

get_record_batch(self, int i)

Read the record batch with the given index.

read_all(self)

Read all record batches as a pyarrow.Table.

read_pandas(self, **options)

Read the contents of the file into a pandas.DataFrame.

Attributes

num_record_batches

The number of record batches in the IPC file.

schema

stats

Current IPC read statistics.

get_batch(self, int i)

Read the record batch with the given index.

Parameters:
i : int

The index of the record batch in the IPC file.

Returns:
batch : RecordBatch

get_batch_with_custom_metadata(self, int i)

Read the record batch with the given index along with its custom metadata.

Parameters:
i : int

The index of the record batch in the IPC file.

Returns:
batch : RecordBatch
custom_metadata : KeyValueMetadata

get_record_batch(self, int i)

Read the record batch with the given index.

Parameters:
i : int

The index of the record batch in the IPC file.

Returns:
batch : RecordBatch

num_record_batches

The number of record batches in the IPC file.

read_all(self)

Read all record batches as a pyarrow.Table.

read_pandas(self, **options)

Read the contents of the file into a pandas.DataFrame.

Read all record batches as a pyarrow.Table, then convert it to a pandas.DataFrame using Table.to_pandas.

Parameters:
**options

Arguments to forward to Table.to_pandas().

Returns:
df : pandas.DataFrame

schema
stats

Current IPC read statistics.