pyarrow.ipc.RecordBatchFileReader

class pyarrow.ipc.RecordBatchFileReader(source, footer_offset=None, *, options=None, memory_pool=None)

Bases: pyarrow.lib._RecordBatchFileReader

Class for reading Arrow record batch data from the Arrow binary (IPC) file format.

Parameters
source : bytes/buffer-like, pyarrow.NativeFile, or file-like Python object

Either an in-memory buffer or a readable file object.

footer_offset : int, default None

If the IPC file is embedded in a larger file, this is the byte offset of the very end of the IPC file data.

options : pyarrow.ipc.IpcReadOptions, default None

Options for reading IPC data. If None, default values are used.

memory_pool : MemoryPool, default None

If None, default memory pool is used.

__init__(source, footer_offset=None, *, options=None, memory_pool=None)

Methods

__init__(source[, footer_offset, options, ...])

get_batch(self, int i)

Read the record batch with the given index.

get_record_batch(self, int i)

Read the record batch with the given index.

read_all(self)

Read all record batches as a pyarrow.Table

read_pandas(self, **options)

Read the contents of the file into a pandas.DataFrame.

Attributes

num_record_batches

The number of record batches in the IPC file.

schema

stats

Current IPC read statistics.

get_batch(self, int i)

Read the record batch with the given index.

Parameters
i : int

The index of the record batch in the IPC file.

Returns
batch : RecordBatch

get_record_batch(self, int i)

Read the record batch with the given index.

Parameters
i : int

The index of the record batch in the IPC file.

Returns
batch : RecordBatch

num_record_batches

The number of record batches in the IPC file.

read_all(self)

Read all record batches as a pyarrow.Table

read_pandas(self, **options)

Read the contents of the file into a pandas.DataFrame.

Read all record batches as a pyarrow.Table then convert it to a pandas.DataFrame using Table.to_pandas.

Parameters
**options

Arguments to forward to Table.to_pandas().

Returns
df : pandas.DataFrame

schema

stats

Current IPC read statistics.