pyarrow.ipc.RecordBatchFileReader

class pyarrow.ipc.RecordBatchFileReader(source, footer_offset=None, *, options=None, memory_pool=None)

Bases: pyarrow.lib._RecordBatchFileReader

Class for reading Arrow record batch data from the Arrow binary file format.

Parameters

source (bytes/buffer-like, pyarrow.NativeFile, or file-like Python object): Either an in-memory buffer or a readable file object.
footer_offset (int, default None): If the file is embedded in some larger file, this is the byte offset to the very end of the file data.
options (pyarrow.ipc.IpcReadOptions, default None): Options for IPC deserialization. If None, default values will be used.
memory_pool (MemoryPool, default None): If None, the default memory pool is used.

__init__(source, footer_offset=None, *, options=None, memory_pool=None)

Methods

`__init__`(source[, footer_offset, options, ...])
`get_batch`(self, int i)	Read the record batch with the given index.
`get_record_batch`(self, int i)	Read the record batch with the given index.
`read_all`(self)	Read all record batches as a pyarrow.Table
`read_pandas`(self, **options)	Read contents of stream to a pandas.DataFrame.

Attributes

`num_record_batches`	The number of record batches in the IPC file.
`schema`	The schema shared by the record batches in the IPC file.
`stats`	Current IPC read statistics.

get_batch(self, int i)

Read the record batch with the given index.

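Parameters

i (int): The index of the record batch in the IPC file.

Returns

batch (RecordBatch): The record batch at index i.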

get_record_batch(self, int i)

Read the record batch with the given index.

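Parameters

i (int): The index of the record batch in the IPC file.

Returns

batch (RecordBatch): The record batch at index i.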

read_pandas(self, **options)

Read contents of stream to a pandas.DataFrame.

Read all record batches as a pyarrow.Table then convert it to a pandas.DataFrame using Table.to_pandas.

Parameters

**options: Arguments to forward to Table.to_pandas.

Returns

df (pandas.DataFrame)
