pyarrow.ipc.RecordBatchFileReader
- class pyarrow.ipc.RecordBatchFileReader(source, footer_offset=None, *, options=None, memory_pool=None)
Bases: _RecordBatchFileReader
Class for reading Arrow record batch data from the Arrow binary file format.
- Parameters:
  - source : bytes/buffer-like, pyarrow.NativeFile, or file-like Python object
    Either an in-memory buffer or a readable file object. To read through a memory map, pass a MemoryMappedFile as source.
  - footer_offset : int, default None
    If the file is embedded in some larger file, this is the byte offset to the very end of the file data.
  - options : pyarrow.ipc.IpcReadOptions
    Options for IPC serialization. If None, default values will be used.
  - memory_pool : MemoryPool, default None
    If None, the default memory pool is used.
Methods
- __init__(source[, footer_offset, options, ...])
- get_batch(self, int i): Read the record batch with the given index.
- get_batch_with_custom_metadata(self, int i): Read the record batch with the given index along with its custom metadata.
- get_record_batch(self, int i): Read the record batch with the given index.
- read_all(self): Read all record batches as a pyarrow.Table.
- read_pandas(self, **options): Read contents of stream to a pandas.DataFrame.
Attributes
- num_record_batches: The number of record batches in the IPC file.
- schema
- stats: Current IPC read statistics.
- get_batch(self, int i)
Read the record batch with the given index.
- Parameters:
  - i : int
    The index of the record batch in the IPC file.
- Returns:
  - batch : RecordBatch
- get_batch_with_custom_metadata(self, int i)
Read the record batch with the given index along with its custom metadata.
- Parameters:
  - i : int
    The index of the record batch in the IPC file.
- Returns:
  - batch : RecordBatch
  - custom_metadata : KeyValueMetadata
- get_record_batch(self, int i)
Read the record batch with the given index.
- Parameters:
  - i : int
    The index of the record batch in the IPC file.
- Returns:
  - batch : RecordBatch
- num_record_batches
The number of record batches in the IPC file.
- read_all(self)
Read all record batches as a pyarrow.Table.
- read_pandas(self, **options)
Read contents of stream to a pandas.DataFrame.
Read all record batches as a pyarrow.Table, then convert it to a pandas.DataFrame using Table.to_pandas.
- Parameters:
  - **options
    Arguments to forward to Table.to_pandas().
- Returns:
  - pandas.DataFrame
- schema
- stats
Current IPC read statistics.