pyarrow.ipc.RecordBatchStreamReader

class pyarrow.ipc.RecordBatchStreamReader(source, *, options=None, memory_pool=None)

Bases: _RecordBatchStreamReader

Reader for the Arrow streaming binary format.

Parameters:

source (bytes/buffer-like, pyarrow.NativeFile, or file-like Python object): Either an in-memory buffer or a readable file object. To read through a memory map, pass a MemoryMappedFile as source.
options (pyarrow.ipc.IpcReadOptions): Options for IPC deserialization. If None, default values are used.
memory_pool (MemoryPool, default None): If None, the default memory pool is used.

Methods

`__init__`(source, *[, options, memory_pool])
`cast`(self, target_schema)	Wrap this reader with one that casts each batch lazily as it is pulled.
`close`(self)	Release any resources associated with the reader.
`from_batches`(Schema schema, batches)	Create RecordBatchReader from an iterable of batches.
`from_stream`(data[, schema])	Create RecordBatchReader from an Arrow-compatible stream object.
`iter_batches_with_custom_metadata`(self)	Iterate over record batches from the stream along with their custom metadata.
`read_all`(self)	Read all record batches as a pyarrow.Table.
`read_next_batch`(self)	Read next RecordBatch from the stream.
`read_next_batch_with_custom_metadata`(self)	Read next RecordBatch from the stream along with its custom metadata.
`read_pandas`(self, **options)	Read contents of stream to a pandas.DataFrame.

Attributes

`schema`	Shared schema of the record batches in the stream.
`stats`	Current IPC read statistics.

cast(self, target_schema)

Wrap this reader with one that casts each batch lazily as it is pulled. Currently only a safe cast to target_schema is implemented.

Parameters:

target_schema (Schema): Schema to cast to. The names and order of fields must match.

Returns:

RecordBatchReader: A reader that casts each batch to target_schema as it is pulled.

static from_batches(Schema schema, batches)

Create RecordBatchReader from an iterable of batches.

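Parameters:

schema (Schema): The shared schema of the record batches.
batches (iterable of RecordBatch): The batches that the reader will return.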

Returns:

RecordBatchReader

static from_stream(data, schema=None)

Create RecordBatchReader from an Arrow-compatible stream object.

This accepts objects implementing the Arrow PyCapsule Protocol for streams, i.e. objects that have a __arrow_c_stream__ method.

Parameters:

data (Arrow-compatible stream object): Any object that implements the Arrow PyCapsule Protocol for streams.
schema (Schema, default None): The schema to which the stream should be cast, if supported by the stream object.

Returns:

RecordBatchReader

iter_batches_with_custom_metadata(self)

Iterate over record batches from the stream along with their custom metadata.

Yields:

RecordBatchWithMetadata

read_all(self)

Read all record batches as a pyarrow.Table.

Returns:

Table

read_next_batch(self)