pyarrow.RecordBatchReader

class pyarrow.RecordBatchReader

Bases: pyarrow.lib._Weakrefable

Base class for reading a stream of record batches.

Record batch readers function as iterators of record batches that also provide the schema (without the need to get any batches).

Warning

Do not call this class’s constructor directly; use one of the RecordBatchReader.from_* functions instead.

Notes

To import and export using the Arrow C stream interface, use the _import_from_c and _export_to_c methods. Keep in mind, however, that this interface is experimental and intended for expert users.

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([('x', pa.int64())])
>>> def iter_record_batches():
...     for i in range(2):
...         yield pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], schema=schema)
>>> reader = pa.RecordBatchReader.from_batches(schema, iter_record_batches())
>>> print(reader.schema)
pyarrow.Schema
x: int64
>>> for batch in reader:
...     print(batch)
pyarrow.RecordBatch
x: int64
pyarrow.RecordBatch
x: int64
__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

from_batches(schema, batches)

Create RecordBatchReader from an iterable of batches.

get_next_batch(self)

DEPRECATED: return the next record batch.

read_all(self)

Read all record batches as a pyarrow.Table.

read_next_batch(self)

Read next RecordBatch from the stream.

read_pandas(self, **options)

Read contents of stream to a pandas.DataFrame.

Attributes

schema

Shared schema of the record batches in the stream.

static from_batches(schema, batches)

Create RecordBatchReader from an iterable of batches.

Parameters
schema : Schema

The shared schema of the record batches.

batches : Iterable[RecordBatch]

The batches that this reader will return.

Returns
reader : RecordBatchReader
get_next_batch(self)

DEPRECATED: return the next record batch.

Use read_next_batch instead.

read_all(self)

Read all record batches as a pyarrow.Table.

Returns
Table
read_next_batch(self)

Read next RecordBatch from the stream.

Returns
RecordBatch
Raises
StopIteration:

At end of stream.

read_pandas(self, **options)

Read contents of stream to a pandas.DataFrame.

Read all record batches as a pyarrow.Table, then convert it to a pandas.DataFrame using Table.to_pandas.

Parameters
**options

Arguments to forward to Table.to_pandas().

Returns
df : pandas.DataFrame
schema

Shared schema of the record batches in the stream.

Returns
Schema