pyarrow.RecordBatchReader
- class pyarrow.RecordBatchReader
Bases: _Weakrefable
Base class for reading a stream of record batches.
Record batch readers function as iterators of record batches that also provide the schema (without the need to get any batches).
Warning
Do not call this class’s constructor directly; use one of the RecordBatchReader.from_* functions instead.
Notes
To import and export using the Arrow C stream interface, use the _import_from_c and _export_to_c methods. However, keep in mind this interface is intended for expert users.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([('x', pa.int64())])
>>> def iter_record_batches():
...     for i in range(2):
...         yield pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], schema=schema)
>>> reader = pa.RecordBatchReader.from_batches(schema, iter_record_batches())
>>> print(reader.schema)
x: int64
>>> for batch in reader:
...     print(batch)
pyarrow.RecordBatch
x: int64
pyarrow.RecordBatch
x: int64
- __init__(*args, **kwargs)
Methods
__init__(*args, **kwargs)
close(self): Release any resources associated with the reader.
from_batches(schema, batches): Create RecordBatchReader from an iterable of batches.
get_next_batch(self): DEPRECATED: return the next record batch.
read_all(self): Read all record batches as a pyarrow.Table.
read_next_batch(self): Read next RecordBatch from the stream.
read_pandas(self, **options): Read contents of stream to a pandas.DataFrame.
Attributes
schema: Shared schema of the record batches in the stream.
- close(self)
Release any resources associated with the reader.
- static from_batches(schema, batches)
Create RecordBatchReader from an iterable of batches.
- Parameters:
- schema : Schema
The shared schema of the record batches.
- batches : Iterable[RecordBatch]
The batches that this reader will return.
- Returns:
- reader : RecordBatchReader
- get_next_batch(self)
DEPRECATED: return the next record batch.
Use read_next_batch instead.
- read_next_batch(self)
Read next RecordBatch from the stream.
- Returns:
- RecordBatch
- Raises:
- StopIteration:
At end of stream.
- read_pandas(self, **options)
Read contents of stream to a pandas.DataFrame.
Read all record batches as a pyarrow.Table then convert it to a pandas.DataFrame using Table.to_pandas.
- Parameters:
- **options
Arguments to forward to Table.to_pandas().
- Returns: