pyarrow.RecordBatch

class pyarrow.RecordBatch

Bases: object

A batch of rows, stored as a collection of columns of equal length

Warning

Do not call this class’s constructor directly; use one of the from_* methods instead.

__init__()

Initialize self. See help(type(self)) for accurate signature.
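
Example of the typical construction path via from_arrays (a minimal sketch; the column names and values are illustrative):

>>> import pyarrow as pa
>>> batch = pa.RecordBatch.from_arrays(
...     [pa.array([1, 2, 3]), pa.array(['a', 'b', 'c'])],
...     ['f0', 'f1'])
>>> batch.num_rows
3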

Methods

column(self, i) Select single column from record batch
equals(self, RecordBatch other) Check if contents of two record batches are equal
from_arrays(list arrays, list names, …) Construct a RecordBatch from multiple pyarrow.Arrays
from_pandas(type cls, df, …[, nthreads]) Convert pandas.DataFrame to an Arrow RecordBatch
replace_schema_metadata(self, dict metadata=None) EXPERIMENTAL: Create shallow copy of record batch by replacing schema
serialize(self[, memory_pool]) Write RecordBatch to Buffer as encapsulated IPC message
slice(self[, offset, length]) Compute zero-copy slice of this RecordBatch
to_pandas(self[, nthreads]) Convert the arrow::RecordBatch to a pandas DataFrame
to_pydict(self) Convert the arrow::RecordBatch to an OrderedDict
column(self, i)

Select single column from record batch

Parameters: i (int) – Index of the column to retrieve
Returns: column (pyarrow.Array)
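
Example (illustrative data; the returned column is compared with pyarrow.Array.equals):

>>> import pyarrow as pa
>>> batch = pa.RecordBatch.from_arrays(
...     [pa.array([1, 2, 3]), pa.array(['a', 'b', 'c'])],
...     ['f0', 'f1'])
>>> batch.column(1).equals(pa.array(['a', 'b', 'c']))
True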
equals(self, RecordBatch other)

Check if contents of two record batches are equal

Returns: are_equal (bool)
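
Example: two batches built from the same columns compare equal (illustrative data):

>>> import pyarrow as pa
>>> a = pa.RecordBatch.from_arrays([pa.array([1, 2])], ['f0'])
>>> b = pa.RecordBatch.from_arrays([pa.array([1, 2])], ['f0'])
>>> a.equals(b)
True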
static from_arrays(list arrays, list names, dict metadata=None)

Construct a RecordBatch from multiple pyarrow.Arrays

Parameters:
  • arrays (list of pyarrow.Array) – column-wise data vectors
  • names (list of str) – Labels for the columns
  • metadata (dict, default None) – Optional key-value metadata for the schema
Returns: pyarrow.RecordBatch
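
Example (illustrative names and data):

>>> import pyarrow as pa
>>> arrays = [pa.array([1, 2, 3]), pa.array([4.0, 5.0, 6.0])]
>>> batch = pa.RecordBatch.from_arrays(arrays, ['ints', 'floats'])
>>> batch.num_columns
2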

from_pandas(type cls, df, Schema schema=None, bool preserve_index=True, nthreads=None)

Convert pandas.DataFrame to an Arrow RecordBatch

Parameters:
  • df (pandas.DataFrame) – The DataFrame to convert
  • schema (pyarrow.Schema, optional) – The expected schema of the RecordBatch. This can be used to indicate the type of columns if we cannot infer it automatically.
  • preserve_index (bool, optional) – Whether to store the index as an additional column in the resulting RecordBatch.
  • nthreads (int, default None (may use up to system CPU count threads)) – If greater than 1, convert columns to Arrow in parallel using indicated number of threads
Returns: pyarrow.RecordBatch
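
Example (a minimal sketch; the DataFrame contents are illustrative):

>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
>>> batch = pa.RecordBatch.from_pandas(df, preserve_index=False)
>>> batch.num_rows
2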

num_columns

Number of columns

Returns: int
num_rows

Number of rows

Due to the definition of a RecordBatch, all columns have the same number of rows.

Returns: int
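
Example illustrating both properties (illustrative data):

>>> import pyarrow as pa
>>> batch = pa.RecordBatch.from_arrays(
...     [pa.array([1, 2, 3]), pa.array([4, 5, 6])], ['f0', 'f1'])
>>> batch.num_columns
2
>>> batch.num_rows
3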
replace_schema_metadata(self, dict metadata=None)

EXPERIMENTAL: Create shallow copy of record batch by replacing schema key-value metadata with the indicated new metadata (which may be None, in which case any existing metadata is deleted)

Parameters: metadata (dict, default None) – New key-value metadata for the schema; None removes any existing metadata
Returns: shallow_copy (RecordBatch)
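
Example (illustrative metadata; keys and values are stored as bytes, so the printed form assumes that coercion):

>>> import pyarrow as pa
>>> batch = pa.RecordBatch.from_arrays([pa.array([1])], ['f0'])
>>> tagged = batch.replace_schema_metadata({'origin': 'example'})
>>> tagged.schema.metadata
{b'origin': b'example'}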
schema

Schema of the RecordBatch and its columns

Returns: pyarrow.Schema
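
Example (illustrative data):

>>> import pyarrow as pa
>>> batch = pa.RecordBatch.from_arrays([pa.array([1, 2])], ['f0'])
>>> batch.schema.names
['f0']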
serialize(self, memory_pool=None)

Write RecordBatch to Buffer as encapsulated IPC message

Parameters: memory_pool (MemoryPool, default None) – Uses default memory pool if not specified
Returns: serialized (Buffer)
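
A sketch of a serialize round trip. Reading the buffer back uses the IPC reader; pa.ipc.read_record_batch is assumed here, and its exact location has varied between pyarrow versions:

>>> import pyarrow as pa
>>> batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], ['f0'])
>>> buf = batch.serialize()
>>> pa.ipc.read_record_batch(buf, batch.schema).equals(batch)
True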
slice(self, offset=0, length=None)

Compute zero-copy slice of this RecordBatch

Parameters:
  • offset (int, default 0) – Offset from start of record batch to slice
  • length (int, default None) – Length of slice (default is until end of batch starting from offset)
Returns: sliced (RecordBatch)
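
Example: the slice shares the underlying buffers rather than copying them (illustrative data):

>>> import pyarrow as pa
>>> batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3, 4])], ['f0'])
>>> batch.slice(1, 2).to_pydict()
OrderedDict([('f0', [2, 3])])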

to_pandas(self, nthreads=None)

Convert the arrow::RecordBatch to a pandas DataFrame

Parameters: nthreads (int, default None) – If greater than 1, convert columns to pandas in parallel using the indicated number of threads
Returns: pandas.DataFrame
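
Example (illustrative data):

>>> import pyarrow as pa
>>> batch = pa.RecordBatch.from_arrays([pa.array([1, 2])], ['f0'])
>>> batch.to_pandas()['f0'].tolist()
[1, 2]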
to_pydict(self)

Convert the arrow::RecordBatch to an OrderedDict

Returns: OrderedDict
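
Example (illustrative data):

>>> import pyarrow as pa
>>> batch = pa.RecordBatch.from_arrays(
...     [pa.array([1, 2]), pa.array(['x', 'y'])], ['f0', 'f1'])
>>> batch.to_pydict()
OrderedDict([('f0', [1, 2]), ('f1', ['x', 'y'])])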