pyarrow.concat_batches

pyarrow.concat_batches(recordbatches, MemoryPool memory_pool=None)

Concatenate pyarrow.RecordBatch objects.

All record batches must share the same schema. The operation copies the data, merging the arrays of the different RecordBatches into a single RecordBatch.

Parameters:
recordbatches : iterable of pyarrow.RecordBatch objects

PyArrow record batches to concatenate into a single RecordBatch.

memory_pool : MemoryPool, default None

Memory pool to use for allocations, if required. If None, the default memory pool is used.

Examples

>>> import pyarrow as pa
>>> t1 = pa.record_batch([
...     pa.array([2, 4, 5, 100]),
...     pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
...     ], names=['n_legs', 'animals'])
>>> t2 = pa.record_batch([
...     pa.array([2, 4]),
...     pa.array(["Parrot", "Dog"])
...     ], names=['n_legs', 'animals'])
>>> pa.concat_batches([t1, t2])
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,4,5,100,2,4]
animals: ["Flamingo","Horse","Brittle stars","Centipede","Parrot","Dog"]