pyarrow.record_batch
- pyarrow.record_batch(data, names=None, schema=None, metadata=None)
- Create a pyarrow.RecordBatch from another Python data structure or sequence of arrays.
- Parameters:
- data : pandas.DataFrame, list
- A DataFrame, or a list of arrays or chunked arrays.
- names : list, default None
- Column names, used when a list of arrays is passed as data. Mutually exclusive with the ‘schema’ argument.
- schema : Schema, default None
- The expected schema of the RecordBatch. If not passed, it will be inferred from the data. Mutually exclusive with the ‘names’ argument.
- metadata : dict or Mapping, default None
- Optional metadata for the schema (if schema is not passed).
 
- Returns:
- pyarrow.RecordBatch
- Examples

>>> import pyarrow as pa
>>> n_legs = pa.array([2, 2, 4, 4, 5, 100])
>>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
>>> names = ["n_legs", "animals"]

Creating a RecordBatch from a list of arrays with names:

>>> pa.record_batch([n_legs, animals], names=names)
pyarrow.RecordBatch
n_legs: int64
animals: string
>>> pa.record_batch([n_legs, animals], names=["n_legs", "animals"]).to_pandas()
   n_legs        animals
0       2       Flamingo
1       2         Parrot
2       4            Dog
3       4          Horse
4       5  Brittle stars
5     100      Centipede

Creating a RecordBatch from a list of arrays with names and metadata:

>>> my_metadata = {"n_legs": "How many legs does an animal have?"}
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata)
pyarrow.RecordBatch
n_legs: int64
animals: string
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'How many legs does an animal have?'

Creating a RecordBatch from a pandas DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2020, 2022, 2021, 2022],
...                    'month': [3, 5, 7, 9],
...                    'day': [1, 5, 9, 13],
...                    'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> pa.record_batch(df)
pyarrow.RecordBatch
year: int64
month: int64
day: int64
n_legs: int64
animals: string
>>> pa.record_batch(df).to_pandas()
   year  month  day  n_legs        animals
0  2020      3    1       2       Flamingo
1  2022      5    5       4          Horse
2  2021      7    9       5  Brittle stars
3  2022      9   13     100      Centipede

Creating a RecordBatch from a pandas DataFrame with schema:

>>> my_schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> pa.record_batch(df, my_schema).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
pandas: ...
>>> pa.record_batch(df, my_schema).to_pandas()
   n_legs        animals
0       2       Flamingo
1       4          Horse
2       5  Brittle stars
3     100      Centipede
