pyarrow.table

pyarrow.table(data, names=None, schema=None, metadata=None, nthreads=None)

Create a pyarrow.Table from a Python data structure or sequence of arrays.

Parameters:
data : pandas.DataFrame, dict, list

A DataFrame, mapping of strings to Arrays or Python lists, or list of arrays or chunked arrays.

names : list, default None

Column names, used when data is a list of arrays. Mutually exclusive with the ‘schema’ argument.

schema : Schema, default None

The expected schema of the Arrow Table. If not passed, it is inferred from the data. Mutually exclusive with the ‘names’ argument. If passed, the output has exactly this schema. When data is a dict or DataFrame, a schema column that is not found in the data raises an error, and additional data not specified in the schema is ignored.

metadata : dict or Mapping, default None

Optional metadata for the schema. Applies only when schema is not passed.

nthreads : int, default None

For pandas.DataFrame inputs: if greater than 1, convert columns to Arrow in parallel using the indicated number of threads. By default, this follows pyarrow.cpu_count(), so it may use up to as many threads as the system has CPUs.

Returns:
Table

Examples

>>> import pyarrow as pa
>>> n_legs = pa.array([2, 4, 5, 100])
>>> animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
>>> names = ["n_legs", "animals"]

Construct a Table from arrays:

>>> pa.table([n_legs, animals], names=names)
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[2,4,5,100]]
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]

Construct a Table from arrays with metadata:

>>> my_metadata = {"n_legs": "Number of legs per animal"}
>>> pa.table([n_legs, animals], names=names, metadata=my_metadata).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'

Construct a Table from pandas DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2020, 2022, 2019, 2021],
...                    'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> pa.table(df)
pyarrow.Table
year: int64
n_legs: int64
animals: string
----
year: [[2020,2022,2019,2021]]
n_legs: [[2,4,5,100]]
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]

Construct a Table from pandas DataFrame with pyarrow schema:

>>> my_schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> pa.table(df, schema=my_schema).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
pandas: '{"index_columns": [], "column_indexes": [{"name": null, ...

Construct a Table from chunked arrays:

>>> n_legs = pa.chunked_array([[2, 2, 4], [4, 5, 100]])
>>> animals = pa.chunked_array([["Flamingo", "Parrot", "Dog"], ["Horse", "Brittle stars", "Centipede"]])
>>> table = pa.table([n_legs, animals], names=names)
>>> table
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[2,2,4],[4,5,100]]
animals: [["Flamingo","Parrot","Dog"],["Horse","Brittle stars","Centipede"]]