pyarrow.Table

class pyarrow.Table

Bases: object

A collection of top-level named, equal-length Arrow arrays.

Warning

Do not call this class’s constructor directly; use one of the from_* methods instead.

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

add_column(self, int i, Column column) Add column to Table at position.
append_column(self, Column column) Append column at end of columns.
column(self, int i) Select a column by its numeric index.
equals(self, Table other) Check if contents of two tables are equal
from_arrays(arrays[, names, schema]) Construct a Table from Arrow arrays or columns
from_batches(batches) Construct a Table from a list of Arrow RecordBatches
from_pandas(type cls, df, …[, nthreads]) Convert pandas.DataFrame to an Arrow Table
itercolumns(self) Iterator over all columns in their numerical order
remove_column(self, int i) Create new Table with the indicated column removed
replace_schema_metadata(self, dict metadata=None) EXPERIMENTAL: Create shallow copy of table by replacing schema
to_pandas(self[, nthreads, …]) Convert the arrow::Table to a pandas DataFrame
to_pydict(self) Convert the arrow::Table to an OrderedDict
add_column(self, int i, Column column)

Add a column to the Table at position i. Returns a new table.

append_column(self, Column column)

Append a column after the existing columns. Returns a new table.

column(self, int i)

Select a column by its numeric index.

Parameters: i (int) – Index of the column to select.
Returns: pyarrow.Column
equals(self, Table other)

Check if the contents of two tables are equal.

Parameters: other (pyarrow.Table) – Table to compare against.
Returns: are_equal (boolean)
static from_arrays(arrays, names=None, schema=None, dict metadata=None)

Construct a Table from Arrow arrays or columns

Parameters:
  • arrays (list of pyarrow.Array or pyarrow.Column) – Equal-length arrays that should form the table.
  • names (list of str, optional) – Names for the table columns. If Columns are passed, the names are inferred from them; if Arrays are passed, this argument is required.
Returns:

pyarrow.Table

static from_batches(batches)

Construct a Table from a list of Arrow RecordBatches

Parameters: batches (list of RecordBatch) – RecordBatches to convert; all batches must have the same schema.
Returns: pyarrow.Table
from_pandas(type cls, df, Schema schema=None, bool preserve_index=True, nthreads=None)

Convert a pandas.DataFrame to an Arrow Table.

Parameters:
  • df (pandas.DataFrame) – The DataFrame to convert.
  • schema (pyarrow.Schema, optional) – The expected schema of the Arrow Table. Use it to specify column types that cannot be inferred automatically.
  • preserve_index (bool, default True) – Whether to store the index as an additional column in the resulting Table.
  • nthreads (int, default None) – If greater than 1, convert columns to Arrow in parallel using the indicated number of threads. With the default, up to the system CPU count may be used.
Returns:

pyarrow.Table

Examples

>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({
...     'int': [1, 2],
...     'str': ['a', 'b']
... })
>>> pa.Table.from_pandas(df)
<pyarrow.lib.Table object at 0x7f05d1fb1b40>
itercolumns(self)

Iterator over all columns in their numerical order.

num_columns

Number of columns in this table

Returns: int
num_rows

Number of rows in this table.

Due to the definition of a table, all columns have the same number of rows.

Returns: int
remove_column(self, int i)

Create a new Table with the column at position i removed.

replace_schema_metadata(self, dict metadata=None)

EXPERIMENTAL: Create a shallow copy of the table, replacing the schema key-value metadata with the indicated new metadata. The new metadata may be None, which deletes any existing metadata.

Parameters: metadata (dict, default None) – New key-value metadata for the schema.
Returns: shallow_copy (Table)
schema

Schema of the table and its columns

Returns: pyarrow.Schema
shape

Dimensions of the table: (#rows, #columns).

Returns: (int, int)
to_pandas(self, nthreads=None, strings_to_categorical=False, memory_pool=None, zero_copy_only=False)

Convert the arrow::Table to a pandas DataFrame

Parameters:
  • nthreads (int, default max(1, multiprocessing.cpu_count() / 2)) – Number of threads to use for the conversion. The default uses half the CPU count because most modern machines have hyperthreading enabled, and running more threads than there are physical cores does not help.
  • strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary columns as pandas.Categorical.
  • memory_pool (MemoryPool, optional) – Specific memory pool to use to allocate cast columns.
  • zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data.
Returns:

pandas.DataFrame

to_pydict(self)

Convert the arrow::Table to an OrderedDict.

Returns: OrderedDict