pyarrow.Schema

class pyarrow.Schema

Bases: _Weakrefable

A named collection of types, a.k.a. a schema. A schema defines the column names and types in a record batch or table data structure. It also contains metadata about the columns. For example, schemas converted from pandas contain metadata about their original pandas types so they can be converted back to the same types.

Warning

Do not call this class’s constructor directly. Instead, use the pyarrow.schema() factory function, which creates a new Arrow Schema object.

Examples

Create a new Arrow Schema object:

>>> import pyarrow as pa
>>> pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])
some_int: int32
some_string: string

Create Arrow Schema with metadata:

>>> pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

add_metadata(self, metadata)

DEPRECATED

append(self, Field field)

Append a field at the end of the schema.

empty_table(self)

Provide an empty table according to the schema.

equals(self, Schema other, ...)

Test if this schema is equal to the other

field(self, i)

Select a field by its column name or numeric index.

field_by_name(self, name)

DEPRECATED

from_pandas(type cls, df[, preserve_index])

Return the schema implied by a pandas DataFrame.

get_all_field_indices(self, name)

Return sorted list of indices for the fields with the given name.

get_field_index(self, name)

Return index of the unique field with the given name.

insert(self, int i, Field field)

Add a field at position i to the schema.

remove(self, int i)

Remove the field at index i from the schema.

remove_metadata(self)

Create new schema without metadata, if any

serialize(self[, memory_pool])

Write Schema to Buffer as encapsulated IPC message

set(self, int i, Field field)

Replace a field at position i in the schema.

to_string(self[, truncate_metadata, ...])

Return human-readable representation of Schema

with_metadata(self, metadata)

Add metadata as dict of string keys and values to Schema

Attributes

metadata

The schema's metadata.

names

The schema's field names.

pandas_metadata

Return deserialized-from-JSON pandas metadata field (if it exists)

types

The schema's field types.

add_metadata(self, metadata)

DEPRECATED

Parameters:
metadata: dict

Keys and values must be string-like / coercible to bytes

append(self, Field field)

Append a field at the end of the schema.

Unlike Python’s list.append(), this method returns a new object and leaves the original Schema unmodified.

Parameters:
field: Field
Returns:
schema: Schema

New object with appended field.

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Append a field ‘extra’ at the end of the schema:

>>> schema_new = schema.append(pa.field('extra', pa.bool_()))
>>> schema_new
n_legs: int64
animals: string
extra: bool

Original schema is unmodified:

>>> schema
n_legs: int64
animals: string

empty_table(self)

Provide an empty table according to the schema.

Returns:
table: pyarrow.Table

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Create an empty table with schema’s fields:

>>> schema.empty_table()
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[]]
animals: [[]]

equals(self, Schema other, bool check_metadata=False)

Test if this schema is equal to the other

Parameters:
other: pyarrow.Schema
check_metadata: bool, default False

Key/value metadata must be equal too

Returns:
is_equal: bool

Examples

>>> import pyarrow as pa
>>> schema1 = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema2 = pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])

Test two equal schemas:

>>> schema1.equals(schema1)
True

Test two unequal schemas:

>>> schema1.equals(schema2)
False

field(self, i)

Select a field by its column name or numeric index.

Parameters:
i: int or str
Returns:
pyarrow.Field

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Select the second field:

>>> schema.field(1)
pyarrow.Field<animals: string>

Select the field of the column named ‘n_legs’:

>>> schema.field('n_legs')
pyarrow.Field<n_legs: int64>

field_by_name(self, name)

DEPRECATED

Parameters:
name: str
Returns:
field: pyarrow.Field

from_pandas(type cls, df, preserve_index=None)

Return the schema implied by a pandas DataFrame.

Parameters:
df: pandas.DataFrame
preserve_index: bool, default None

Whether to store the index as an additional column (or columns, for MultiIndex) in the resulting Table. The default of None will store the index as a column, except for RangeIndex which is stored as metadata only. Use preserve_index=True to force it to be stored as a column.

Returns:
pyarrow.Schema

Examples

>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({
...     'int': [1, 2],
...     'str': ['a', 'b']
... })

Create an Arrow Schema from the schema of a pandas dataframe:

>>> pa.Schema.from_pandas(df)
int: int64
str: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, ...

get_all_field_indices(self, name)

Return sorted list of indices for the fields with the given name.

Parameters:
name: str

The name of the field to look up.

Returns:
indices: List[int]

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())])

Get the indices of the fields named ‘animals’:

>>> schema.get_all_field_indices("animals")
[1, 2]

get_field_index(self, name)

Return index of the unique field with the given name.

Parameters:
name: str

The name of the field to look up.

Returns:
index: int

The index of the field with the given name; -1 if the name isn’t found or there are several fields with the given name.

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Get the index of the field named ‘animals’:

>>> schema.get_field_index("animals")
1

Index in case of several fields with the given name:

>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema.get_field_index("animals")
-1

insert(self, int i, Field field)

Add a field at position i to the schema.

Parameters:
i: int
field: Field
Returns:
schema: Schema

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Insert a new field at the second position:

>>> schema.insert(1, pa.field('extra', pa.bool_()))
n_legs: int64
extra: bool
animals: string

metadata

The schema’s metadata.

Returns:
metadata: dict

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})

Get the schema’s metadata:

>>> schema.metadata
{b'n_legs': b'Number of legs per animal'}

names

The schema’s field names.

Returns:
list of str

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Get the names of the schema’s fields:

>>> schema.names
['n_legs', 'animals']

pandas_metadata

Return deserialized-from-JSON pandas metadata field (if it exists)

Examples

>>> import pyarrow as pa
>>> import pandas as pd
>>> df = pd.DataFrame({'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> schema = pa.Table.from_pandas(df).schema

Select pandas metadata field from Arrow Schema:

>>> schema.pandas_metadata
{'index_columns': [{'kind': 'range', 'name': None, 'start': 0, 'stop': 4, 'step': 1}], ...

remove(self, int i)

Remove the field at index i from the schema.

Parameters:
i: int
Returns:
schema: Schema

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Remove the second field of the schema:

>>> schema.remove(1)
n_legs: int64

remove_metadata(self)

Create new schema without metadata, if any

Returns:
schema: pyarrow.Schema

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'

Create a new schema without the metadata of the original:

>>> schema.remove_metadata()
n_legs: int64
animals: string

serialize(self, memory_pool=None)

Write Schema to Buffer as encapsulated IPC message

Parameters:
memory_pool: MemoryPool, default None

Uses default memory pool if not specified

Returns:
serialized: Buffer

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Write schema to Buffer:

>>> schema.serialize()
<pyarrow.Buffer address=0x... size=... is_cpu=True is_mutable=True>

set(self, int i, Field field)

Replace a field at position i in the schema.

Parameters:
i: int
fieldField
Returns:
schema: Schema

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Replace the second field of the schema with a new field ‘replaced’:

>>> schema.set(1, pa.field('replaced', pa.bool_()))
n_legs: int64
replaced: bool

to_string(self, truncate_metadata=True, show_field_metadata=True, show_schema_metadata=True)

Return human-readable representation of Schema

Parameters:
truncate_metadata: bool, default True

Limit metadata key/value display to a single line of ~80 characters or fewer

show_field_metadata: bool, default True

Display Field-level KeyValueMetadata

show_schema_metadata: bool, default True

Display Schema-level KeyValueMetadata

Returns:
str: the formatted output

types

The schema’s field types.

Returns:
list of DataType

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Get the types of the schema’s fields:

>>> schema.types
[DataType(int64), DataType(string)]

with_metadata(self, metadata)

Add metadata as dict of string keys and values to Schema

Parameters:
metadata: dict

Keys and values must be string-like / coercible to bytes

Returns:
schema: pyarrow.Schema

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Add metadata to an existing schema:

>>> schema.with_metadata({"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'