pyarrow.Schema
- class pyarrow.Schema
- Bases: _Weakrefable
- A named collection of types, a.k.a. schema. A schema defines the column names and types in a record batch or table data structure. It also contains metadata about the columns. For example, schemas converted from pandas contain metadata about their original pandas types so they can be converted back to the same types.
- Warning: Do not call this class’s constructor directly. Instead, use the pyarrow.schema() factory function, which makes a new Arrow Schema object.
- Examples
- Create a new Arrow Schema object:
>>> import pyarrow as pa
>>> pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])
some_int: int32
some_string: string
- Create an Arrow Schema with metadata:
>>> pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
- __init__(*args, **kwargs)
 - Methods
- __init__(*args, **kwargs)
- add_metadata(self, metadata): DEPRECATED
- append(self, Field field): Append a field at the end of the schema.
- empty_table(self): Provide an empty table according to the schema.
- equals(self, Schema other, ...): Test if this schema is equal to the other.
- field(self, i): Select a field by its column name or numeric index.
- field_by_name(self, name): DEPRECATED
- from_pandas(cls, df[, preserve_index]): Return the implied schema from a dataframe.
- get_all_field_indices(self, name): Return a sorted list of indices for the fields with the given name.
- get_field_index(self, name): Return the index of the unique field with the given name.
- insert(self, int i, Field field): Add a field at position i to the schema.
- remove(self, int i): Remove the field at index i from the schema.
- remove_metadata(self): Create a new schema without metadata, if any.
- serialize(self[, memory_pool]): Write the Schema to a Buffer as an encapsulated IPC message.
- set(self, int i, Field field): Replace a field at position i in the schema.
- to_string(self[, truncate_metadata, ...]): Return a human-readable representation of the Schema.
- with_metadata(self, metadata): Add metadata as a dict of string keys and values to the Schema.
- Attributes
- metadata: The schema’s metadata (if any is set).
- names: The schema’s field names.
- pandas_metadata: Return the deserialized-from-JSON pandas metadata field (if it exists).
- types: The schema’s field types.
 - add_metadata(self, metadata)
- DEPRECATED
- Parameters:
- metadata : dict
- Keys and values must be string-like / coercible to bytes.
 - append(self, Field field)
- Append a field at the end of the schema.
- In contrast to Python’s list.append(), it returns a new object, leaving the original Schema unmodified.
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
- Append a field ‘extra’ at the end of the schema:
>>> schema_new = schema.append(pa.field('extra', pa.bool_()))
>>> schema_new
n_legs: int64
animals: string
extra: bool
- The original schema is unmodified:
>>> schema
n_legs: int64
animals: string
 - empty_table(self)
- Provide an empty table according to the schema.
- Returns:
- table : pyarrow.Table
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
- Create an empty table with the schema’s fields:
>>> schema.empty_table()
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[]]
animals: [[]]
 - equals(self, Schema other, bool check_metadata=False)
- Test if this schema is equal to the other.
- Parameters:
- other : pyarrow.Schema
- check_metadata : bool, default False
- Key/value metadata must be equal too.
- Returns:
- is_equal : bool
- Examples
>>> import pyarrow as pa
>>> schema1 = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema2 = pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])
- Test two equal schemas:
>>> schema1.equals(schema1)
True
- Test two unequal schemas:
>>> schema1.equals(schema2)
False
 - field(self, i)
- Select a field by its column name or numeric index.
- Parameters:
- i : int or str
- Returns:
- pyarrow.Field
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
- Select the second field:
>>> schema.field(1)
pyarrow.Field<animals: string>
- Select the field of the column named ‘n_legs’:
>>> schema.field('n_legs')
pyarrow.Field<n_legs: int64>
 - field_by_name(self, name)
- DEPRECATED
- Parameters:
- name : str
- Returns:
- field : pyarrow.Field
 - classmethod from_pandas(cls, df, preserve_index=None)
- Return the implied schema from a dataframe.
- Parameters:
- df : pandas.DataFrame
- preserve_index : bool, default None
- Whether to store the index as an additional column (or columns, for MultiIndex) in the resulting Table. The default of None will store the index as a column, except for RangeIndex which is stored as metadata only. Use preserve_index=True to force it to be stored as a column.
- Returns:
- pyarrow.Schema
- Examples
>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({
...     'int': [1, 2],
...     'str': ['a', 'b']
... })
- Create an Arrow Schema from the schema of a pandas dataframe:
>>> pa.Schema.from_pandas(df)
int: int64
str: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, ...
 - get_all_field_indices(self, name)
- Return a sorted list of indices for the fields with the given name.
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())])
- Get the indices of the fields named ‘animals’:
>>> schema.get_all_field_indices("animals")
[1, 2]
 - get_field_index(self, name)
- Return the index of the unique field with the given name.
- Parameters:
- name : str
- The name of the field to look up.
- Returns:
- index : int
- The index of the field with the given name; -1 if the name isn’t found or there are several fields with the given name.
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
- Get the index of the field named ‘animals’:
>>> schema.get_field_index("animals")
1
- Index in case of several fields with the given name:
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema.get_field_index("animals")
-1
 - insert(self, int i, Field field)
- Add a field at position i to the schema.
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
- Insert a new field at the second position:
>>> schema.insert(1, pa.field('extra', pa.bool_()))
n_legs: int64
extra: bool
animals: string
 - metadata
- The schema’s metadata (if any is set).
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
- Get the metadata of the schema:
>>> schema.metadata
{b'n_legs': b'Number of legs per animal'}
 - names
- The schema’s field names.
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
- Get the names of the schema’s fields:
>>> schema.names
['n_legs', 'animals']
 - pandas_metadata
- Return the deserialized-from-JSON pandas metadata field (if it exists).
- Examples
>>> import pyarrow as pa
>>> import pandas as pd
>>> df = pd.DataFrame({'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> schema = pa.Table.from_pandas(df).schema
- Select the pandas metadata field from the Arrow Schema:
>>> schema.pandas_metadata
{'index_columns': [{'kind': 'range', 'name': None, 'start': 0, 'stop': 4, 'step': 1}], ...
 - remove(self, int i)
- Remove the field at index i from the schema.
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
- Remove the second field of the schema:
>>> schema.remove(1)
n_legs: int64
 - remove_metadata(self)
- Create a new schema without metadata, if any.
- Returns:
- schema : pyarrow.Schema
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
- Create a new schema with the metadata removed from the original:
>>> schema.remove_metadata()
n_legs: int64
animals: string
 - serialize(self, memory_pool=None)
- Write the Schema to a Buffer as an encapsulated IPC message.
- Parameters:
- memory_pool : MemoryPool, default None
- Uses the default memory pool if not specified.
- Returns:
- serialized : Buffer
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
- Write the schema to a Buffer:
>>> schema.serialize()
<pyarrow.Buffer address=0x... size=... is_cpu=True is_mutable=True>
 - set(self, int i, Field field)
- Replace a field at position i in the schema.
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
- Replace the second field of the schema with a new field ‘replaced’:
>>> schema.set(1, pa.field('replaced', pa.bool_()))
n_legs: int64
replaced: bool
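 - to_string(self, truncate_metadata=True, show_field_metadata=True, show_schema_metadata=True)
- Return a human-readable representation of the Schema.
- Parameters:
- truncate_metadata : bool, default True
- Limit the metadata key/value display to a single line of about 80 characters or less.
- show_field_metadata : bool, default True
- Display field-level metadata.
- show_schema_metadata : bool, default True
- Display schema-level metadata.
- Returns:
- str : the formatted output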
 - types
- The schema’s field types.
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
- Get the types of the schema’s fields:
>>> schema.types
[DataType(int64), DataType(string)]
 - with_metadata(self, metadata)
- Add metadata as a dict of string keys and values to the Schema.
- Parameters:
- metadata : dict
- Keys and values must be string-like / coercible to bytes.
- Returns:
- schema : pyarrow.Schema
- Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
- Add metadata to an existing schema:
>>> schema.with_metadata({"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
 
 
    