pyarrow.parquet.FileMetaData#

class pyarrow.parquet.FileMetaData#

Bases: _Weakrefable

Parquet metadata for a single file.

__init__(*args, **kwargs)#

Methods

__init__(*args, **kwargs)

append_row_groups(self, FileMetaData other)

Append row groups from other FileMetaData object.

equals(self, FileMetaData other)

Return whether the two file metadata objects are equal.

row_group(self, int i)

Get metadata for row group at index i.

set_file_path(self, path)

Set ColumnChunk file paths to the given value.

to_dict(self)

Get dictionary representation of the file metadata.

write_metadata_file(self, where)

Write the metadata to a metadata-only Parquet file.

Attributes

created_by

String describing source of the parquet file (str).

format_version

Parquet format version used in file (str, such as '1.0', '2.4').

metadata

Additional metadata as key value pairs (dict[bytes, bytes]).

num_columns

Number of columns in file (int).

num_row_groups

Number of row groups in file (int).

num_rows

Total number of rows in file (int).

schema

Schema of the file (ParquetSchema).

serialized_size

Size of the original Thrift-encoded metadata footer (int).

append_row_groups(self, FileMetaData other)#

Append row groups from other FileMetaData object.

Parameters:
other : FileMetaData

Other metadata to append row groups from.

created_by#

String describing source of the parquet file (str).

This typically includes library name and version number. For example, Arrow 7.0’s writer returns ‘parquet-cpp-arrow version 7.0.0’.

equals(self, FileMetaData other)#

Return whether the two file metadata objects are equal.

Parameters:
other : FileMetaData

Metadata to compare against.

Returns:
are_equal : bool

format_version#

Parquet format version used in file (str, such as ‘1.0’, ‘2.4’).

If the version is missing or cannot be parsed, it is assumed to be ‘2.6’.

metadata#

Additional metadata as key value pairs (dict[bytes, bytes]).

num_columns#

Number of columns in file (int).

num_row_groups#

Number of row groups in file (int).

num_rows#

Total number of rows in file (int).

row_group(self, int i)#

Get metadata for row group at index i.

Parameters:
i : int

Row group index to get.

Returns:
row_group_metadata : RowGroupMetaData

schema#

Schema of the file (ParquetSchema).

serialized_size#

Size of the original Thrift-encoded metadata footer (int).

set_file_path(self, path)#

Set ColumnChunk file paths to the given value.

This method modifies the file_path field of each ColumnChunk in the FileMetaData to be a particular value.

Parameters:
path : str

The file path to set on all ColumnChunks.

to_dict(self)#

Get dictionary representation of the file metadata.

Returns:
dict

Dictionary with a key for each attribute of this class.

write_metadata_file(self, where)#

Write the metadata to a metadata-only Parquet file.

Parameters:
where : path or file-like object

Where to write the metadata. Should be a writable path on the local filesystem, or a writable file-like object.