pyarrow.parquet.FileMetaData

class pyarrow.parquet.FileMetaData

Bases: _Weakrefable

Parquet metadata for a single file.

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

append_row_groups(self, FileMetaData other)

Append row groups from other FileMetaData object.

equals(self, FileMetaData other)

Return whether the two file metadata objects are equal.

row_group(self, int i)

Get metadata for row group at index i.

set_file_path(self, path)

Set ColumnChunk file paths to the given value.

to_dict(self)

Get dictionary representation of the file metadata.

write_metadata_file(self, where)

Write the metadata to a metadata-only Parquet file.

Attributes

created_by

String describing source of the parquet file (str).

format_version

Parquet format version used in file (str, such as '1.0', '2.4').

metadata

Additional metadata as key value pairs (dict[bytes, bytes]).

num_columns

Number of columns in file (int).

num_row_groups

Number of row groups in file (int).

num_rows

Total number of rows in file (int).

schema

Schema of the file (ParquetSchema).

serialized_size

Size of the original Thrift-encoded metadata footer (int).

append_row_groups(self, FileMetaData other)

Append row groups from other FileMetaData object.

Parameters:
other : FileMetaData

Other metadata to append row groups from.

created_by

String describing source of the parquet file (str).

This typically includes the library name and version number. For example, a file written by Arrow 7.0 reports ‘parquet-cpp-arrow version 7.0.0’.

equals(self, FileMetaData other)

Return whether the two file metadata objects are equal.

Parameters:
other : FileMetaData

Metadata to compare against.

Returns:
are_equal : bool

format_version

Parquet format version used in file (str, such as ‘1.0’, ‘2.4’).

If the version is missing or cannot be parsed, ‘2.6’ is assumed.

metadata

Additional metadata as key value pairs (dict[bytes, bytes]).

num_columns

Number of columns in file (int).

num_row_groups

Number of row groups in file (int).

num_rows

Total number of rows in file (int).

row_group(self, int i)

Get metadata for row group at index i.

Parameters:
i : int

Row group index to get.

Returns:
row_group_metadata : RowGroupMetaData

schema

Schema of the file (ParquetSchema).

serialized_size

Size of the original Thrift-encoded metadata footer (int).

set_file_path(self, path)

Set ColumnChunk file paths to the given value.

This method sets the file_path field of every ColumnChunk in the FileMetaData to the given path.

Parameters:
path : str

The file path to set on all ColumnChunks.

to_dict(self)

Get dictionary representation of the file metadata.

Returns:
dict

Dictionary with a key for each attribute of this class.

write_metadata_file(self, where)

Write the metadata to a metadata-only Parquet file.

Parameters:
where : path or file-like object

Where to write the metadata. Should be a writable path on the local filesystem, or a writable file-like object.