pyarrow.dataset.ParquetFileFormat

class pyarrow.dataset.ParquetFileFormat(read_options=None, default_fragment_scan_options=None, **kwargs)

Bases: FileFormat

FileFormat for Parquet files.

Parameters:
read_options : ParquetReadOptions

Read options for the file.

default_fragment_scan_options : ParquetFragmentScanOptions

Scan options for the file.

**kwargs : dict

Additional options used to build the read options or the scan options.

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

equals(self, ParquetFileFormat other)

Check whether this format is equal to another ParquetFileFormat.

inspect(self, file[, filesystem])

Infer the schema of a file.

make_fragment(self, file[, filesystem, ...])

Make a FileFragment from a given file.

make_write_options(self, **kwargs)

Create write options for this format.

Attributes

default_extname

default_fragment_scan_options

read_options

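default_extname

The default file extension for this format ("parquet").

default_fragment_scan_options

The ParquetFragmentScanOptions applied to scans that do not supply their own scan options.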
equals(self, ParquetFileFormat other)

Check whether this format is equal to another ParquetFileFormat.

Parameters:
other : pyarrow.dataset.ParquetFileFormat

The format to compare against.

Returns:
bool
inspect(self, file, filesystem=None)

Infer the schema of a file.

Parameters:
file : file-like object, path-like or str

The file or file path to infer a schema from.

filesystem : Filesystem, optional

If filesystem is given, file must be a string and specifies the path of the file to read from the filesystem.

Returns:
schema : Schema

The schema inferred from the file.

make_fragment(self, file, filesystem=None, Expression partition_expression=None, row_groups=None, *, file_size=None)

Make a FileFragment from a given file.

Parameters:
file : file-like object, path-like or str

The file or file path to make a fragment from.

filesystem : Filesystem, optional

If filesystem is given, file must be a string and specifies the path of the file to read from the filesystem.

partition_expression : Expression, optional

An expression that is guaranteed to be true for all rows in the fragment. This allows the fragment to be skipped when a scan filter cannot match it.

row_groups : Iterable, optional

The indices of the row groups to include.

file_size : int, optional

The size of the file in bytes. Can improve performance with high-latency filesystems when the file size needs to be known before reading.

Returns:
fragment : Fragment

The file fragment.

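make_write_options(self, **kwargs)

Create write options for this format.

Parameters:
**kwargs : dict

Options for the Parquet file writer, for example compression.

Returns:
pyarrow.dataset.FileWriteOptions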
read_options

The ParquetReadOptions used to read files of this format.