pyarrow.dataset.FileSystemDatasetFactory

class pyarrow.dataset.FileSystemDatasetFactory(FileSystem filesystem, paths_or_selector, FileFormat format, FileSystemFactoryOptions options=None)

Bases: pyarrow._dataset.DatasetFactory

Create a DatasetFactory from a list of paths with schema inspection.

Parameters
filesystem : pyarrow.fs.FileSystem

Filesystem in which to discover files.

paths_or_selector : pyarrow.fs.FileSelector or list of path-likes

Either a FileSelector object or a list of path-like objects.

format : FileFormat

Currently only ParquetFileFormat and IpcFileFormat are supported.

options : FileSystemFactoryOptions, optional

Various flags influencing the discovery of filesystem paths.

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

finish(self, Schema schema=None)

Create a Dataset using the inspected schema or an explicit schema (if given).

inspect(self)

Inspect all data fragments and return a common Schema.

inspect_schemas(self)

Attributes

root_partition

finish(self, Schema schema=None)

Create a Dataset using the inspected schema or an explicit schema (if given).

Parameters
schema : Schema, default None

The schema to conform the source to. If None, the inspected schema is used.

Returns
Dataset
inspect(self)

Inspect all data fragments and return a common Schema.

Returns
Schema
inspect_schemas(self)
root_partition