pyarrow.dataset.FileSystemDatasetFactory

class pyarrow.dataset.FileSystemDatasetFactory(FileSystem filesystem, paths_or_selector, FileFormat format, FileSystemFactoryOptions options=None)

Bases: pyarrow._dataset.DatasetFactory

Create a DatasetFactory from a list of paths or a selector, with schema inspection.

Parameters
  • filesystem (pyarrow.fs.FileSystem) – Filesystem on which to discover the files.

  • paths_or_selector (pyarrow.fs.Selector or list of path-likes) – Either a Selector object or a list of path-like objects.

  • format (FileFormat) – Currently only ParquetFileFormat and IpcFileFormat are supported.

  • options (FileSystemFactoryOptions, optional) – Various flags influencing the discovery of filesystem paths.

Methods

__init__(*args, **kwargs)

Initialize self.

finish(self, Schema schema=None)

Create a Dataset using the inspected schema or an explicit schema (if given).

inspect(self)

Inspect all data fragments and return a common Schema.

inspect_schemas(self)

Attributes

root_partition

finish(self, Schema schema=None)

Create a Dataset using the inspected schema or an explicit schema (if given).

Parameters

schema (Schema, default None) – The schema to conform the source to. If None, the inspected schema is used.

Returns

Dataset

inspect(self)

Inspect all data fragments and return a common Schema.

Returns

Schema

inspect_schemas(self)

root_partition