pyarrow.dataset.FileSystemDatasetFactory
class pyarrow.dataset.FileSystemDatasetFactory(FileSystem filesystem, paths_or_selector, FileFormat format, FileSystemFactoryOptions options=None)
Bases: pyarrow._dataset.DatasetFactory
Create a DatasetFactory from a list of paths or a selector, with schema inspection.
- Parameters
filesystem (pyarrow.fs.FileSystem) – Filesystem in which to discover files.
paths_or_selector (pyarrow.fs.Selector or list of path-likes) – Either a Selector object or a list of path-like objects.
format (FileFormat) – Currently only ParquetFileFormat and IpcFileFormat are supported.
options (FileSystemFactoryOptions, optional) – Various flags influencing the discovery of filesystem paths.
__init__(*args, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(*args, **kwargs) – Initialize self.
finish(self, Schema schema=None) – Create a Dataset using the inspected schema or an explicit schema (if given).
inspect(self) – Inspect all data fragments and return a common Schema.
inspect_schemas(self)
Attributes
root_partition
finish(self, Schema schema=None)
Create a Dataset using the inspected schema or an explicit schema (if given).
- Parameters
schema (Schema, default None) – The schema to conform the source to. If None, the inspected schema is used.
- Returns
Dataset
inspect(self)
Inspect all data fragments and return a common Schema.
- Returns
Schema
inspect_schemas(self)
root_partition