pyarrow.dataset.parquet_dataset
- pyarrow.dataset.parquet_dataset(metadata_path, schema=None, filesystem=None, format=None, partitioning=None, partition_base_dir=None)
Create a FileSystemDataset from a _metadata file created via pyarrow.parquet.write_metadata.
- Parameters:
- metadata_path
path
Path pointing to a single Parquet metadata file.
- schema
Schema, optional
Optionally provide the Schema for the Dataset, in which case it will not be inferred from the source.
- filesystem
FileSystem or URI str, default None
If a single path is given as source and filesystem is None, then the filesystem will be inferred from the path. If a URI string is passed, then a filesystem object is constructed using the URI's optional path component as a directory prefix. See the examples below. Note that URIs on Windows must follow the 'file:///C:…' or 'file:/C:…' patterns.
- format
ParquetFileFormat
An instance of a ParquetFileFormat if special options need to be passed.
- partitioning
Partitioning, PartitioningFactory, str, list of str
The partitioning scheme specified with the partitioning() function. A flavor string can be used as a shortcut, and with a list of field names a DirectoryPartitioning will be inferred.
- partition_base_dir
str, optional
For the purposes of applying the partitioning, paths will be stripped of the partition_base_dir. Files not matching the partition_base_dir prefix will be skipped for partitioning discovery. The ignored files will still be part of the Dataset, but will not have partition information.
- Returns:
FileSystemDataset
The dataset corresponding to the given metadata