pyarrow.dataset.FileSystemFactoryOptions

class pyarrow.dataset.FileSystemFactoryOptions(partition_base_dir=None, partitioning=None, exclude_invalid_files=None, selector_ignore_prefixes=None)

Bases: pyarrow.lib._Weakrefable

Influences the discovery of filesystem paths.

Parameters
  • partition_base_dir (str, optional) – For the purposes of applying the partitioning, paths will be stripped of the partition_base_dir. Files not matching the partition_base_dir prefix will be skipped for partitioning discovery. The ignored files will still be part of the Dataset, but will not have partition information.

  • partitioning (Partitioning/PartitioningFactory, optional) – Apply the Partitioning to every discovered Fragment. See Partitioning or PartitioningFactory documentation.

  • exclude_invalid_files (bool, optional (default True)) – If True, invalid files will be excluded (file format specific check). This will incur IO for each file in a serial and single-threaded fashion. Disabling this feature will skip the IO, but unsupported files may be present in the Dataset (resulting in an error at scan time).

  • selector_ignore_prefixes (list, optional) – When discovering from a Selector (and not from an explicit file list), ignore files and directories matching any of these prefixes. By default this is ['.', '_'].

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Attributes

exclude_invalid_files

Whether to exclude invalid files.

partition_base_dir

Base directory to strip paths before applying the partitioning.

partitioning

Partitioning to apply to discovered files.

NOTE: setting this property will overwrite partitioning_factory.

partitioning_factory

PartitioningFactory to apply to discovered files and discover a Partitioning.

NOTE: setting this property will overwrite partitioning.

selector_ignore_prefixes

List of prefixes. Files matching one of those prefixes will be ignored by the discovery process.