pyarrow.dataset.DirectoryPartitioning
class pyarrow.dataset.DirectoryPartitioning
Bases: pyarrow._dataset.Partitioning
A Partitioning based on a specified Schema.
The DirectoryPartitioning expects one segment in the file path for each field in the schema (all fields are required to be present). For example, given schema<year:int16, month:int8>, the path "/2009/11" would be parsed to ("year" == 2009 and "month" == 11).
- Parameters
schema (Schema) – The schema that describes the partitions present in the file path.
- Returns
DirectoryPartitioning
Examples
>>> import pyarrow as pa
>>> from pyarrow.dataset import DirectoryPartitioning
>>> partitioning = DirectoryPartitioning(
...     pa.schema([("year", pa.int16()), ("month", pa.int8())]))
>>> print(partitioning.parse("/2009/11"))
((year == 2009:int16) and (month == 11:int8))
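The segment-to-field rule described above can also be illustrated without pyarrow. The following is a minimal pure-Python sketch, not the library's implementation: the helper name `parse_directory_path` and the use of Python callables such as `int` in place of Arrow types are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple


def parse_directory_path(
    path: str, fields: List[Tuple[str, Callable[[str], object]]]
) -> Dict[str, object]:
    """Map leading path segments onto (name, converter) fields, in order.

    Hypothetical helper: it mimics the one-segment-per-field rule only.
    """
    # Drop empty pieces produced by leading or trailing slashes.
    segments = [segment for segment in path.split("/") if segment]
    # Every field must have a matching segment.
    if len(segments) < len(fields):
        raise ValueError(
            f"expected {len(fields)} path segments, found {len(segments)}"
        )
    # zip() pairs the i-th field with the i-th segment.
    return {
        name: convert(segment)
        for (name, convert), segment in zip(fields, segments)
    }


parsed = parse_directory_path("/2009/11", [("year", int), ("month", int)])
print(parsed)
```

The order of the fields matters: the first field is always read from the first directory level, the second from the next, and so on.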
__init__(*args, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(*args, **kwargs) – Initialize self.
discover() – Discover a DirectoryPartitioning.
Attributes
schema – The arrow Schema attached to the partitioning.
static discover()
Discover a DirectoryPartitioning.
- Parameters
field_names (list of str) – The names to associate with the values from the subdirectory names.
infer_dictionary (bool, default False) – When inferring a schema for partition fields, yield dictionary encoded types instead of plain types. This can be more efficient when materializing virtual columns, and Expressions parsed by the finished Partitioning will include dictionaries of all unique inspected values for each field.
max_partition_dictionary_size (int, default 0) – Synonymous with infer_dictionary for backwards compatibility with 1.0: setting this to -1 or None is equivalent to passing infer_dictionary=True.
- Returns
DirectoryPartitioningFactory – To be used in the FileSystemFactoryOptions.
parse()
schema
The arrow Schema attached to the partitioning.