Pass a Partitioning object to the $create() method of a
FileSystemDatasetFactory to indicate how file paths should be
interpreted to define partitioning.
DirectoryPartitioning describes how to interpret raw path segments, in
order. For example, schema(year = int16(), month = int8()) would define
partitions for file paths like "2019/01/file.parquet",
"2019/02/file.parquet", etc.
HivePartitioning is for Hive-style partitioning, which embeds field
names and values in path segments, such as
"/year=2019/month=2/data.parquet". Because fields are named in the path
segments, order does not matter.
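To make the two layouts concrete, the sketch below writes a small data frame in each style and reads it back with the matching Partitioning object. It uses write_dataset() and open_dataset() from the arrow package as convenient entry points, which this section does not describe. The data frame, variable names, and temporary directories are hypothetical, and the example assumes an arrow build with Parquet support.

```r
library(arrow)

# Hypothetical sample data; year and month become path segments on write.
df <- data.frame(
  year = c(2019L, 2019L),
  month = c(1L, 2L),
  value = c(1.5, 2.5)
)
part_schema <- schema(year = int16(), month = int8())

# Directory-style layout: bare values in the path segments, in order.
dir_path <- tempfile("dir_style_")
write_dataset(df, dir_path, partitioning = c("year", "month"), hive_style = FALSE)
dir_files <- list.files(dir_path, recursive = TRUE)

# Hive-style layout: "name=value" in each path segment.
hive_path <- tempfile("hive_style_")
write_dataset(df, hive_path, partitioning = c("year", "month"), hive_style = TRUE)
hive_files <- list.files(hive_path, recursive = TRUE)

# Read each layout back with the corresponding Partitioning object.
ds_dir <- open_dataset(dir_path, partitioning = DirectoryPartitioning$create(part_schema))
ds_hive <- open_dataset(hive_path, partitioning = HivePartitioning$create(part_schema))

print(dir_files)
print(hive_files)
print(ds_dir$schema)
```

Because the partition schema is given explicitly, the year and month columns come back with the declared types rather than inferred ones.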
PartitioningFactory subclasses instruct the DatasetFactory to infer the
partitioning from the file paths instead of being given it explicitly.
Both DirectoryPartitioning$create() and HivePartitioning$create() take
a Schema as their single input argument. The helper
function hive_partition(...) is shorthand for
HivePartitioning$create(schema(...)).
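As a minimal sketch of the constructors just described, the following builds the same Hive-style partitioning both ways, alongside a directory-style one. The variable names are illustrative only.

```r
library(arrow)

# Explicit form: build a Schema, then pass it to $create().
explicit <- HivePartitioning$create(schema(year = int16(), month = int8()))

# Shorthand form: hive_partition() takes the fields directly.
shorthand <- hive_partition(year = int16(), month = int8())

# Directory-style partitioning takes the same kind of Schema;
# here the field order must match the order of the path segments.
dir_part <- DirectoryPartitioning$create(schema(year = int16(), month = int8()))
```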
With DirectoryPartitioningFactory$create(), you can provide just the
names of the path segments (in our example, c("year", "month")), and
the DatasetFactory will infer the data types for those partition variables.
HivePartitioningFactory$create() takes no arguments: both variable names
and their types can be inferred from the file paths. hive_partition() with
no arguments returns a HivePartitioningFactory.
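The factory variants can be sketched in the same way. In the example below, the sample data frame and temporary directories are hypothetical, and write_dataset() and open_dataset() are used as convenient entry points rather than calling a DatasetFactory directly. The inferred types are whatever arrow detects from the path values, so the sketch checks only that the partition columns appear.

```r
library(arrow)

# Hypothetical sample data.
df <- data.frame(
  year = c(2019L, 2019L),
  month = c(1L, 2L),
  value = c(1.5, 2.5)
)

# Hive-style files: names and types are both inferred from the paths.
hive_path <- tempfile("hive_")
write_dataset(df, hive_path, partitioning = c("year", "month"))
hive_factory <- hive_partition()  # no arguments: a HivePartitioningFactory
ds_hive <- open_dataset(hive_path, partitioning = hive_factory)

# Directory-style files: supply the segment names, let the types be inferred.
dir_path <- tempfile("dir_")
write_dataset(df, dir_path, partitioning = c("year", "month"), hive_style = FALSE)
dir_factory <- DirectoryPartitioningFactory$create(c("year", "month"))
ds_dir <- open_dataset(dir_path, partitioning = dir_factory)

print(names(ds_hive))
print(names(ds_dir))
```

If specific types matter (for example int16 rather than an inferred integer type), an explicit Partitioning built from a Schema is the safer choice.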