Datasets

Warning

The pyarrow.dataset module is experimental (in particular its classes), and its API is not yet guaranteed to be stable.

Factory functions

dataset(source[, schema, format, …])

Open a dataset.

partitioning([schema, field_names, flavor])

Specify a partitioning scheme.

field(name)

Reference a named column of the dataset.

scalar(value)

Expression representing a scalar value.

Classes

FileFormat

ParquetFileFormat

Partitioning

PartitioningFactory

DirectoryPartitioning

A Partitioning based on a specified Schema.

HivePartitioning

A Partitioning for “/$key=$value/” nested directories as found in Apache Hive.

Dataset

Collection of data fragments and potentially child datasets.

FileSystemDataset

A Dataset created from a set of files on a particular filesystem.

FileSystemFactoryOptions

Influences the discovery of filesystem paths.

FileSystemDatasetFactory

Create a DatasetFactory from a list of paths with schema inspection.

UnionDataset

A Dataset wrapping child datasets.

Scanner

A materialized scan operation with context and options bound.

Expression