Dataset

Warning

The pyarrow.dataset module is experimental (specifically the classes), and a stable API is not yet guaranteed.

Factory functions

`dataset`(source[, schema, format, …])	Open a dataset.
`parquet_dataset`(metadata_path[, schema, …])	Create a FileSystemDataset from a _metadata file created via pyarrow.parquet.write_metadata.
`partitioning`([schema, field_names, flavor, …])	Specify a partitioning scheme.
`field`(name)	Reference a named column of the dataset.
`scalar`(value)	Expression representing a scalar value.
`write_dataset`(data, base_dir[, …])	Write a dataset to a given format and partitioning.

Classes

`FileFormat`()
`ParquetFileFormat`([read_options, …])	FileFormat for Parquet files.
`IpcFileFormat`()
`CsvFileFormat`(ParseOptions parse_options=None)	FileFormat for CSV files.
`Partitioning`()
`PartitioningFactory`()
`DirectoryPartitioning`(Schema schema[, …])	A Partitioning based on a specified Schema.
`HivePartitioning`(Schema schema[, …])	A Partitioning for “/$key=$value/” nested directories as found in Apache Hive.
`Dataset`()	Collection of data fragments and potentially child datasets.
`FileSystemDataset`(fragments, Schema schema, …)	A Dataset of file fragments.
`FileSystemFactoryOptions`([…])	Influences the discovery of filesystem paths.
`FileSystemDatasetFactory`(…)	Create a DatasetFactory from a list of paths with schema inspection.
`UnionDataset`(Schema schema, children)	A Dataset wrapping child datasets.
`Scanner`()	A materialized scan operation with context and options bound.
`Expression`()	A logical expression to be evaluated against some input.
