Dataset

Warning

The pyarrow.dataset module is experimental (specifically the classes), and a stable API is not yet guaranteed.

Factory functions

`dataset`(source[, schema, format, ...])	Open a dataset.
`parquet_dataset`(metadata_path[, schema, ...])	Create a FileSystemDataset from a _metadata file created via pyarrow.parquet.write_metadata.
`partitioning`([schema, field_names, flavor, ...])	Specify a partitioning scheme.
`field`(*name_or_index)	Reference a column of the dataset.
`scalar`(value)	Expression representing a scalar value.
`write_dataset`(data, base_dir[, ...])	Write a dataset to a given format and partitioning.

Classes

`FileFormat`()
`CsvFileFormat`(ParseOptions parse_options=None)	FileFormat for CSV files.
`CsvFragmentScanOptions`(...)	Scan-specific options for CSV fragments.
`IpcFileFormat`()
`ParquetFileFormat`([read_options, ...])	FileFormat for Parquet files.
`ParquetReadOptions`([dictionary_columns, ...])	Parquet format specific options for reading.
`ParquetFragmentScanOptions`(...[, buffer_size])	Scan-specific options for Parquet fragments.
`OrcFileFormat`()
`Partitioning`()
`PartitioningFactory`()
`DirectoryPartitioning`(Schema schema[, ...])	A Partitioning based on a specified Schema.
`HivePartitioning`(Schema schema[, ...])	A Partitioning for "/$key=$value/" nested directories as found in Apache Hive.
`FilenamePartitioning`(Schema schema[, ...])	A Partitioning based on a specified Schema.
`Dataset`()	Collection of data fragments and potentially child datasets.
`FileSystemDataset`(fragments, Schema schema, ...)	A Dataset of file fragments.
`FileSystemFactoryOptions`([...])	Influences the discovery of filesystem paths.
`FileSystemDatasetFactory`(...)	Create a DatasetFactory from a list of paths with schema inspection.
`UnionDataset`(Schema schema, children)	A Dataset wrapping child datasets.
`Fragment`()	Fragment of data from a Dataset.
`FragmentScanOptions`()	Scan options specific to a particular fragment and scan operation.
`TaggedRecordBatch`(record_batch, fragment)	A combination of a record batch and the fragment it came from.
`Scanner`()	A materialized scan operation with context and options bound.
`Expression`()	A logical expression to be evaluated against some input.
`InMemoryDataset`(source, Schema schema=None)	A Dataset wrapping in-memory data.
