Dataset#
Warning

The pyarrow.dataset module is experimental (specifically the classes), and a stable API is not yet guaranteed.
Factory functions#
| Function | Description |
| --- | --- |
| `dataset` | Open a dataset. |
| `parquet_dataset` | Create a FileSystemDataset from a `_metadata` file created via `pyarrow.parquet.write_metadata`. |
| `partitioning` | Specify a partitioning scheme. |
| `field` | Reference a column of the dataset. |
| `scalar` | Expression representing a scalar value. |
| `write_dataset` | Write a dataset to a given format and partitioning. |
Classes#
| Class | Description |
| --- | --- |
| `CsvFileFormat` | FileFormat for CSV files. |
| `CsvFragmentScanOptions` | Scan-specific options for CSV fragments. |
| `ParquetFileFormat` | FileFormat for Parquet files. |
| `ParquetReadOptions` | Parquet format specific options for reading. |
| `ParquetFragmentScanOptions` | Scan-specific options for Parquet fragments. |
| `DirectoryPartitioning` | A Partitioning based on a specified Schema. |
| `HivePartitioning` | A Partitioning for "/$key=$value/" nested directories as found in Apache Hive. |
| `FilenamePartitioning` | A Partitioning based on a specified Schema. |
| `Dataset` | Collection of data fragments and potentially child datasets. |
| `FileSystemDataset` | A Dataset of file fragments. |
| `FileSystemFactoryOptions` | Influences the discovery of filesystem paths. |
| `FileSystemDatasetFactory` | Create a DatasetFactory from a list of paths with schema inspection. |
| `UnionDataset` | A Dataset wrapping child datasets. |
| `Fragment` | Fragment of data from a Dataset. |
| `FragmentScanOptions` | Scan options specific to a particular fragment and scan operation. |
| `TaggedRecordBatch` | A combination of a record batch and the fragment it came from. |
| `Scanner` | A materialized scan operation with context and options bound. |
| `Expression` | A logical expression to be evaluated against some input. |
| `InMemoryDataset` | A Dataset wrapping in-memory data. |