pyarrow.parquet.write_to_dataset
- pyarrow.parquet.write_to_dataset(table, root_path, partition_cols=None, partition_filename_cb=None, filesystem=None, use_legacy_dataset=None, **kwargs)
Wrapper around parquet.write_table for writing a Table to Parquet format by partitions. For each combination of partition columns and values, subdirectories are created in the following manner:
- root_dir/
  - group1=value1
    - group2=value1
      <uuid>.parquet
    - group2=value2
      <uuid>.parquet
  - group1=valueN
    - group2=value1
      <uuid>.parquet
    - group2=valueN
      <uuid>.parquet
- Parameters
- table : pyarrow.Table
- root_path : str, pathlib.Path
The root directory of the dataset.
- filesystem : FileSystem, default None
If nothing is passed, paths are assumed to be found in the local on-disk filesystem.
- partition_cols : list, default None
Column names by which to partition the dataset. Columns are partitioned in the order they are given.
- partition_filename_cb : callable, default None
A callback function that takes the partition key(s) as an argument and allows you to override the partition filename. If nothing is passed, the filename will consist of a uuid.
- use_legacy_dataset : bool
Default is True unless a pyarrow.fs filesystem is passed. Set to False to enable the new code path (experimental, using the new Arrow Dataset API). This is more efficient when using partition columns, but does not (yet) support the partition_filename_cb and metadata_collector keywords.
- **kwargs : dict
Additional kwargs for write_table function. See docstring for write_table or ParquetWriter for more information. Using metadata_collector in kwargs allows one to collect the file metadata instances of dataset pieces. The file paths in the ColumnChunkMetaData will be set relative to root_path.