This function allows you to write a dataset. By writing to more efficient binary storage formats, and by specifying relevant partitioning, you can make the data much faster to read and query.

write_dataset(
  dataset,
  path,
  format = c("parquet", "feather", "arrow", "ipc"),
  partitioning = dplyr::group_vars(dataset),
  basename_template = paste0("part-{i}.", as.character(format)),
  hive_style = TRUE,
  ...
)

## Arguments

- dataset: Dataset, RecordBatch, Table, arrow_dplyr_query, or data.frame. If an arrow_dplyr_query or grouped_df, schema and partitioning will be taken from the result of any select() and group_by() operations done on the dataset. filter() queries will be applied to restrict written rows. Note that select()-ed columns may not be renamed.
- path: string path, URI, or SubTreeFileSystem referencing a directory to write to (directory will be created if it does not exist).
- format: a string identifier of the file format. Default is to use "parquet" (see FileFormat).
- partitioning: Partitioning or a character vector of columns to use as partition keys (to be written as path segments). Default is to use the current group_by() columns.
- basename_template: string template for the names of files to be written. Must contain "{i}", which will be replaced with an autoincremented integer to generate basenames of datafiles. For example, "part-{i}.feather" will yield "part-0.feather", ....
- hive_style: logical: write partition segments as Hive-style (key1=value1/key2=value2/file.ext) or as just bare values. Default is TRUE.
- ...: additional format-specific arguments. For available Parquet options, see write_parquet(). The available Feather options are:
  - use_legacy_format: logical: write data formatted so that Arrow libraries versions 0.14 and lower can read it. Default is FALSE. You can also enable this by setting the environment variable ARROW_PRE_0_15_IPC_FORMAT=1.
  - metadata_version: A string like "V5" or the equivalent integer indicating the Arrow IPC MetadataVersion. Default (NULL) will use the latest version, unless the environment variable ARROW_PRE_1_0_METADATA_VERSION=1, in which case it will be V4.
  - codec: A Codec which will be used to compress body buffers of written files. Default (NULL) will not compress body buffers.
  - null_fallback: character to be used in place of missing values (NA or NULL) when using Hive-style partitioning. See hive_partition().

## Value

The input dataset, invisibly
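
## Examples

The following is a minimal sketch of how the partitioning and hive_style arguments affect the directory layout. It assumes the arrow package is installed with dataset support, uses the built-in mtcars data frame, and writes to temporary directories; the names out_hive and out_bare are hypothetical and chosen only for illustration.

```r
library(arrow)

# Hypothetical output locations under the session's temporary directory.
out_hive <- file.path(tempdir(), "mtcars_hive")
out_bare <- file.path(tempdir(), "mtcars_bare")

# Write Parquet files partitioned by cyl. With hive_style = TRUE (the default),
# each partition directory is named key=value, e.g. "cyl=4".
write_dataset(mtcars, out_hive, format = "parquet", partitioning = "cyl")

# Write Feather files with bare partition values, e.g. a directory named "4".
write_dataset(
  mtcars,
  out_bare,
  format = "feather",
  partitioning = "cyl",
  hive_style = FALSE
)

# Inspect the files that were written.
list.files(out_hive, recursive = TRUE)
list.files(out_bare, recursive = TRUE)

# Reopen the Hive-style dataset; the partition key is recovered as a column.
ds <- open_dataset(out_hive)
names(ds)
```

Because mtcars has three distinct values of cyl, each call should produce three partition directories. Passing a grouped data frame (for example, the result of dplyr::group_by(mtcars, cyl)) without an explicit partitioning argument is expected to give the same layout, since partitioning defaults to the current group_by() columns.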