Parquet is a columnar storage file format. This function enables you to write Parquet files from R.

write_parquet(x, sink, chunk_size = NULL, version = NULL,
  compression = NULL, compression_level = NULL,
  use_dictionary = NULL, write_statistics = NULL,
  data_page_size = NULL,
  properties = ParquetWriterProperties$create(x,
    version = version, compression = compression,
    compression_level = compression_level,
    use_dictionary = use_dictionary,
    write_statistics = write_statistics,
    data_page_size = data_page_size),
  use_deprecated_int96_timestamps = FALSE, coerce_timestamps = NULL,
  allow_truncated_timestamps = FALSE,
  arrow_properties = ParquetArrowWriterProperties$create(
    use_deprecated_int96_timestamps = use_deprecated_int96_timestamps,
    coerce_timestamps = coerce_timestamps,
    allow_truncated_timestamps = allow_truncated_timestamps))

Arguments

x

An arrow::Table, or an object convertible to it.

sink

An arrow::io::OutputStream, or a string that is interpreted as a file path.

chunk_size

chunk size in number of rows. If NULL, the total number of rows is used.

version

parquet version, "1.0" or "2.0".

compression

compression algorithm. No compression by default.

compression_level

compression level.

use_dictionary

Specify if we should use dictionary encoding.

write_statistics

Specify if we should write statistics

data_page_size

Set a target threshold for the approximate encoded size of data pages within a column chunk. If omitted, the default data page size (1 MiB) is used.

properties

properties for parquet writer, derived from arguments version, compression, compression_level, use_dictionary, write_statistics and data_page_size

use_deprecated_int96_timestamps

Write timestamps to INT96 Parquet format

coerce_timestamps

Cast timestamps to a particular resolution. Can be NULL, "ms" or "us".

allow_truncated_timestamps

Allow loss of data when coercing timestamps to a particular resolution. For example, if microsecond or nanosecond data is lost when coercing to "ms", do not raise an exception.

arrow_properties

arrow specific writer properties, derived from arguments use_deprecated_int96_timestamps, coerce_timestamps and allow_truncated_timestamps
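
The timestamp arguments can be combined as in the following minimal sketch. It assumes the arrow package is attached and that R POSIXct values are converted to Arrow timestamps with finer than millisecond resolution; the data frame and the column name ts are made up for illustration.

```r
library(arrow)

# Two timestamps; the second carries a sub-millisecond fraction
# (illustrative data, not taken from the arrow documentation)
df <- data.frame(
  ts = as.POSIXct("2020-01-01 12:00:00", tz = "UTC") + c(0, 0.0015)
)
tf <- tempfile(fileext = ".parquet")

# Store timestamps at millisecond resolution. Because the sub-millisecond
# part of the second value would be lost, allow_truncated_timestamps = TRUE
# asks the writer to truncate instead of raising an error.
write_parquet(df, tf,
  coerce_timestamps = "ms",
  allow_truncated_timestamps = TRUE
)
```

With allow_truncated_timestamps left at its default of FALSE, the same call would be expected to fail as described for that argument.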

Value

NULL, invisibly

Details

The parameters compression, compression_level, use_dictionary and write_statistics support various patterns:

- The default NULL leaves the parameter unspecified, and the C++ library uses an appropriate default for each column
- A single, unnamed, value (e.g. a single string for compression) applies to all columns
- An unnamed vector, of the same size as the number of columns, to specify a value for each column, in positional order
- A named vector, to specify the value for the named columns; the default value for the setting is used when not supplied
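
The patterns above might be used as in the following sketch. It assumes the arrow package is attached and that the underlying C++ library was built with the gzip codec; the data frame and its column names (id, label) are invented for illustration.

```r
library(arrow)

# Illustrative data: one integer column and one repetitive string column
df <- data.frame(
  id = 1:5,
  label = c("a", "b", "a", "b", "a"),
  stringsAsFactors = FALSE
)
tf <- tempfile(fileext = ".parquet")

# A single unnamed value applies to every column
write_parquet(df, tf, compression = "gzip")

# An unnamed vector with one entry per column, in positional order
write_parquet(df, tf, use_dictionary = c(FALSE, TRUE))

# A named vector sets the value only for the named column;
# the other column keeps the default for that setting
write_parquet(df, tf, compression = c(label = "gzip"))

# Read the last file back to check that the data round-trips
out <- read_parquet(tf)
```

Each call overwrites the same temporary file, so out reflects only the last write.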

Examples

# \donttest{
tf1 <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:5), tf1)

# using compression
tf2 <- tempfile(fileext = ".gz.parquet")
write_parquet(data.frame(x = 1:5), tf2, compression = "gzip", compression_level = 5)
# }