CsvReadOptions, CsvParseOptions, CsvConvertOptions,
JsonReadOptions, JsonParseOptions, and TimestampParser are containers for various
file reading options. See their usage in read_csv_arrow() and
read_json_arrow(), respectively.
Factory
The CsvReadOptions$create() and JsonReadOptions$create() factory methods
take the following arguments:
use_threads: Whether to use the global CPU thread pool.
block_size: Block size we request from the IO layer; also determines the size of chunks when use_threads is TRUE. NB: if FALSE, JSON input must end with an empty line.
CsvReadOptions$create() further accepts these additional arguments:
skip_rows: Number of lines to skip before reading data (default 0).
column_names: Character vector to supply column names. If length-0 (the default), the first non-skipped row will be parsed to generate column names, unless autogenerate_column_names is TRUE.
autogenerate_column_names: Logical: generate column names instead of using the first non-skipped row (the default)? If TRUE, column names will be "f0", "f1", ..., "fN".
encoding: The file encoding (default "UTF-8").
skip_rows_after_names: Number of lines to skip after the column names (default 0). This number can be larger than the number of rows in one block, and empty rows are counted. The order of application is as follows:
  1. skip_rows is applied (if non-zero);
  2. column names are read (unless column_names is set);
  3. skip_rows_after_names is applied (if non-zero).
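As a minimal sketch of how read options might be used, the example below skips one leading junk line so that the following row supplies the column names. The file contents and variable names are made up for illustration, and it assumes read_csv_arrow() accepts a CsvReadOptions object through its read_options argument.

```r
library(arrow)

# Illustrative file: one non-data line, then a header row, then two data rows.
path <- tempfile(fileext = ".csv")
writeLines(c("exported by some tool", "id,name", "1,a", "2,b"), path)

# skip_rows is applied first; the next row is then read as column names.
read_opts <- CsvReadOptions$create(skip_rows = 1)
df <- read_csv_arrow(path, read_options = read_opts)

names(df)
nrow(df)
```

Passing column_names instead would treat every non-skipped row as data, since the header row is only consumed when no names are supplied.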
CsvParseOptions$create() takes the following arguments:
delimiter: Field delimiting character (default ",").
quoting: Logical: are strings quoted? (default TRUE)
quote_char: Quoting character, if quoting is TRUE.
double_quote: Logical: are quotes inside values double-quoted? (default TRUE)
escaping: Logical: whether escaping is used (default FALSE).
escape_char: Escaping character, if escaping is TRUE.
newlines_in_values: Logical: are values allowed to contain CR (0x0d) and LF (0x0a) characters? (default FALSE)
ignore_empty_lines: Logical: should empty lines be ignored (default) or generate a row of missing values (if FALSE)?
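The following sketch shows one way parse options could be combined: a semicolon-delimited file whose quoted values use single quotes. The sample data is invented for illustration, and it assumes read_csv_arrow() accepts a CsvParseOptions object through its parse_options argument.

```r
library(arrow)

# Illustrative file: ';' separates fields, and a quoted value contains a ';'.
path <- tempfile(fileext = ".csv")
writeLines(c("id;label", "1;'a;b'", "2;c"), path)

# Split on ';' and treat the single quote as the quoting character.
parse_opts <- CsvParseOptions$create(delimiter = ";", quote_char = "'")
df <- read_csv_arrow(path, parse_options = parse_opts)

df$label
```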
JsonParseOptions$create() accepts only the newlines_in_values argument.
CsvConvertOptions$create() takes the following arguments:
check_utf8: Logical: check UTF-8 validity of string columns? (default TRUE)
null_values: Character vector of recognized spellings for null values. Analogous to the na.strings argument to read.csv() or na in readr::read_csv().
strings_can_be_null: Logical: can string / binary columns have null values? Similar to the quoted_na argument to readr::read_csv(). (default FALSE)
true_values: Character vector of recognized spellings for TRUE values.
false_values: Character vector of recognized spellings for FALSE values.
col_types: A Schema or NULL to infer types.
auto_dict_encode: Logical: whether to try to automatically dictionary-encode string / binary data (think stringsAsFactors). Default FALSE. This setting is ignored for non-inferred columns (those in col_types).
auto_dict_max_cardinality: If auto_dict_encode, string / binary columns are dictionary-encoded up to this number of unique values (default 50), after which it switches to regular encoding.
include_columns: If non-empty, indicates the names of columns from the CSV file that should actually be read and converted (in the vector's order).
include_missing_columns: Logical: if include_columns is provided, should columns named in it but not found in the data be included as a column of type null()? The default (FALSE) means that the reader will instead raise an error.
timestamp_parsers: User-defined timestamp parsers. If more than one parser is specified, the CSV conversion logic will try parsing values starting from the beginning of this vector. Possible values are (a) NULL, the default, which uses the ISO-8601 parser; (b) a character vector of strptime parse strings; or (c) a list of TimestampParser objects.
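A minimal sketch of convert options, under the assumption that read_csv_arrow() accepts a CsvConvertOptions object through its convert_options argument: the word "missing" is treated as null, and only two of the three columns are read. The file contents are invented for illustration.

```r
library(arrow)

# Illustrative file: "missing" marks absent values in two columns.
path <- tempfile(fileext = ".csv")
writeLines(c("id,score,note", "1,3.5,ok", "2,missing,missing"), path)

convert_opts <- CsvConvertOptions$create(
  null_values = "missing",           # spelling to treat as null
  strings_can_be_null = TRUE,        # also allow nulls in string columns
  include_columns = c("id", "score") # read only these columns, in this order
)

# as_data_frame = FALSE keeps the result as an Arrow Table.
tab <- read_csv_arrow(path, convert_options = convert_opts, as_data_frame = FALSE)

names(tab)
as.data.frame(tab)$score
```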
TimestampParser$create() takes an optional format string argument.
See strptime() for example syntax.
The default is to use an ISO-8601 format parser.
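As a hedged illustration, a non-ISO date column could be parsed by supplying a TimestampParser built from a strptime-style format. The column names and data below are made up, the col_types schema is included only to make the target type explicit, and strptime-based parsing may not be available on every platform or arrow version.

```r
library(arrow)

# Illustrative file: dates written as day/month/year.
path <- tempfile(fileext = ".csv")
writeLines(c("when,value", "23/09/2020,1", "24/09/2020,2"), path)

convert_opts <- CsvConvertOptions$create(
  # Declare the column as a timestamp; other columns are still inferred.
  col_types = schema(when = timestamp(unit = "s")),
  # A list of TimestampParser objects; a character vector of format
  # strings is the other documented form.
  timestamp_parsers = list(TimestampParser$create("%d/%m/%Y"))
)

tab <- read_csv_arrow(path, convert_options = convert_opts, as_data_frame = FALSE)
tab$schema
```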
The CsvWriteOptions$create() factory method takes the following arguments:
include_header: Whether to write an initial header line with column names.
batch_size: Maximum number of rows processed at a time. Default is 1024.
null_string: The string to be written for null values. Must not contain quotation marks. Default is an empty string ("").
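A short sketch of write options follows. It assumes an arrow version in which write_csv_arrow() accepts a write_options argument; the data frame is invented for illustration.

```r
library(arrow)

# Illustrative data with one missing string value.
df <- data.frame(id = 1:2, name = c("a", NA), stringsAsFactors = FALSE)
out <- tempfile(fileext = ".csv")

# Omit the header line and write nulls as the literal text NA.
write_opts <- CsvWriteOptions$create(include_header = FALSE, null_string = "NA")
write_csv_arrow(df, out, write_options = write_opts)

lines <- readLines(out)
lines
```

With include_header = FALSE the file contains one line per row and no column names, so a reader would need column_names or autogenerate_column_names to load it back.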