CsvReadOptions, CsvParseOptions, CsvConvertOptions, JsonReadOptions, JsonParseOptions, and TimestampParser are containers for various file reading options. See their usage in read_csv_arrow() and read_json_arrow(), respectively.
Factory
The CsvReadOptions$create() and JsonReadOptions$create() factory methods take the following arguments:
use_threads
Whether to use the global CPU thread pool
block_size
Block size we request from the IO layer; also determines the size of chunks when use_threads is TRUE. NB: if use_threads is FALSE, JSON input must end with an empty line.
CsvReadOptions$create() accepts these additional arguments:
skip_rows
Number of lines to skip before reading data (default 0)
column_names
Character vector to supply column names. If length-0 (the default), the first non-skipped row will be parsed to generate column names, unless autogenerate_column_names is TRUE.
autogenerate_column_names
Logical: generate column names instead of using the first non-skipped row (the default)? If TRUE, column names will be "f0", "f1", ..., "fN".
encoding
The file encoding (default "UTF-8")
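As a minimal sketch of the read options above, the following skips a leading comment line and supplies column names for a header-less file. The file contents and the column names `id` and `label` are made up for illustration, and passing the options object through the `read_options` argument of read_csv_arrow() is assumed to be available in the installed arrow version.

```r
library(arrow)

# Illustrative input: one junk line, then data rows without a header.
path <- tempfile(fileext = ".csv")
writeLines(c("# exported by some tool", "1,a", "2,b"), path)

# Skip the first line and name the columns ourselves, so every
# remaining row is treated as data.
read_opts <- CsvReadOptions$create(
  skip_rows = 1L,
  column_names = c("id", "label")
)

df <- read_csv_arrow(path, read_options = read_opts)
print(df)
```

Setting autogenerate_column_names = TRUE instead of column_names should yield the names "f0" and "f1" for the same file.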
CsvParseOptions$create() takes the following arguments:
delimiter
Field delimiting character (default ",")
quoting
Logical: are strings quoted? (default TRUE)
quote_char
Quoting character, if quoting is TRUE
double_quote
Logical: are quotes inside values double-quoted? (default TRUE)
escaping
Logical: whether escaping is used (default FALSE)
escape_char
Escaping character, if escaping is TRUE
newlines_in_values
Logical: are values allowed to contain CR (0x0d) and LF (0x0a) characters? (default FALSE)
ignore_empty_lines
Logical: should empty lines be ignored (default) or generate a row of missing values (if FALSE)?
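The parse options can be sketched with a semicolon-delimited file in which one quoted value itself contains the delimiter. The file contents are invented for illustration, and the `parse_options` argument of read_csv_arrow() is assumed to accept the options object directly.

```r
library(arrow)

# Illustrative input: semicolon-separated, with a quoted field
# that contains the delimiter character.
path <- tempfile(fileext = ".csv")
writeLines(c("x;y", "1;\"a;b\"", "2;c"), path)

# Only the delimiter changes; quoting stays at its default (TRUE),
# so the quoted value is kept together as one field.
parse_opts <- CsvParseOptions$create(delimiter = ";")

df <- read_csv_arrow(path, parse_options = parse_opts)
print(df)
```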
JsonParseOptions$create() accepts only the newlines_in_values argument.
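For JSON input, a hedged sketch might look like the following. It assumes that read_json_arrow() forwards `read_options` and `parse_options` to the underlying reader, and it spells out the values that are believed to match the defaults rather than changing behavior. The newline-delimited input is invented for illustration.

```r
library(arrow)

# Illustrative newline-delimited JSON: one object per line.
path <- tempfile(fileext = ".json")
writeLines(c('{"id": 1, "name": "a"}', '{"id": 2, "name": "b"}'), path)

# Explicit option objects; these values are assumed to mirror the defaults.
read_opts <- JsonReadOptions$create(use_threads = TRUE, block_size = 1048576L)
parse_opts <- JsonParseOptions$create(newlines_in_values = FALSE)

df <- read_json_arrow(path, read_options = read_opts, parse_options = parse_opts)
print(df)
```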
CsvConvertOptions$create() takes the following arguments:
check_utf8
Logical: check UTF8 validity of string columns? (default TRUE)
null_values
Character vector of recognized spellings for null values. Analogous to the na.strings argument to read.csv() or na in readr::read_csv().
strings_can_be_null
Logical: can string / binary columns have null values? Similar to the quoted_na argument to readr::read_csv(). (default FALSE)
true_values
Character vector of recognized spellings for TRUE values
false_values
Character vector of recognized spellings for FALSE values
col_types
A Schema or NULL to infer types
auto_dict_encode
Logical: whether to try to automatically dictionary-encode string / binary data (think stringsAsFactors). Default FALSE. This setting is ignored for non-inferred columns (those in col_types).
auto_dict_max_cardinality
If auto_dict_encode is TRUE, string/binary columns are dictionary-encoded up to this number of unique values (default 50), after which it switches to regular encoding.
include_columns
If non-empty, indicates the names of columns from the CSV file that should actually be read and converted (in the vector's order).
include_missing_columns
Logical: if include_columns is provided, should columns named in it but not found in the data be included as a column of type null()? The default (FALSE) means that the reader will instead raise an error.
timestamp_parsers
User-defined timestamp parsers. If more than one parser is specified, the CSV conversion logic will try parsing values starting from the beginning of this vector. Possible values are (a) NULL, the default, which uses the ISO-8601 parser; (b) a character vector of strptime parse strings; or (c) a list of TimestampParser objects.
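A short sketch of the convert options follows. It adds "n/a" as a null spelling, declares "yes"/"no" as logical spellings, and reads only three of the four columns. The file contents and column names are invented for illustration, and the `convert_options` argument of read_csv_arrow() is assumed to accept the options object directly.

```r
library(arrow)

# Illustrative input with a custom missing-value marker and yes/no flags.
path <- tempfile(fileext = ".csv")
writeLines(
  c("id,score,flag,notes", "1,3.5,yes,foo", "2,n/a,no,bar"),
  path
)

convert_opts <- CsvConvertOptions$create(
  null_values = c("", "NA", "n/a"),          # treat "n/a" as missing
  true_values = "yes",                       # spellings to read as TRUE
  false_values = "no",                       # spellings to read as FALSE
  include_columns = c("id", "score", "flag") # drop the "notes" column
)

df <- read_csv_arrow(path, convert_options = convert_opts)
print(df)
```

Because include_missing_columns is left at FALSE, naming a column in include_columns that does not exist in the file should raise an error rather than produce a null column.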
TimestampParser$create() takes an optional format string argument. See strptime() for example syntax. The default is to use an ISO-8601 format parser.
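As a sketch of a custom timestamp parser, the following reads day-first timestamps that the ISO-8601 parser would not recognize. The column name `when` and the format string are illustrative choices, and strptime-based parsing may depend on platform support.

```r
library(arrow)

# Illustrative input: day/month/year timestamps, not ISO-8601.
path <- tempfile(fileext = ".csv")
writeLines(c("when", "01/02/2021 03:04:05", "15/06/2021 12:00:00"), path)

# One strptime-style parser, passed as a list of TimestampParser objects.
parser <- TimestampParser$create(format = "%d/%m/%Y %H:%M:%S")
convert_opts <- CsvConvertOptions$create(timestamp_parsers = list(parser))

df <- read_csv_arrow(path, convert_options = convert_opts)
print(class(df$when))
```

Supplying the format as a plain character vector, `timestamp_parsers = "%d/%m/%Y %H:%M:%S"`, is the other documented form and should behave the same way.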
The CsvWriteOptions$create() factory method takes the following arguments:
include_header
Whether to write an initial header line with column names
batch_size
Maximum number of rows processed at a time. Default is 1024.
null_string
The string to be written for null values. Must not contain quotation marks. Default is an empty string ("").
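The write options can be sketched as follows, assuming that write_csv_arrow() accepts the options object through a `write_options` argument in the installed arrow version. The data frame and the "NULL" marker are illustrative.

```r
library(arrow)

# Illustrative data with one missing value.
df <- data.frame(
  id = 1:3,
  label = c("a", NA, "c"),
  stringsAsFactors = FALSE
)

# Write a header row and spell missing values as NULL.
write_opts <- CsvWriteOptions$create(
  include_header = TRUE,
  batch_size = 1024L,
  null_string = "NULL"
)

path <- tempfile(fileext = ".csv")
write_csv_arrow(df, path, write_options = write_opts)

lines <- readLines(path)
print(lines)
```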