CsvReadOptions, CsvParseOptions, CsvConvertOptions, JsonReadOptions, JsonParseOptions, and TimestampParser are containers for various file reading options. The Csv* classes and TimestampParser are used with read_csv_arrow(); the Json* classes are used with read_json_arrow().
The CsvReadOptions$create() and JsonReadOptions$create() factory methods take the following arguments:
use_threads
Whether to use the global CPU thread pool
block_size
Block size we request from the IO layer; also determines the size of chunks when use_threads is TRUE. NB: if use_threads is FALSE, JSON input must end with an empty line.
CsvReadOptions$create() also accepts these arguments:
skip_rows
Number of lines to skip before reading data (default 0)
column_names
Character vector to supply column names. If length-0 (the default), the first non-skipped row will be parsed to generate column names, unless autogenerate_column_names is TRUE.
autogenerate_column_names
Logical: generate column names instead of using the first non-skipped row (the default)? If TRUE, column names will be "f0", "f1", ..., "fN".
encoding
The file encoding (default "UTF-8")
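As a rough illustration of the read options above, the following sketch skips a leading junk line and supplies column names for a headerless file. The file contents and the column names "id" and "label" are made up for the example, and it assumes the arrow package is installed.

```r
library(arrow)

# Hypothetical sample file: one leading junk line, then headerless data rows.
tf <- tempfile(fileext = ".csv")
writeLines(c("exported by some tool", "1,a", "2,b"), tf)

read_opts <- CsvReadOptions$create(
  skip_rows = 1L,                   # drop the junk line before reading data
  column_names = c("id", "label")   # names supplied, so no row is parsed as a header
)

df <- read_csv_arrow(tf, read_options = read_opts)
names(df)
nrow(df)
```

Because column_names is non-empty, every row after the skipped line is treated as data.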
CsvParseOptions$create() takes the following arguments:
delimiter
Field delimiting character (default ",")
quoting
Logical: are strings quoted? (default TRUE)
quote_char
Quoting character, if quoting is TRUE
double_quote
Logical: are quotes inside values double-quoted? (default TRUE)
escaping
Logical: whether escaping is used (default FALSE)
escape_char
Escaping character, if escaping is TRUE
newlines_in_values
Logical: are values allowed to contain CR (0x0d) and LF (0x0a) characters? (default FALSE)
ignore_empty_lines
Logical: should empty lines be ignored (default) or generate a row of missing values (if FALSE)?
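The parse options above can be sketched on a small semicolon-delimited file that uses single quotes for quoting. The data is invented for the example; only delimiter, quote_char, and ignore_empty_lines from the list above are used.

```r
library(arrow)

# Hypothetical file: ';' as delimiter, single quotes around a value that
# itself contains the delimiter, and one empty line in the middle.
tf <- tempfile(fileext = ".csv")
writeLines(c("id;note", "1;'semi;colon inside'", "", "2;plain"), tf)

parse_opts <- CsvParseOptions$create(
  delimiter = ";",
  quote_char = "'",
  ignore_empty_lines = TRUE   # the default, spelled out for clarity
)

df <- read_csv_arrow(tf, parse_options = parse_opts)
df$note
```

The quoted value keeps its embedded semicolon, and the empty line does not produce a row.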
JsonParseOptions$create() accepts only the newlines_in_values argument.
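A minimal sketch of the JSON option classes, assuming that read_json_arrow() forwards read_options and parse_options through its ... argument to the underlying table reader and that arrow was built with JSON support. The newline-delimited records are made up for the example.

```r
library(arrow)

# Hypothetical newline-delimited JSON file; writeLines() ends it with a newline.
tf <- tempfile(fileext = ".json")
writeLines(c('{"id": 1, "label": "a"}', '{"id": 2, "label": "b"}'), tf)

df <- read_json_arrow(
  tf,
  read_options = JsonReadOptions$create(block_size = 1048576L),
  parse_options = JsonParseOptions$create(newlines_in_values = FALSE)
)
nrow(df)
```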
CsvConvertOptions$create() takes the following arguments:
check_utf8
Logical: check UTF8 validity of string columns? (default TRUE)
null_values
Character vector of recognized spellings for null values. Analogous to the na.strings argument to read.csv() or na in readr::read_csv().
strings_can_be_null
Logical: can string / binary columns have null values? Similar to the quoted_na argument to readr::read_csv(). (default FALSE)
true_values
Character vector of recognized spellings for TRUE values
false_values
Character vector of recognized spellings for FALSE values
col_types
A Schema or NULL to infer types
auto_dict_encode
Logical: whether to try to automatically dictionary-encode string / binary data (think stringsAsFactors). Default FALSE. This setting is ignored for non-inferred columns (those in col_types).
auto_dict_max_cardinality
If auto_dict_encode is TRUE, string/binary columns are dictionary-encoded up to this number of unique values (default 50), after which it switches to regular encoding.
include_columns
If non-empty, indicates the names of columns from the CSV file that should actually be read and converted (in the vector's order).
include_missing_columns
Logical: if include_columns is provided, should columns named in it but not found in the data be included as a column of type null()? The default (FALSE) means that the reader will instead raise an error.
timestamp_parsers
User-defined timestamp parsers. If more than one parser is specified, the CSV conversion logic will try parsing values starting from the beginning of this vector. Possible values are
(a) NULL, the default, which uses the ISO-8601 parser;
(b) a character vector of strptime parse strings; or
(c) a list of TimestampParser objects.
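Several of the convert options above can be combined, as in this sketch. The file, the "missing" null spelling, and the yes/no boolean spellings are invented for the example; col_types here covers only one column, so the others are still inferred.

```r
library(arrow)

# Hypothetical file with a custom null spelling and yes/no booleans.
tf <- tempfile(fileext = ".csv")
writeLines(
  c("id,score,flag,comment", "1,3.5,yes,ok", "2,missing,no,n/a"),
  tf
)

convert_opts <- CsvConvertOptions$create(
  null_values = c("", "NA", "missing"),       # treat "missing" as null
  true_values = "yes",
  false_values = "no",
  col_types = schema(flag = boolean()),       # fix one type, infer the rest
  include_columns = c("id", "score", "flag")  # drop "comment"
)

df <- read_csv_arrow(tf, convert_options = convert_opts)
str(df)
```

Only the three named columns are read, and the "missing" entry in score comes back as NA.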
TimestampParser$create() takes an optional format string argument. See strptime() for example syntax. The default is to use an ISO-8601 format parser.
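As a sketch of how a TimestampParser might be passed to the CSV reader, the example below parses day-first timestamps that the default ISO-8601 parser would not recognize. The column name "when" and the sample values are made up, and the column type is fixed through col_types to keep the example explicit.

```r
library(arrow)

# Hypothetical file with day/month/year timestamps.
tf <- tempfile(fileext = ".csv")
writeLines(c("when", "31/12/2020 23:59:00", "01/02/2021 08:30:00"), tf)

convert_opts <- CsvConvertOptions$create(
  col_types = schema(when = timestamp(unit = "s")),
  timestamp_parsers = list(
    TimestampParser$create(format = "%d/%m/%Y %H:%M:%S")
  )
)

df <- read_csv_arrow(tf, convert_options = convert_opts)
class(df$when)
```

A character vector such as "%d/%m/%Y %H:%M:%S" could be given for timestamp_parsers instead of the list, per option (b) above.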
The CsvWriteOptions$create() factory method takes the following arguments:
include_header
Whether to write an initial header line with column names
batch_size
Maximum number of rows processed at a time. Default is 1024.
CsvReadOptions objects also expose the following field:
column_names
The column names held by the CsvReadOptions object
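To illustrate the CsvWriteOptions arguments listed above, this sketch writes a small data frame without a header line. It assumes an arrow version in which write_csv_arrow() accepts a write_options argument; the data frame is made up for the example.

```r
library(arrow)

tf <- tempfile(fileext = ".csv")

write_opts <- CsvWriteOptions$create(
  include_header = FALSE,  # no initial line of column names
  batch_size = 512L        # process at most 512 rows at a time
)

write_csv_arrow(
  data.frame(id = 1:3, label = c("a", "b", "c")),
  tf,
  write_options = write_opts
)

out <- readLines(tf)
length(out)
```

With include_header = FALSE the file holds only the three data rows.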