Wrapper around JsonTableReader to read a newline-delimited JSON (ndjson) file into a data frame or Arrow Table.
Arguments
- file
A character file name or URI, raw vector, an Arrow input stream, or a FileSystem with path (SubTreeFileSystem). If a file name, a memory-mapped Arrow InputStream will be opened and closed when finished; compression will be detected from the file extension and handled automatically. If an input stream is provided, it will be left open.
- col_select
A character vector of column names to keep, as in the "select" argument to data.table::fread(), or a tidy selection specification of columns, as used in dplyr::select().
- as_data_frame
Should the function return a data.frame (default) or an Arrow Table?
- schema
Schema that describes the table.
- ...
Additional options passed to
JsonTableReader$create()
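As a rough illustration of how these arguments combine, the sketch below reads only two columns and keeps the result as an Arrow Table. The file contents and the column names (id, name, score) are made up for the example.

```r
library(arrow)

# Hypothetical sample data written to a temporary ndjson file
tf <- tempfile(fileext = ".json")
writeLines(c(
  '{"id": 1, "name": "a", "score": 0.5}',
  '{"id": 2, "name": "b", "score": 1.5}'
), tf)

# Keep two columns and return an Arrow Table rather than a data.frame
tab <- read_json_arrow(tf, col_select = c("id", "score"), as_data_frame = FALSE)

names(tab)        # column names kept by col_select
tab$num_columns   # number of columns in the Table

unlink(tf)
```

Passing as_data_frame = FALSE can be useful when the data should stay in Arrow memory for further processing before conversion to R.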
Details
If file is a path, compression is detected from the file extension (e.g. .json.gz) and handled automatically.
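A minimal sketch of reading a gzip-compressed file, assuming the installed Arrow build includes the gzip codec (checked here with codec_is_available()). The file contents are invented for the example.

```r
library(arrow)

# Write a small gzip-compressed ndjson file; the ".json.gz" extension
# is what the compression detection relies on
tf <- tempfile(fileext = ".json.gz")
con <- gzfile(tf, "w")
writeLines(c('{"x": 1}', '{"x": 2}'), con)
close(con)

# Only attempt the read if this Arrow build supports gzip
if (codec_is_available("gzip")) {
  df <- read_json_arrow(tf)
  nrow(df)
}

unlink(tf)
```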
If schema is not provided, Arrow data types are inferred from the data:
- JSON null values convert to the null() type, but can fall back to any other type.
- JSON booleans convert to boolean().
- JSON numbers convert to int64(), falling back to float64() if a non-integer is encountered.
- JSON strings of the kind "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert to timestamp(unit = "s"), falling back to utf8() if a conversion error occurs.
- JSON arrays convert to a list_of() type, and inference proceeds recursively on the JSON arrays' values.
- Nested JSON objects convert to a struct() type, and inference proceeds recursively on the JSON objects' values.
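One way to see these rules in action is to read a file as an Arrow Table and print its schema, then compare it with a read that supplies some types explicitly. This is a sketch with invented field names (n, when, tags, pt). It assumes that fields not named in the supplied schema are still inferred, which is the default behaviour of the underlying Arrow JSON reader as far as I know.

```r
library(arrow)

# Hypothetical data covering numbers, date-like strings, arrays and objects
tf <- tempfile(fileext = ".json")
writeLines(c(
  '{"n": 1, "when": "2024-01-02", "tags": ["a", "b"], "pt": {"x": 1.5}}',
  '{"n": 2, "when": "2024-01-03", "tags": ["c"], "pt": {"x": 2.5}}'
), tf)

# Let Arrow infer every type, then inspect the result
inferred <- read_json_arrow(tf, as_data_frame = FALSE)
inferred$schema

# Override inference for two fields: read n as float64 and when as a string
explicit <- read_json_arrow(
  tf,
  as_data_frame = FALSE,
  schema = schema(n = float64(), when = utf8())
)
explicit$schema

unlink(tf)
```

Supplying a schema is worth considering when inference would pick a type that is inconvenient downstream, for example when date-like strings should stay as text.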
When as_data_frame = TRUE, Arrow types are further converted to R types.
Examples
tf <- tempfile()
on.exit(unlink(tf))
writeLines('
{ "hello": 3.5, "world": false, "yo": "thing" }
{ "hello": 3.25, "world": null }
{ "hello": 0.0, "world": true, "yo": null }
', tf, useBytes = TRUE)
read_json_arrow(tf)
#> # A tibble: 3 x 3
#> hello world yo
#> <dbl> <lgl> <chr>
#> 1 3.5 FALSE thing
#> 2 3.25 NA NA
#> 3 0 TRUE NA