Wrapper around JsonTableReader to read a newline-delimited JSON (ndjson) file into a data frame or Arrow Table.
Arguments
- file
A character file name or URI, raw vector, an Arrow input stream, or a FileSystem with path (SubTreeFileSystem). If a file name, a memory-mapped Arrow InputStream will be opened and closed when finished; compression will be detected from the file extension and handled automatically. If an input stream is provided, it will be left open.
- col_select
A character vector of column names to keep, as in the "select" argument to data.table::fread(), or a tidy selection specification of columns, as used in dplyr::select().
- as_data_frame
Should the function return a data.frame (default) or an Arrow Table?
- schema
Schema that describes the table.
- ...
Additional options passed to JsonTableReader$create().
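As a rough illustration of how col_select and as_data_frame combine, the sketch below writes a small ndjson file and reads back only two of its columns as an Arrow Table. The column names (id, name, score) are made up for the example and do not come from the package.

```r
library(arrow)

# Write a small ndjson file to a temporary location (hypothetical columns)
tf <- tempfile()
writeLines(c(
  '{"id": 1, "name": "a", "score": 0.5}',
  '{"id": 2, "name": "b", "score": 1.5}'
), tf)

# Keep two columns and return an Arrow Table instead of a data frame
tab <- read_json_arrow(tf, col_select = c("id", "score"), as_data_frame = FALSE)
names(tab)

unlink(tf)
```

A tidy selection such as col_select = starts_with("s") should work in the same position, since the argument is documented as accepting the selections used in dplyr::select().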
Details
If passed a path, the function detects and handles compression from the file extension (e.g. .json.gz).
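A minimal sketch of the compression handling, assuming the installed Arrow build includes the gzip codec (checked here with codec_is_available()). The file contents and column names are invented for the example.

```r
library(arrow)

# Write gzip-compressed ndjson; the .json.gz extension is what signals compression
tf_gz <- tempfile(fileext = ".json.gz")
con <- gzfile(tf_gz, open = "w")
writeLines(c('{"x": 1, "y": "a"}', '{"x": 2, "y": "b"}'), con)
close(con)

# Only attempt the read if this Arrow build can decompress gzip
if (codec_is_available("gzip")) {
  df <- read_json_arrow(tf_gz)
  print(df)
}

unlink(tf_gz)
```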
If schema is not provided, Arrow data types are inferred from the data:
- JSON null values convert to the null() type, but can fall back to any other type.
- JSON booleans convert to boolean().
- JSON numbers convert to int64(), falling back to float64() if a non-integer is encountered.
- JSON strings of the kind "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert to timestamp(unit = "s"), falling back to utf8() if a conversion error occurs.
- JSON arrays convert to a list_of() type, and inference proceeds recursively on the JSON arrays' values.
- Nested JSON objects convert to a struct() type, and inference proceeds recursively on the JSON objects' values.
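One way to see the inference rules above in action is to read a file as an Arrow Table and print its schema. This is only a sketch: the field names (n, x, flag, when, tags, pt) are hypothetical, and the printed schema may vary slightly between Arrow versions.

```r
library(arrow)

# Each field exercises one of the inference rules listed above
tf <- tempfile()
writeLines(c(
  '{"n": 1, "x": 1.5, "flag": true, "when": "2020-01-01", "tags": ["a", "b"], "pt": {"lat": 1.0}}',
  '{"n": 2, "x": 2, "flag": false, "when": "2020-01-02", "tags": ["c"], "pt": {"lat": 2.5}}'
), tf)

# Keep the result as an Arrow Table so the Arrow types stay visible
tab <- read_json_arrow(tf, as_data_frame = FALSE)
tab$schema

unlink(tf)
```

Here n contains only integers while x contains a non-integer, so the two columns should land on different numeric types.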
When as_data_frame = TRUE, Arrow types are further converted to R types.
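When the column types are known in advance, inference can be bypassed by supplying schema. The sketch below assumes a reasonably recent arrow release in which the schema argument is honored, and uses made-up field names (id, label). It requests int32() for a column that inference would otherwise read as int64().

```r
library(arrow)

tf <- tempfile()
writeLines(c(
  '{"id": 1, "label": "a"}',
  '{"id": 2, "label": "b"}'
), tf)

# Explicit schema: ask for a 32-bit integer instead of the inferred int64()
sch <- schema(id = int32(), label = utf8())
tab <- read_json_arrow(tf, schema = sch, as_data_frame = FALSE)
tab$schema

unlink(tf)
```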
Examples
tf <- tempfile()
on.exit(unlink(tf))
writeLines('
{ "hello": 3.5, "world": false, "yo": "thing" }
{ "hello": 3.25, "world": null }
{ "hello": 0.0, "world": true, "yo": null }
', tf, useBytes = TRUE)
read_json_arrow(tf)
#> # A tibble: 3 x 3
#>   hello world yo
#>   <dbl> <lgl> <chr>
#> 1  3.5  FALSE thing
#> 2  3.25 NA    NA
#> 3  0    TRUE  NA