Wrapper around JsonTableReader to read a newline-delimited JSON (ndjson) file into a data frame or Arrow Table.
read_json_arrow(
  file,
  col_select = NULL,
  as_data_frame = TRUE,
  schema = NULL,
  ...
)

file: A character file name or URI, raw vector, an Arrow input stream,
or a FileSystem with path (SubTreeFileSystem).
If a file name is given, a memory-mapped Arrow InputStream is opened and
closed when finished; compression is detected from the file extension
and handled automatically. If an input stream is provided, it is left
open.
col_select: A character vector of column names to keep, as in the
"select" argument to data.table::fread(), or a tidy selection
specification of columns, as used in dplyr::select().
as_data_frame: Should the function return a data.frame (default) or
an Arrow Table?
schema: A Schema that describes the table. If NULL (the default), column
types are inferred from the data, as described below.
...: Additional options passed to JsonTableReader$create().
Returns a data.frame, or an Arrow Table if as_data_frame = FALSE.
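A minimal sketch of the col_select and as_data_frame arguments, assuming the arrow package is installed; the temporary file and its three records are illustrative only. It selects columns once with a character vector and once with the tidyselect helper starts_with(), then asks for an Arrow Table instead of a data.frame.

```r
library(arrow)

# Illustrative ndjson data written to a throwaway temporary file
tf <- tempfile(fileext = ".json")
writeLines(c(
  '{ "hello": 3.5, "world": false, "yo": "thing" }',
  '{ "hello": 3.25, "world": null }',
  '{ "hello": 0.0, "world": true, "yo": null }'
), tf)

# Keep two columns, chosen with a character vector of names
df <- read_json_arrow(tf, col_select = c("hello", "world"))

# The same kind of selection, written with a tidyselect helper
df_h <- read_json_arrow(tf, col_select = starts_with("h"))

# Skip the conversion to R and get an Arrow Table back
tbl <- read_json_arrow(tf, as_data_frame = FALSE)
print(tbl$schema)

unlink(tf)
```

Returning a Table is handy when the data will be passed on to other Arrow functions, since it avoids a round trip through R vectors.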
When given a path, read_json_arrow() detects compression from the file
extension (e.g. .json.gz) and decompresses the input automatically.
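The compression handling can be exercised with a sketch like the following. It assumes your arrow build includes the gzip codec (checked up front with codec_is_available()) and writes an illustrative two-record file through base R's gzfile() connection; the .json.gz extension is what signals the compression.

```r
library(arrow)

# Assumption: this arrow build was compiled with gzip support
stopifnot(codec_is_available("gzip"))

gz_path <- tempfile(fileext = ".json.gz")

# Write gzip-compressed ndjson through a base R gzfile() connection
con <- gzfile(gz_path, open = "w")
writeLines(c('{ "a": 1, "b": "x" }', '{ "a": 2, "b": "y" }'), con)
close(con)

# The ".gz" extension tells read_json_arrow() to decompress while reading
gz_df <- read_json_arrow(gz_path)
print(gz_df)

unlink(gz_path)
```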
If schema is not provided, Arrow data types are inferred from the data:
- JSON null values convert to the null() type, but can fall back to any other type
  (for example, when the same field holds non-null values in other records).
- JSON booleans convert to boolean().
- JSON numbers convert to int64(), falling back to float64() if a non-integer is encountered.
- JSON strings of the form "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert to
  timestamp(unit = "s"), falling back to utf8() if a conversion error occurs.
- JSON arrays convert to a list_of() type, and inference proceeds recursively on the
  arrays' values.
- Nested JSON objects convert to a struct() type, and inference proceeds recursively
  on the objects' values.
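To see these inference rules in action, or to override them, the sketch below reads the same illustrative file twice: once letting Arrow infer the types, and once with an explicit schema built from schema(), int32(), float64(), utf8(), list_of() and struct(). The field names and the choice of int32() for id are assumptions made for the example, and arrow is assumed to be installed.

```r
library(arrow)

# Illustrative records covering numbers, a date-like string, an array and an object
tf <- tempfile(fileext = ".json")
writeLines(c(
  '{ "id": 1, "score": 2.5, "day": "2023-01-15", "tags": ["a", "b"], "pos": { "x": 1, "y": 2 } }',
  '{ "id": 2, "score": 3, "day": "2023-01-16", "tags": [], "pos": { "x": 3, "y": 4 } }'
), tf)

# Let Arrow infer the types; keep a Table so the Arrow types stay visible
inferred <- read_json_arrow(tf, as_data_frame = FALSE)
print(inferred$schema)

# Supply a schema instead, e.g. to read "id" as int32 and keep "day" as text
explicit <- read_json_arrow(
  tf,
  as_data_frame = FALSE,
  schema = schema(
    id = int32(),
    score = float64(),
    day = utf8(),
    tags = list_of(utf8()),
    pos = struct(x = int64(), y = int64())
  )
)
print(explicit$schema)

unlink(tf)
```

Passing a schema is useful when inference would pick a type you do not want, such as a timestamp for a column that should stay text.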
When as_data_frame = TRUE, Arrow types are further converted to R types.
See vignette("arrow", package = "arrow") for details.
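A sketch of that conversion step, assuming the arrow package is installed and using illustrative data: with the default as_data_frame = TRUE, a boolean column comes back as logical, a list column as a list, and a struct column as a nested data frame.

```r
library(arrow)

# Illustrative records with a boolean, an array and a nested object
tf <- tempfile(fileext = ".json")
writeLines(c(
  '{ "n": 1, "flag": true, "tags": ["a", "b"], "pos": { "x": 1, "y": 2 } }',
  '{ "n": 2, "flag": false, "tags": ["c"], "pos": { "x": 3, "y": 4 } }'
), tf)

# Default as_data_frame = TRUE: Arrow columns are converted to R vectors
df <- read_json_arrow(tf)
str(df)

# Report the leading class of each converted column
print(sapply(df, function(col) class(col)[1]))

unlink(tf)
```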
tf <- tempfile()
on.exit(unlink(tf))
writeLines('
{ "hello": 3.5, "world": false, "yo": "thing" }
{ "hello": 3.25, "world": null }
{ "hello": 0.0, "world": true, "yo": null }
', tf, useBytes = TRUE)
read_json_arrow(tf)
#> # A tibble: 3 x 3
#> hello world yo
#> <dbl> <lgl> <chr>
#> 1 3.5 FALSE thing
#> 2 3.25 NA NA
#> 3 0 TRUE NA