Wrapper around JsonTableReader to read a newline-delimited JSON (ndjson) file into a data frame or Arrow Table.
read_json_arrow(
  file,
  col_select = NULL,
  as_data_frame = TRUE,
  schema = NULL,
  ...
)
file: A character file name or URI, raw vector, an Arrow input stream, or a FileSystem with path (SubTreeFileSystem). If a file name, a memory-mapped Arrow InputStream will be opened and closed when finished; compression will be detected from the file extension and handled automatically. If an input stream is provided, it will be left open.
col_select: A character vector of column names to keep, as in the "select" argument to data.table::fread(), or a tidy selection specification of columns, as used in dplyr::select().
as_data_frame: Should the function return a data.frame (default) or an Arrow Table?
schema: Schema that describes the table.
...: Additional options passed to JsonTableReader$create()
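As a rough illustration of how col_select and as_data_frame interact, the sketch below keeps two columns and asks for an Arrow Table rather than a data frame. The file contents and the column names (id, name, score) are made up for this example.

```r
library(arrow)

# Hypothetical ndjson input: one JSON object per line
tf <- tempfile(fileext = ".json")
writeLines(c(
  '{ "id": 1, "name": "a", "score": 0.5 }',
  '{ "id": 2, "name": "b", "score": 1.5 }'
), tf)

# Keep two columns by name and return an Arrow Table instead of a data.frame
tab <- read_json_arrow(tf, col_select = c("id", "score"), as_data_frame = FALSE)
names(tab)

# The same selection with the default as_data_frame = TRUE gives a data frame
df <- read_json_arrow(tf, col_select = c("id", "score"))

unlink(tf)
```

Returning a Table can be useful when the data will be processed further with Arrow before being pulled into R.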
A data.frame, or a Table if as_data_frame = FALSE.
If passed a path, will detect and handle compression from the file extension (e.g. .json.gz).
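A minimal sketch of the compression handling, assuming the installed Arrow build includes gzip support (checked here with codec_is_available()). The file is written with base R's gzfile(); the column name x is invented for the example.

```r
library(arrow)

# The ".json.gz" extension is what triggers compression detection
tf <- tempfile(fileext = ".json.gz")

# Write a small gzip-compressed ndjson file with base R
con <- gzfile(tf, "w")
writeLines(c('{ "x": 1 }', '{ "x": 2 }', '{ "x": 3 }'), con)
close(con)

# Only attempt the read if this Arrow build was compiled with gzip support
if (codec_is_available("gzip")) {
  df <- read_json_arrow(tf)
  print(df)
}

unlink(tf)
```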
If schema is not provided, Arrow data types are inferred from the data:
JSON null values convert to the null() type, but can fall back to any other type.
JSON booleans convert to boolean().
JSON numbers convert to int64(), falling back to float64() if a non-integer is encountered.
JSON strings of the kind "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert to timestamp(unit = "s"), falling back to utf8() if a conversion error occurs.
JSON arrays convert to a list_of() type, and inference proceeds recursively on the JSON arrays' values.
Nested JSON objects convert to a struct() type, and inference proceeds recursively on the JSON objects' values.
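One way to see these inference rules at work is to read a small file as a Table and inspect its schema, then override one field with an explicit schema. This is only a sketch: the field names (n, x, flag, when, tags, pos) are invented, and the second read assumes that fields left out of the supplied schema are still inferred.

```r
library(arrow)

# Hypothetical input covering numbers, booleans, dates, arrays and nested objects
tf <- tempfile(fileext = ".json")
writeLines(c(
  '{ "n": 1, "x": 1.5, "flag": true, "when": "2020-01-01", "tags": ["a", "b"], "pos": { "lat": 1.0 } }',
  '{ "n": 2, "x": 2, "flag": null, "when": "2020-01-02", "tags": ["c"], "pos": { "lat": 2.5 } }'
), tf)

# Without a schema, types are inferred; keep the Table so Arrow types stay visible
tab <- read_json_arrow(tf, as_data_frame = FALSE)
tab$schema

# With a schema, the listed field uses the requested type instead of the inferred one
tab2 <- read_json_arrow(tf, schema = schema(n = float64()), as_data_frame = FALSE)
tab2$n$type

unlink(tf)
```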
When as_data_frame = TRUE, Arrow types are further converted to R types. See vignette("arrow", package = "arrow") for details.
tf <- tempfile()
on.exit(unlink(tf))
writeLines('
{ "hello": 3.5, "world": false, "yo": "thing" }
{ "hello": 3.25, "world": null }
{ "hello": 0.0, "world": true, "yo": null }
', tf, useBytes = TRUE)
read_json_arrow(tf)
#> # A tibble: 3 x 3
#>   hello world yo
#>   <dbl> <lgl> <chr>
#> 1  3.5  FALSE thing
#> 2  3.25 NA    NA
#> 3  0    TRUE  NA