Skip to contents

Wrapper around JsonTableReader to read a newline-delimited JSON (ndjson) file into a data frame or Arrow Table.

Usage

read_json_arrow(
  file,
  col_select = NULL,
  as_data_frame = TRUE,
  schema = NULL,
  ...
)

Arguments

file

A character file name or URI, raw vector, an Arrow input stream, or a FileSystem with path (SubTreeFileSystem). If a file name, a memory-mapped Arrow InputStream will be opened and closed when finished; compression will be detected from the file extension and handled automatically. If an input stream is provided, it will be left open.

col_select

A character vector of column names to keep, as in the "select" argument to data.table::fread(), or a tidy selection specification of columns, as used in dplyr::select().

as_data_frame

Should the function return a data.frame (default) or an Arrow Table?

schema

Schema that describes the table.

...

Additional options passed to JsonTableReader$create()

Value

A data.frame, or a Table if as_data_frame = FALSE.

Details

If passed a path, will detect and handle compression from the file extension (e.g. .json.gz).

If schema is not provided, Arrow data types are inferred from the data:

  • JSON null values convert to the null() type, but can fall back to any other type.

  • JSON booleans convert to boolean().

  • JSON numbers convert to int64(), falling back to float64() if a non-integer is encountered.

  • JSON strings of the kind "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert to timestamp(unit = "s"), falling back to utf8() if a conversion error occurs.

  • JSON arrays convert to a list_of() type, and inference proceeds recursively on the JSON arrays' values.

  • Nested JSON objects convert to a struct() type, and inference proceeds recursively on the JSON objects' values.

When as_data_frame = TRUE, Arrow types are further converted to R types.

Examples

tf <- tempfile()
on.exit(unlink(tf))
writeLines('
    { "hello": 3.5, "world": false, "yo": "thing" }
    { "hello": 3.25, "world": null }
    { "hello": 0.0, "world": true, "yo": null }
  ', tf, useBytes = TRUE)
read_json_arrow(tf)
#> # A tibble: 3 x 3
#>   hello world yo   
#>   <dbl> <lgl> <chr>
#> 1  3.5  FALSE thing
#> 2  3.25 NA    NA   
#> 3  0    TRUE  NA