Wrapper around JsonTableReader to read a newline-delimited JSON (ndjson) file into a data frame or Arrow Table.

read_json_arrow(
file,
col_select = NULL,
as_data_frame = TRUE,
schema = NULL,
...
)

## Arguments

file

A character file name or URI, raw vector, an Arrow input stream, or a FileSystem with path (SubTreeFileSystem). If a file name, a memory-mapped Arrow InputStream will be opened and closed when finished; compression will be detected from the file extension and handled automatically. If an input stream is provided, it will be left open.

col_select

A character vector of column names to keep, as in the "select" argument to data.table::fread(), or a tidy selection specification of columns, as used in dplyr::select().

as_data_frame

Should the function return a data.frame (default) or an Arrow Table?

schema

A Schema that describes the table. If NULL (the default), column types are inferred from the data as described in Details.

...

Additional options passed to JsonTableReader$create().

## Value

A data.frame, or a Table if as_data_frame = FALSE.
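
The two return types can be compared with a minimal sketch like the one below. The sample records, the temporary file, and the variable names (path, df, tab) are made up for illustration and are not part of the package.

```r
library(arrow)

# Hypothetical sample: two ndjson records written to a temporary file.
path <- tempfile(fileext = ".json")
writeLines(c(
  '{"id": 1, "ok": true, "note": "first"}',
  '{"id": 2, "ok": false, "note": "second"}'
), path)

# Default (as_data_frame = TRUE): the result is an R data frame.
df <- read_json_arrow(path)

# as_data_frame = FALSE returns an Arrow Table instead;
# col_select keeps only the named columns.
tab <- read_json_arrow(path, col_select = c("id", "ok"), as_data_frame = FALSE)

is.data.frame(df)
tab$num_columns

unlink(path)
```

Keeping the result as a Table may be preferable when the data will be passed on to other Arrow functions, since it avoids converting to R types up front.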

## Details

If file is a path, compression is detected from the file extension (e.g. .json.gz) and handled automatically.

If schema is not provided, Arrow data types are inferred from the data:

• JSON null values convert to the null() type, but can fall back to any other type.

• JSON booleans convert to boolean().

• JSON numbers convert to int64(), falling back to float64() if a non-integer is encountered.

• JSON strings of the kind "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert to timestamp(unit = "s"), falling back to utf8() if a conversion error occurs.

• JSON arrays convert to a list_of() type, and inference proceeds recursively on the JSON arrays' values.

• Nested JSON objects convert to a struct() type, and inference proceeds recursively on the JSON objects' values.

When as_data_frame = TRUE, Arrow types are further converted to R types. See vignette("arrow", package = "arrow") for details.
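
One way to see the inference rules above in action is to read a small file as a Table and print its schema. This is only a sketch: the records and the field names (id, score, ok, tags, meta) are invented for illustration, and the exact printed form of the schema may vary between arrow versions.

```r
library(arrow)

# Hypothetical sample data covering several JSON value kinds.
path <- tempfile(fileext = ".json")
writeLines(c(
  '{"id": 1, "score": 2.5, "ok": true, "tags": ["a", "b"], "meta": {"k": "v"}}',
  '{"id": 2, "score": 3, "ok": false, "tags": [], "meta": {"k": "w"}}'
), path)

# Keep the result as an Arrow Table so the inferred Arrow types stay visible
# instead of being converted to R types.
tab <- read_json_arrow(path, as_data_frame = FALSE)

# Print the inferred schema: one Arrow type per top-level field.
tab$schema

unlink(path)
```

Here id holds only integers, score contains a non-integer value, tags is a JSON array, and meta is a nested object, so each field exercises a different rule from the list above.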

## Examples

tf <- tempfile()