This release includes a dplyr interface to Arrow Datasets, which let you work efficiently with large, multi-file datasets as a single entity. Explore a directory of data files with open_dataset() and then use dplyr methods to filter(), etc. Work will be done where possible in Arrow memory. When necessary, data is pulled into R for further computation. dplyr methods are conditionally loaded if you have dplyr available; it is not a hard dependency.
See vignette("dataset", package = "arrow") for details.
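Below is a minimal sketch of the Dataset workflow described above. It assumes a current arrow release with dplyr installed, so details may differ from the 0.16 behavior. The temporary directory and the two small Parquet files are made up for illustration. open_dataset() treats the directory as one dataset, the dplyr verbs are translated for Arrow, and collect() pulls the filtered result into R.

```r
library(arrow)
library(dplyr)

# Illustrative data: two small Parquet files in a fresh temporary directory
# stand in for a multi-file dataset.
data_dir <- tempfile("arrow_dataset_")
dir.create(data_dir)
write_parquet(data.frame(id = 1:5, value = c(2, 8, 4, 10, 6)),
              file.path(data_dir, "part-1.parquet"))
write_parquet(data.frame(id = 6:10, value = c(1, 9, 3, 7, 5)),
              file.path(data_dir, "part-2.parquet"))

# Open the whole directory as a single Dataset; rows are not read into R yet.
ds <- open_dataset(data_dir)

# dplyr verbs are evaluated by Arrow where possible; collect() pulls the
# result into R as a data frame.
result <- ds %>%
  filter(value > 5) %>%
  select(id, value) %>%
  collect()

print(result)
```

Because the data spans two files, do not rely on the row order of the collected result. Sort in R, or add arrange() before collect(), if order matters.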
A source package installation (as from CRAN) will now handle its C++ dependencies automatically. For common Linux distributions and versions, installation will retrieve a prebuilt static C++ library for inclusion in the package; where this binary is not available, the package executes a bundled script that should build the Arrow C++ library with no system dependencies beyond what R requires.
See vignette("install", package = "arrow") for details.
RecordBatches also have
The [ methods for Tables, RecordBatches, Arrays, and ChunkedArrays now support natural row extraction operations. These use the C++ Take methods for efficient access, depending on the type of selection vector.
An array_expression class has also been added. Among other things, it lets you filter a Table with some function of its Arrays, such as arrow_table[arrow_table$var1 > 5, ], without having to pull everything into R first.
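The sketch below shows both kinds of row extraction on a small Table built with Table$create(). It assumes a recent arrow version. In such versions, tab$var1 > 5 may be evaluated eagerly to a boolean ChunkedArray rather than returned as a lazy array_expression, but the [ call is written the same way. The table and its column names (var1, label) are hypothetical.

```r
library(arrow)

# Hypothetical example table with two columns.
tab <- Table$create(var1 = c(3, 7, 1, 9), label = c("a", "b", "c", "d"))

# Positional row extraction: the first two rows, returned as a Table.
first_two <- tab[1:2, ]

# Logical row extraction using a comparison on one of the Table's columns;
# the filtering happens on the Arrow side before conversion to R.
big <- tab[tab$var1 > 5, ]

print(as.data.frame(big))
```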
factor levels are required to be) instead of raising an error
arrow::table() have been removed in favor of Table$create(), eliminating the package startup message about masking base functions. For more information, see the new
The as_tibble argument in the read_*() functions has been renamed to
The arrow::Column class has been removed, as it was removed from the C++ library.
RecordBatch objects have S3 methods that enable you to work with them more like data.frames: extract columns, subset, and so on. See
read_csv_arrow() supports more parsing options, including
read_feather() can ingest data from a
double()), and time types can be created with human-friendly resolution strings (“ms”, “s”, etc.). (ARROW-6338, ARROW-6364)
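Here is a short sketch of two items above: the data.frame-like S3 methods on a RecordBatch, and time types created from resolution strings. It assumes a recent arrow release. RecordBatch$create(), float64(), time32(), and timestamp() are arrow constructors that the text above does not name. The column and field names are hypothetical.

```r
library(arrow)

# Hypothetical RecordBatch with an integer and a string column.
batch <- RecordBatch$create(x = 1:4, y = c("a", "b", "c", "d"))

dims <- dim(batch)                       # number of rows and columns
cols <- names(batch)                     # column names
x_values <- as.vector(batch$x)           # extract one column as an R vector
first_two <- as.data.frame(batch[1:2, ]) # subset rows, then convert to R

# Time types created from human-friendly resolution strings:
# "ms" (milliseconds) for time32 and "s" (seconds) for timestamp.
s <- schema(
  value = float64(),
  t = time32("ms"),
  ts = timestamp("s")
)
print(s)
```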
Initial CRAN release of the arrow package. Key features include: