## arrow 11.0.0.3

CRAN release: 2023-03-08

### Minor improvements and fixes

• open_csv_dataset() allows a schema to be specified. (#34217)
• To ensure compatibility with an upcoming dplyr release, we no longer call dplyr:::check_names() (#34369)

## arrow 11.0.0.2

CRAN release: 2023-02-12

### Breaking changes

• map_batches() is lazy by default; it now returns a RecordBatchReader instead of a list of RecordBatch objects unless lazy = FALSE. (#14521)
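A minimal sketch of the new default, assuming an in-memory table as the source (the argument has also been spelled .lazy in earlier releases):

```r
library(arrow)

reader <- as_record_batch_reader(arrow_table(x = 1:10))

# map_batches() now returns a RecordBatchReader rather than a list,
# so downstream code can keep streaming batch by batch
out <- map_batches(reader, function(batch) batch)
out$read_table()  # materialize only when needed
```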

### New features

#### Docs

• A substantial reorganisation of, rewrite of, and addition to many of the vignettes and the README. (@djnavarro, #14514)

#### Reading/writing data

• New functions open_csv_dataset(), open_tsv_dataset(), and open_delim_dataset() all wrap open_dataset(): they don’t provide new functionality, but allow readr-style options to be supplied, making it simpler to switch between individual file reading and dataset functionality. (#33614)
• User-defined null values can be set when writing CSVs both as datasets and as individual files. (@wjones127, #14679)
• The new col_names parameter allows specification of column names when opening a CSV dataset. (@wjones127, #14705)
• The parse_options, read_options, and convert_options parameters for reading individual files (read_*_arrow() functions) and datasets (open_dataset() and the new open_*_dataset() functions) can be passed in as lists. (#15270)
• File paths containing accents can be read by read_csv_arrow(). (#14930)
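A sketch combining several of these additions; the file paths and column names here are hypothetical, and the write-side null parameter follows the readr-style signature:

```r
library(arrow)

# Headerless CSVs under a hypothetical "data/csv/" directory
ds <- open_csv_dataset(
  "data/csv",
  col_names = c("id", "score", "label"),  # new col_names parameter
  convert_options = list(                 # options can now be supplied as plain lists
    null_values = c("", "NA", "-999"),
    strings_can_be_null = TRUE
  )
)

# User-defined null values when writing a CSV
write_csv_arrow(data.frame(x = c(1, NA)), "out.csv", na = "-999")
```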

#### dplyr compatibility

• New dplyr (1.1.0) function join_by() has been implemented for dplyr joins on Arrow objects (equality conditions only). (#33664)
• Output is accurate when multiple dplyr::group_by()/dplyr::summarise() calls are used. (#14905)
• dplyr::summarize() works with division when divisor is a variable. (#14933)
• dplyr::right_join() correctly coalesces keys. (#15077)
• Multiple changes to ensure compatibility with dplyr 1.1.0. (@lionel-, #14948)
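For example, an equality join written with the new join_by() helper (table contents hypothetical):

```r
library(arrow)
library(dplyr)

orders    <- arrow_table(cust_id = c(1L, 2L), total = c(9.5, 3.2))
customers <- arrow_table(id = c(1L, 2L), name = c("a", "b"))

# join_by() replaces the named `by` vector for equality conditions
orders |>
  left_join(customers, by = join_by(cust_id == id)) |>
  collect()
```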

#### Function bindings

• The following functions can be used in queries on Arrow objects:
• lubridate::with_tz() and lubridate::force_tz() (@eitsupi, #14093)
• stringr::str_remove() and stringr::str_remove_all() (#14644)
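For example, both new bindings used inside a query (data hypothetical):

```r
library(arrow)
library(dplyr)

arrow_table(ts = Sys.time(), s = "flag_red") |>
  mutate(
    local  = lubridate::with_tz(ts, "America/New_York"),
    colour = stringr::str_remove(s, "flag_")
  ) |>
  collect()
```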

#### Arrow object creation

• Arrow Scalars can be created from POSIXlt objects. (#15277)
• Array$create() can create Decimal arrays. (#15211)
• StructArray$create() can be used to create StructArray objects. (#14922)
• Creating an Array from an object with more than 2^31 elements now produces an Array of the correct length. (#14929)
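A sketch of the new creation paths; the StructArray$create() argument form shown here is illustrative:

```r
library(arrow)

# Scalars from POSIXlt
Scalar$create(as.POSIXlt("2023-01-15 10:30:00", tz = "UTC"))

# Decimal arrays by passing a decimal type
Array$create(c(1.23, 4.56), type = decimal128(precision = 5, scale = 2))

# StructArrays from named children
StructArray$create(a = 1:3, b = c("x", "y", "z"))
```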

#### Installation

• The package can automatically link to system installations of the AWS SDK for C++. (@kou, #14235)

### Minor improvements and fixes

• Calling lubridate::as_datetime() on Arrow objects can handle sub-second resolution. (@eitsupi, #13890)
• head() can be called after as_record_batch_reader(). (#14518)
• as.Date() can go from timestamp[us] to timestamp[s]. (#14935)
• curl timeout policy can be configured for S3. (#15166)
• rlang dependency must be at least version 1.0.0 because of check_dots_empty(). (@daattali, #14744)

## arrow 10.0.1

CRAN release: 2022-12-06

Minor improvements and fixes:

• Fixes for failing test after lubridate 1.9 release (#14615)
• Update to ensure compatibility with changes in dev purrr (#14581)
• Fix to correctly handle .data pronoun in dplyr::group_by() (#14484)

## arrow 10.0.0

CRAN release: 2022-10-26

### Arrow dplyr queries

Several new functions can be used in queries:

• dplyr::across() can be used to apply the same computation across multiple columns, and the where() selection helper is supported in across();
• add_filename() can be used to get the filename a row came from (only available when querying ?Dataset);
• Added five functions in the slice_* family: dplyr::slice_min(), dplyr::slice_max(), dplyr::slice_head(), dplyr::slice_tail(), and dplyr::slice_sample().

The package now has documentation that lists all dplyr methods and R function mappings that are supported on Arrow data, along with notes about any differences in functionality between queries evaluated in R versus in Acero, the Arrow query engine. See ?acero.

A few new features and bugfixes were implemented for joins:

• Extension arrays are now supported in joins, allowing, for example, joining datasets that contain geoarrow data.
• The keep argument is now supported, allowing separate columns for the left and right hand side join keys in join output. Full joins now coalesce the join keys (when keep = FALSE), avoiding the issue where the join keys would be all NA for rows in the right hand side without any matches on the left.

Some changes to improve the consistency of the API:

• In a future release, calling dplyr::pull() will return a ?ChunkedArray instead of an R vector by default. The current default behavior is deprecated. To update to the new behavior now, specify pull(as_vector = FALSE) or set options(arrow.pull_as_vector = FALSE) globally.
• Calling dplyr::compute() on a query that is grouped returns a ?Table instead of a query object.
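To opt in to the future pull() default now (table contents hypothetical):

```r
library(arrow)
library(dplyr)

tbl <- arrow_table(x = 1:5)

# Keep the result in Arrow memory instead of converting to an R vector
tbl |> filter(x > 2) |> pull(x, as_vector = FALSE)  # a ChunkedArray

# or set the new behavior globally:
options(arrow.pull_as_vector = FALSE)
```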

Finally, long-running queries can now be cancelled and will abort their computation immediately.

### Arrays and tables

as_arrow_array() can now take blob::blob and ?vctrs::list_of objects, which convert to binary and list arrays, respectively. Also fixed an issue where as_arrow_array() ignored the type argument when passed a StructArray.

The unique() function works on ?Table, ?RecordBatch, ?Dataset, and ?RecordBatchReader.

write_feather() can take compression = FALSE to choose writing uncompressed files.

Also, a breaking change for IPC files in write_dataset(): passing "ipc" or "feather" to format will now write files with .arrow extension instead of .ipc or .feather.
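A short sketch of the Feather/IPC writing changes (file and directory names hypothetical):

```r
library(arrow)

df <- data.frame(x = 1:3)

# Uncompressed Feather V2 / IPC file
write_feather(df, "data.arrow", compression = FALSE)

# Dataset written as IPC now produces files with the .arrow extension
write_dataset(df, "ipc_dir", format = "feather")
```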

### Installation

As of version 10.0.0, arrow requires C++17 to build. This means that:

• On Windows, you need R >= 4.0. Version 9.0.0 was the last version to support R 3.6.
• On CentOS 7, you can build the latest version of arrow, but you first need to install a newer compiler than the default system compiler, gcc 4.8. See vignette("install", package = "arrow") for guidance. Note that you only need the newer compiler to build arrow: installing a binary package, as from RStudio Package Manager, or loading a package you’ve already installed works fine with the system defaults.

## arrow 9.0.0

CRAN release: 2022-08-10

### Arrow dplyr queries

• New dplyr verbs:
• dplyr::union and dplyr::union_all (#13090)
• dplyr::glimpse (#13563)
• show_exec_plan() can be added to the end of a dplyr pipeline to show the underlying plan, similar to dplyr::show_query(). dplyr::show_query() and dplyr::explain() also work and show the same output, but may change in the future. (#13541)
• User-defined functions are supported in queries. Use register_scalar_function() to create them. (#13397)
• map_batches() returns a RecordBatchReader and requires that the function it maps returns something coercible to a RecordBatch through the as_record_batch() S3 function. It can also run in streaming fashion if passed .lazy = TRUE. (#13170, #13650)
• Functions can be called with package namespace prefixes (e.g. stringr::, lubridate::) within queries. For example, stringr::str_length will now dispatch to the same kernel as str_length. (#13160)
• Support for new functions:
• lubridate::parse_date_time() datetime parser: (#12589, #13196, #13506)
• orders with year, month, day, hours, minutes, and seconds components are supported.
• the orders argument in the Arrow binding works as follows: orders are transformed into formats, which are then applied in turn. There is no select_formats parameter, and no inference takes place (as it does in lubridate::parse_date_time()).
• lubridate date and datetime parsers such as lubridate::ymd(), lubridate::yq(), and lubridate::ymd_hms() (#13118, #13163, #13627)
• lubridate::fast_strptime() (#13174)
• lubridate::floor_date(), lubridate::ceiling_date(), and lubridate::round_date() (#12154)
• strptime() supports the tz argument to pass timezones. (#13190)
• lubridate::qday() (day of quarter)
• exp() and sqrt(). (#13517)
• Bugfixes:
• Count distinct now gives correct result across multiple row groups. (#13583)
• Aggregations over partition columns return correct results. (#13518)
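Among the new bindings above, lubridate::parse_date_time() tries each order in turn with no inference; a sketch (data hypothetical):

```r
library(arrow)
library(dplyr)

arrow_table(raw = c("2022-08-10 12:00:00", "10/08/2022")) |>
  mutate(ts = lubridate::parse_date_time(raw, orders = c("ymd HMS", "dmy"))) |>
  collect()
```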

### Reading and writing

• New functions read_ipc_file() and write_ipc_file() are added. These functions are almost the same as read_feather() and write_feather(), but differ in that they only target IPC files (Feather V2 files), not Feather V1 files.
• read_arrow() and write_arrow(), deprecated since 1.0.0 (July 2020), have been removed. Instead, use read_ipc_file() and write_ipc_file() for IPC files, or read_ipc_stream() and write_ipc_stream() for IPC streams. (#13550)
• write_parquet() now defaults to writing Parquet format version 2.4 (was 1.0). Previously deprecated arguments properties and arrow_properties have been removed; if you need to deal with these lower-level properties objects directly, use ParquetFileWriter, which write_parquet() wraps. (#13555)
• UnionDatasets can unify schemas of multiple InMemoryDatasets with varying schemas. (#13088)
• write_dataset() preserves all schema metadata again. In 8.0.0, it would drop most metadata, breaking packages such as sfarrow. (#13105)
• Reading and writing functions (such as write_csv_arrow()) will automatically (de-)compress data if the file path contains a compression extension (e.g. "data.csv.gz"). This works locally as well as on remote filesystems like S3 and GCS. (#13183)
• FileSystemFactoryOptions can be provided to open_dataset(), allowing you to pass options such as which file prefixes to ignore. (#13171)
• By default, S3FileSystem will not create or delete buckets. To enable that, pass the configuration option allow_bucket_creation or allow_bucket_deletion. (#13206)
• GcsFileSystem and gs_bucket() allow connecting to Google Cloud Storage. (#10999, #13601)

• Table and RecordBatch $num_rows() methods return a double (previously integer), avoiding integer overflow on larger tables. (#13482, #13514)

### Packaging

• The arrow.dev_repo for nightly builds of the R package and prebuilt libarrow binaries is now https://nightlies.apache.org/arrow/r/.
• Brotli and BZ2 are shipped with macOS binaries. BZ2 is shipped with Windows binaries. (#13484)

## arrow 8.0.0

CRAN release: 2022-05-09

### Enhancements to dplyr and datasets

• open_dataset():
• correctly supports the skip argument for skipping header rows in CSV datasets.
• can take a list of datasets with differing schemas and attempt to unify the schemas to produce a UnionDataset.
• Arrow dplyr queries:
• are supported on RecordBatchReader. This allows, for example, results from DuckDB to be streamed back into Arrow rather than materialized before continuing the pipeline.
• no longer need to materialize the entire result table before writing to a dataset if the query contains aggregations or joins.
• support dplyr::rename_with().
• dplyr::count() returns an ungrouped dataframe.
• write_dataset() has more options for controlling row group and file sizes when writing partitioned datasets, such as max_open_files, max_rows_per_file, min_rows_per_group, and max_rows_per_group.
• write_csv_arrow() accepts a Dataset or an Arrow dplyr query.
• Joining one or more datasets while option(use_threads = FALSE) is set no longer crashes R. That option is set by default on Windows.
• dplyr joins support the suffix argument to handle overlap in column names.
• Filtering a Parquet dataset with is.na() no longer misses any rows.
• map_batches() correctly accepts Dataset objects.

### Enhancements to date and time support

• read_csv_arrow()’s readr-style type T is mapped to timestamp(unit = "ns") instead of timestamp(unit = "s").
• For Arrow dplyr queries, added additional lubridate features and fixes:
• New component extraction functions:
• lubridate::tz() (timezone),
• lubridate::semester(),
• lubridate::dst() (daylight savings time boolean),
• lubridate::date(),
• lubridate::epiyear() (year according to epidemiological week calendar),
• lubridate::month() works with integer inputs.
• lubridate::make_date() & lubridate::make_datetime() + base::ISOdatetime() & base::ISOdate() to create date-times from numeric representations.
• lubridate::decimal_date() and lubridate::date_decimal()
• lubridate::make_difftime() (duration constructor)
• ?lubridate::duration helper functions, such as lubridate::dyears(), lubridate::dhours(), lubridate::dseconds().
• lubridate::leap_year()
• lubridate::as_date() and lubridate::as_datetime()
• Also for Arrow dplyr queries, added support and fixes for base date and time functions:
• base::difftime and base::as.difftime()
• base::as.Date() to convert to date
• Arrow timestamp and date arrays support base::format()
• strptime() returns NA instead of erroring in case of format mismatch, just like base::strptime().
• Timezone operations are supported on Windows if the tzdb package is also installed.

### Extensibility

• Added S3 generic conversion functions such as as_arrow_array() and as_arrow_table() for the main Arrow objects: Arrow tables, record batches, arrays, chunked arrays, record batch readers, schemas, and data types. This allows other packages to define custom conversions from their types to Arrow objects, including extension arrays.
• Custom extension types and arrays can be created and registered, allowing other packages to define their own array types. Extension arrays wrap regular Arrow array types and provide customized behavior and/or storage. See the description and an example with ?new_extension_type.
• Implemented a generic extension type and as_arrow_array() methods for all objects where vctrs::vec_is() returns TRUE (i.e., any object that can be used as a column in a tibble::tibble()), provided that the underlying vctrs::vec_data() can be converted to an Arrow Array.

### Concatenation Support

Arrow arrays and tables can be easily concatenated:

• Arrays can be concatenated with concat_arrays() or, if zero-copy is desired and chunking is acceptable, using ChunkedArray$create().
• ChunkedArrays can be concatenated with c().
• RecordBatches and Tables support cbind().
• Tables support rbind(). concat_tables() is also provided to concatenate tables while unifying schemas.
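A short sketch of the concatenation options above:

```r
library(arrow)

a <- Array$create(1:3)
b <- Array$create(4:6)

concat_arrays(a, b)        # one contiguous Array
ChunkedArray$create(a, b)  # zero-copy: the inputs become chunks

c(chunked_array(1:3), chunked_array(4:6))  # ChunkedArrays concatenate with c()

t1 <- arrow_table(x = 1:2)
t2 <- arrow_table(x = 3:4)
rbind(t1, t2)              # Tables support rbind()
concat_tables(t1, t2)      # also unifies schemas if needed
```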

### Other improvements and fixes

• Dictionary arrays support using ALTREP when converting to R factors.
• Math group generics are implemented for ArrowDatum. This means you can use base functions like sqrt(), log(), and exp() with Arrow arrays and scalars.
• read_* and write_* functions support R Connection objects for reading and writing files.
• Parquet improvements:
• Parquet writer supports Duration type columns.
• The dataset Parquet reader consumes less memory.
• median() and quantile() will warn only once about approximate calculations regardless of interactivity.
• Array$cast() can cast StructArrays into another struct type with the same field names and structure (or a subset of fields) but different field types.
• Removed special handling for Solaris.
• The CSV writer is much faster when writing string columns.
• Fixed an issue where set_io_thread_count() would set the CPU count instead of the IO thread count.
• RandomAccessFile has a $ReadMetadata() method that provides useful metadata provided by the filesystem.
• grepl binding returns FALSE for NA inputs (previously it returned NA), to match the behavior of base::grepl().
• create_package_with_all_dependencies() works on Windows and Mac OS, instead of only Linux.

## arrow 7.0.0

CRAN release: 2022-02-10

### Enhancements to dplyr and datasets

• Additional lubridate features: week(), more of the is.*() functions, and the label argument to month() have been implemented.
• More complex expressions inside summarize(), such as ifelse(n() > 1, mean(y), mean(z)), are supported.
• When adding columns in a dplyr pipeline, one can now use tibble and data.frame to create columns of tibbles or data.frames respectively (e.g. ... %>% mutate(df_col = tibble(a, b)) %>% ...).
• Dictionary columns (R factor type) are supported inside of coalesce().
• open_dataset() accepts the partitioning argument when reading Hive-style partitioned files, even though it is not required.
• The experimental map_batches() function for custom operations on dataset has been restored.

### CSV

• Delimited files (including CSVs) with encodings other than UTF-8 can now be read, using the encoding argument when reading.
• open_dataset() correctly ignores byte-order marks (BOMs) in CSVs, as was already the case when reading single files.
• Reading a dataset internally uses an asynchronous scanner by default, which resolves a potential deadlock when reading in large CSV datasets.
• head() no longer hangs on large CSV datasets.
• There is an improved error message when there is a conflict between a header in the file and schema/column names provided as arguments.
• write_csv_arrow() now follows the signature of readr::write_csv().

### Other improvements and fixes

• Many of the vignettes have been reorganized, restructured and expanded to improve their usefulness and clarity.

### Internals

• We now use testthat 3rd edition as our default
• A number of large test reorganizations
• Style changes to conform with the tidyverse style guide + using lintr

## arrow 5.0.0.2

CRAN release: 2021-09-05

This patch version contains fixes for some sanitizer and compiler warnings.

## arrow 5.0.0

CRAN release: 2021-07-29

### More dplyr

• There are now more than 250 compute functions available for use in dplyr::filter(), mutate(), etc. Additions in this release include:

• String operations: strsplit() and str_split(); strptime(); paste(), paste0(), and str_c(); substr() and str_sub(); str_like(); str_pad(); stri_reverse()
• Date/time operations: lubridate methods such as year(), month(), wday(), and so on
• Math: logarithms (log() et al.); trigonometry (sin(), cos(), et al.); abs(); sign(); pmin() and pmax(); ceiling(), floor(), and trunc()
• Conditional functions, with some limitations on input type in this release: ifelse() and if_else() for all but Decimal types; case_when() for logical, numeric, and temporal types only; coalesce() for all but lists/structs. Note also that in this release, factors/dictionaries are converted to strings in these functions.
• is.* functions are supported and can be used inside relocate()
• The print method for arrow_dplyr_query now includes the expression and the resulting type of columns derived by mutate().
• transmute() now errors if passed arguments .keep, .before, or .after, for consistency with the behavior of dplyr on data.frames.

### CSV writing

• write_csv_arrow() to use Arrow to write a data.frame to a single CSV file
• write_dataset(format = "csv", ...) to write a Dataset to CSVs, including with partitioning
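For example (data and paths hypothetical):

```r
library(arrow)

df <- data.frame(g = c("a", "a", "b"), x = 1:3)

# Single CSV file via Arrow
write_csv_arrow(df, "df.csv")

# Partitioned CSV dataset: one directory per value of g
write_dataset(df, "csv_dir", format = "csv", partitioning = "g")
```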

• RecordBatch columns can now be added, replaced, or removed by assigning (<-) with either $ or [[
• Similarly, Schema can now be edited by assigning in new types. This enables using the CSV reader to detect the schema of a file, modify the Schema object for any columns that you want to read in as a different type, and then use that Schema to read the data.
• Better validation when creating a Table with a schema, with columns of different lengths, and with scalar value recycling
• Reading Parquet files in Japanese or other multi-byte locales on Windows no longer hangs (workaround for a bug in libstdc++; thanks @yutannihilation for the persistence in discovering this!)
• If you attempt to read string data that has embedded nul (\0) characters, the error message now informs you that you can set options(arrow.skip_nul = TRUE) to strip them out. It is not recommended to set this option by default since this code path is significantly slower, and most string data does not contain nuls.
• read_json_arrow() now accepts a schema: read_json_arrow("file.json", schema = schema(col_a = float64(), col_b = string()))

### Installation and configuration

• The R package can now support working with an Arrow C++ library that has additional features (such as dataset, parquet, string libraries) disabled, and the bundled build script enables setting environment variables to disable them. See vignette("install", package = "arrow") for details. This allows a faster, smaller package build in cases where that is useful, and it enables a minimal, functioning R package build on Solaris.
• On macOS, it is now possible to use the same bundled C++ build that is used by default on Linux, along with all of its customization parameters, by setting the environment variable FORCE_BUNDLED_BUILD=true.
• arrow now uses the mimalloc memory allocator by default on macOS, if available (as it is in CRAN binaries), instead of jemalloc. There are configuration issues with jemalloc on macOS, and benchmark analysis shows that it has negative effects on performance there, especially on memory-intensive workflows. jemalloc remains the default on Linux; mimalloc is the default on Windows.
• Setting the ARROW_DEFAULT_MEMORY_POOL environment variable to switch memory allocators now works correctly when the Arrow C++ library has been statically linked (as is usually the case when installing from CRAN).
• The arrow_info() function now reports on the additional optional features, as well as the detected SIMD level. If key features or compression libraries are not enabled in the build, arrow_info() will refer to the installation vignette for guidance on how to install a more complete build, if desired.
• If you attempt to read a file that was compressed with a codec that your Arrow build does not support, the error message now tells you how to reinstall Arrow with that feature enabled.
• A new vignette about developer environment setup: vignette("developing", package = "arrow").
• When building from source, you can use the environment variable ARROW_HOME to point to a specific directory where the Arrow libraries are. This is similar to passing INCLUDE_DIR and LIB_DIR.

## arrow 3.0.0

CRAN release: 2021-01-27

### Python and Flight

• Flight methods flight_get() and flight_put() (renamed from push_data() in this release) can handle both Tables and RecordBatches
• flight_put() gains an overwrite argument to optionally check for the existence of a resource with the same name
• list_flights() and flight_path_exists() enable you to see available resources on a Flight server
• Schema objects now have r_to_py and py_to_r methods
• Schema metadata is correctly preserved when converting Tables to/from Python

### Enhancements

• Arithmetic operations (+, *, etc.) are supported on Arrays and ChunkedArrays and can be used in filter expressions in Arrow dplyr pipelines
• Table columns can now be added, replaced, or removed by assigning (<-) with either $ or [[
• Column names of Tables and RecordBatches can be renamed by assigning names()
• Large string types can now be written to Parquet files
• The rlang pronouns .data and .env are now fully supported in Arrow dplyr pipelines.
• Option arrow.skip_nul (default FALSE, as in base::scan()) allows conversion of Arrow string (utf8()) type data containing embedded nul \0 characters to R. If set to TRUE, nuls will be stripped and a warning is emitted if any are found.
• arrow_info() for an overview of various run-time and build-time Arrow configurations, useful for debugging
• Set environment variable ARROW_DEFAULT_MEMORY_POOL before loading the Arrow package to change memory allocators. Windows packages are built with mimalloc; most others are built with both jemalloc (used by default) and mimalloc. These alternative memory allocators are generally much faster than the system memory allocator, so they are used by default when available, but sometimes it is useful to turn them off for debugging purposes. To disable them, set ARROW_DEFAULT_MEMORY_POOL=system.
• List columns that have attributes on each element are now also included with the metadata that is saved when creating Arrow tables. This allows sf tibbles to be faithfully preserved and roundtripped (#8549).
• R metadata that exceeds 100Kb is now compressed before being written to a table; see schema() for more details.
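The allocator switch described above must happen before arrow is loaded; a sketch:

```r
# Set before loading arrow (e.g. in .Renviron or at the top of a script)
Sys.setenv(ARROW_DEFAULT_MEMORY_POOL = "system")
library(arrow)

default_memory_pool()$backend_name  # reports which allocator is active
```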

### Bug fixes

• Fixed a performance regression in converting Arrow string types to R that was present in the 2.0.0 release
• C++ functions now trigger garbage collection when needed
• write_parquet() can now write RecordBatches
• Reading a Table from a RecordBatchStreamReader containing 0 batches no longer crashes
• readr’s problems attribute is removed when converting to an Arrow RecordBatch or Table, to prevent large amounts of metadata from accumulating inadvertently (#9092)
• Fixed reading of compressed Feather files written with Arrow 0.17 (#9128)
• SubTreeFileSystem gains a useful print method and no longer errors when printing

### Packaging and installation

• Nightly development versions of the conda r-arrow package are available with conda install -c arrow-nightlies -c conda-forge --strict-channel-priority r-arrow
• Linux installation now safely supports older cmake versions
• Compiler version checking for enabling S3 support correctly identifies the active compiler
• Updated guidance and troubleshooting in vignette("install", package = "arrow"), especially for known CentOS issues
• Operating system detection on Linux uses the distro package. If your OS isn’t correctly identified, please report an issue there.

## arrow 2.0.0

CRAN release: 2020-10-20

### Datasets

• write_dataset() to Feather or Parquet files with partitioning. See the end of vignette("dataset", package = "arrow") for discussion and examples.
• Datasets now have head(), tail(), and take ([) methods. head() is optimized but the others may not be performant.
• collect() gains an as_data_frame argument, default TRUE but when FALSE allows you to evaluate the accumulated select and filter query but keep the result in Arrow, not an R data.frame
• read_csv_arrow() supports specifying column types, both with a Schema and with the compact string representation for types used in the readr package. It also has gained a timestamp_parsers argument that lets you express a set of strptime parse strings that will be tried to convert columns designated as Timestamp type.
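A sketch of the column-type options (file name hypothetical):

```r
library(arrow)

# Explicit schema plus custom timestamp parse strings, tried in order
read_csv_arrow(
  "events.csv",
  col_types = schema(id = int64(), when = timestamp(unit = "s")),
  timestamp_parsers = c("%Y-%m-%d %H:%M:%S", "%d/%m/%Y")
)

# Or readr's compact string representation: one letter per column
read_csv_arrow("events.csv", col_types = "iT")
```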

### AWS S3 support

• S3 support is now enabled in binary macOS and Windows (Rtools40 only, i.e. R >= 4.0) packages. To enable it on Linux, you need the additional system dependencies libcurl and openssl, as well as a sufficiently modern compiler. See vignette("install", package = "arrow") for details.

### Compression

• write_parquet() now supports compression
• codec_is_available() returns TRUE or FALSE whether the Arrow C++ library was built with support for a given compression library (e.g. gzip, lz4, snappy)
• Windows builds now include support for zstd and lz4 compression (#5814, @gnguy)
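For example:

```r
library(arrow)

codec_is_available("zstd")  # TRUE if the Arrow C++ build includes zstd

# Compression when writing Parquet
write_parquet(data.frame(x = 1:3), "x.parquet", compression = "gzip")
```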

### Other fixes and improvements

• Arrow null type is now supported
• Factor types are now preserved in round trip through Parquet format (#6135, @yutannihilation)
• Reading an Arrow dictionary type coerces dictionary values to character (as R factor levels are required to be) instead of raising an error
• Many improvements to Parquet function documentation (@karldw, @khughitt)

## arrow 0.15.1

CRAN release: 2019-11-04

• This patch release includes bugfixes in the C++ library around dictionary types and Parquet reading.

## arrow 0.15.0

CRAN release: 2019-10-07

### Breaking changes

• The R6 classes that wrap the C++ classes are now documented and exported and have been renamed to be more R-friendly. Users of the high-level R interface in this package are not affected. Those who want to interact with the Arrow C++ API more directly should work with these objects and methods. As part of this change, many functions that instantiated these R6 objects have been removed in favor of Class$create() methods. Notably, arrow::array() and arrow::table() have been removed in favor of Array$create() and Table$create(), eliminating the package startup message about masking base functions. For more information, see the new vignette("arrow").
• Due to a subtle change in the Arrow message format, data written by the 0.15 version libraries may not be readable by older versions. If you need to send data to a process that uses an older version of Arrow (for example, an Apache Spark server that hasn’t yet updated to Arrow 0.15), you can set the environment variable ARROW_PRE_0_15_IPC_FORMAT=1.
• The as_tibble argument in the read_*() functions has been renamed to as_data_frame (#5399, @jameslamb)
• The arrow::Column class has been removed, as it was removed from the C++ library

### New features

• Table and RecordBatch objects have S3 methods that enable you to work with them more like data.frames. Extract columns, subset, and so on. See ?Table and ?RecordBatch for examples.
• Initial implementation of bindings for the C++ File System API. (#5223)
• Compressed streams are now supported on Windows (#5329), and you can also specify a compression level (#5450)

• Parquet file reading is much, much faster, thanks to improvements in the Arrow C++ library.
• read_csv_arrow() supports more parsing options, including col_names, na, quoted_na, and skip
• read_parquet() and read_feather() can ingest data from a raw vector (#5141)
• File readers now properly handle paths that need expanding, such as ~/file.parquet (#5169)
• Improved support for creating types in a schema: the types’ printed names (e.g. “double”) are guaranteed to be valid to use in instantiating a schema (e.g. double()), and time types can be created with human-friendly resolution strings (“ms”, “s”, etc.). (#5198, #5201)
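A sketch of the data.frame-like S3 methods mentioned above:

```r
library(arrow)

tab <- Table$create(x = 1:4, y = letters[1:4])

tab$y              # extract a column (a ChunkedArray)
tab[1:2, ]         # subset rows like a data.frame
as.data.frame(tab) # convert back to an R data.frame
```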

## arrow 0.14.1

CRAN release: 2019-08-05

Initial CRAN release of the arrow package. Key features include:

• Read and write support for various file formats, including Parquet, Feather/Arrow, CSV, and JSON.
• API bindings to the C++ library for Arrow data types and objects, as well as mapping between Arrow types and R data types.
• Tools for helping with C++ library configuration and installation.