Function reference
-
open_dataset()
- Open a multi-file dataset
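For example, a directory of Parquet files can be opened lazily as a single Dataset (the path and partition column below are illustrative):

    library(arrow)
    # Open every Parquet file under data/sales as one lazily scanned Dataset,
    # treating the directory names as a "year" partition column
    ds <- open_dataset("data/sales", partitioning = "year")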
-
open_delim_dataset()
open_csv_dataset()
open_tsv_dataset()
- Open a multi-file dataset of CSV or other delimiter-separated format
-
csv_read_options()
- CSV Reading Options
-
csv_parse_options()
- CSV Parsing Options
-
csv_convert_options()
- CSV Convert Options
-
write_dataset()
- Write a dataset
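A minimal sketch, with an illustrative output path:

    # Write a data frame as a Parquet dataset partitioned by the cyl column
    write_dataset(mtcars, "data/mtcars_ds", partitioning = "cyl", format = "parquet")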
-
write_delim_dataset()
write_csv_dataset()
write_tsv_dataset()
- Write a dataset into partitioned flat files.
-
csv_write_options()
- CSV Writing Options
-
read_delim_arrow()
read_csv_arrow()
read_csv2_arrow()
read_tsv_arrow()
- Read a CSV or other delimited file with Arrow
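Typical usage (the file name is illustrative); set as_data_frame = FALSE to keep the result as an Arrow Table instead of converting to a data frame:

    df  <- read_csv_arrow("data/input.csv")
    tab <- read_csv_arrow("data/input.csv", as_data_frame = FALSE)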
-
read_parquet()
- Read a Parquet file
-
read_feather()
read_ipc_file()
- Read a Feather file (an Arrow IPC file)
-
read_ipc_stream()
- Read Arrow IPC stream format
-
read_json_arrow()
- Read a JSON file
-
write_csv_arrow()
- Write CSV file to disk
-
write_parquet()
- Write Parquet file to disk
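A short round-trip sketch; the file name, compression codec, and column selection are illustrative:

    write_parquet(mtcars, "mtcars.parquet", compression = "snappy")
    dat <- read_parquet("mtcars.parquet", col_select = c("mpg", "cyl"))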
-
write_feather()
write_ipc_file()
- Write a Feather file (an Arrow IPC file)
-
write_ipc_stream()
- Write Arrow IPC stream format
-
write_to_raw()
- Write Arrow data to a raw vector
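A sketch of serializing a data frame to the IPC stream format in memory and reading it back:

    raw_bytes <- write_to_raw(mtcars, format = "stream")
    dat <- read_ipc_stream(raw_bytes)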
-
scalar()
- Create an Arrow Scalar
-
arrow_array()
- Create an Arrow Array
-
chunked_array()
- Create a Chunked Array
-
record_batch()
- Create a RecordBatch
-
arrow_table()
- Create an Arrow Table
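A minimal sketch of building each container directly from R vectors:

    s   <- scalar(42L)
    arr <- arrow_array(c(1.5, 2.5, NA))
    ca  <- chunked_array(1:3, 4:6)                 # one ChunkedArray with two chunks
    rb  <- record_batch(x = 1:3, y = c("a", "b", "c"))
    tab <- arrow_table(x = 1:6, y = letters[1:6])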
-
buffer()
- Create a Buffer
-
vctrs_extension_array()
vctrs_extension_type()
- Extension type for generic typed vectors
Working with Arrow data containers
Functions for converting R objects to Arrow data containers and combining Arrow data containers.
-
as_arrow_array()
- Convert an object to an Arrow Array
-
as_chunked_array()
- Convert an object to an Arrow ChunkedArray
-
as_record_batch()
- Convert an object to an Arrow RecordBatch
-
as_arrow_table()
- Convert an object to an Arrow Table
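For example, converting common R objects:

    as_arrow_array(1:5)
    as_chunked_array(c(TRUE, NA, FALSE))
    as_record_batch(mtcars)
    as_arrow_table(mtcars)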
-
concat_arrays()
c(<Array>)
- Concatenate zero or more Arrays
-
concat_tables()
- Concatenate one or more Tables
-
int8()
int16()
int32()
int64()
uint8()
uint16()
uint32()
uint64()
float16()
halffloat()
float32()
float()
float64()
boolean()
bool()
utf8()
large_utf8()
binary()
large_binary()
fixed_size_binary()
string()
date32()
date64()
time32()
time64()
duration()
null()
timestamp()
decimal()
decimal128()
decimal256()
struct()
list_of()
large_list_of()
fixed_size_list_of()
map_of()
- Create Arrow data types
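These constructors are typically used when declaring schemas or casting, for example:

    int32()
    timestamp(unit = "ms", timezone = "UTC")
    list_of(float64())
    struct(id = int64(), name = utf8())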
-
dictionary()
- Create a dictionary type
-
new_extension_type()
new_extension_array()
register_extension_type()
reregister_extension_type()
unregister_extension_type()
- Extension types
-
vctrs_extension_array()
vctrs_extension_type()
- Extension type for generic typed vectors
-
as_data_type()
- Convert an object to an Arrow DataType
-
infer_type()
type()
- Infer the arrow Array type from an R object
-
field()
- Create a Field
-
schema()
- Create a schema or extract one from an object.
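A minimal sketch of building a schema from fields, or directly from named types:

    f  <- field("x", int32())
    sc <- schema(f, field("y", utf8()))
    # equivalently, with named arguments
    sc <- schema(x = int32(), y = utf8())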
-
unify_schemas()
- Combine and harmonize schemas
-
as_schema()
- Convert an object to an Arrow Schema
-
infer_schema()
- Extract a schema from an object
-
read_schema()
- Read a Schema from a stream
-
acero
arrow-functions
arrow-verbs
arrow-dplyr
- Functions available in Arrow dplyr queries
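A sketch of a typical query, assuming dplyr is attached; the pipeline is built lazily and only evaluated by collect():

    library(dplyr)
    arrow_table(mtcars) |>
      filter(cyl == 6) |>
      group_by(gear) |>
      summarise(avg_mpg = mean(mpg)) |>
      collect()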
-
call_function()
- Call an Arrow compute function
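For example, calling compute functions by name on an Array:

    a <- arrow_array(c(2, 30, NA, 4))
    call_function("sum", a)
    call_function("min_max", a)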
-
match_arrow()
is_in()
- Value matching for Arrow objects
-
value_counts()
- table() for Arrow objects
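A short sketch of each:

    haystack <- arrow_array(c("a", "b", "c"))
    needles  <- arrow_array(c("b", "d"))
    is_in(needles, haystack)
    match_arrow(needles, haystack)
    value_counts(arrow_array(c("a", "a", "b")))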
-
list_compute_functions()
- List available Arrow C++ compute functions
-
register_scalar_function()
- Register user-defined functions
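A minimal sketch; the function name and the unit conversion are illustrative. Once registered, the UDF should be callable like any other compute function (and, with auto_convert = TRUE, usable inside dplyr verbs on Arrow data):

    register_scalar_function(
      "to_fahrenheit",
      function(context, x) x * 9 / 5 + 32,
      in_type = float64(),
      out_type = float64(),
      auto_convert = TRUE
    )
    call_function("to_fahrenheit", arrow_array(c(0, 100)))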
-
show_exec_plan()
- Show the details of an Arrow Execution Plan
-
to_arrow()
- Create an Arrow object from a DuckDB connection
-
to_duckdb()
- Create a (virtual) DuckDB table from an Arrow object
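A sketch of a round trip, assuming the duckdb and dbplyr packages are installed; data is handed over via the Arrow C interface rather than copied through R:

    arrow_table(mtcars) |>
      to_duckdb() |>                  # now a DuckDB lazy table
      dplyr::filter(cyl == 6) |>
      to_arrow() |>                   # back to an Arrow RecordBatchReader
      dplyr::collect()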
-
s3_bucket()
- Connect to an AWS S3 bucket
-
gs_bucket()
- Connect to a Google Cloud Storage (GCS) bucket
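Bucket names, regions, and paths below are illustrative; both helpers return a SubTreeFileSystem that can be passed to the dataset and file-reading functions:

    bucket <- s3_bucket("my-example-bucket", region = "us-east-1")
    ds <- open_dataset(bucket$path("datasets/sales"))
    gcs <- gs_bucket("my-example-bucket", anonymous = TRUE)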
-
copy_files()
- Copy files between FileSystems
-
load_flight_server()
- Load a Python Flight server
-
flight_connect()
- Connect to a Flight server
-
flight_disconnect()
- Explicitly close a Flight client
-
flight_get()
- Get data from a Flight server
-
flight_put()
- Send data to a Flight server
-
list_flights()
flight_path_exists()
- See available resources on a Flight server
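A sketch of a round trip, assuming pyarrow is available via reticulate and a Flight server is listening on the (illustrative) host, port, and path:

    client <- flight_connect(host = "localhost", port = 8089)
    flight_put(client, mtcars, path = "uploaded/mtcars")
    list_flights(client)
    dat <- flight_get(client, "uploaded/mtcars")
    flight_disconnect(client)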
-
arrow_info()
arrow_available()
arrow_with_acero()
arrow_with_dataset()
arrow_with_substrait()
arrow_with_parquet()
arrow_with_s3()
arrow_with_gcs()
arrow_with_json()
- Report information on the package's capabilities
-
cpu_count()
set_cpu_count()
- Manage the global CPU thread pool in libarrow
-
io_thread_count()
set_io_thread_count()
- Manage the global I/O thread pool in libarrow
-
install_arrow()
- Install or upgrade the Arrow library
-
install_pyarrow()
- Install pyarrow for use with reticulate
-
create_package_with_all_dependencies()
- Create a source bundle that includes all thirdparty dependencies
-
InputStream
RandomAccessFile
MemoryMappedFile
ReadableFile
BufferReader
- InputStream classes
-
read_message()
- Read a Message from a stream
-
mmap_open()
- Open a memory mapped file
-
mmap_create()
- Create a new read/write memory mapped file of a given size
-
OutputStream
FileOutputStream
BufferOutputStream
- OutputStream classes
-
Message
- Message class
-
MessageReader
- MessageReader class
-
compression
CompressedOutputStream
CompressedInputStream
- Compressed stream classes
-
Codec
- Compression Codec class
-
codec_is_available()
- Check whether a compression codec is available
-
ParquetFileReader
- ParquetFileReader class
-
ParquetReaderProperties
- ParquetReaderProperties class
-
ParquetArrowReaderProperties
- ParquetArrowReaderProperties class
-
ParquetFileWriter
- ParquetFileWriter class
-
ParquetWriterProperties
- ParquetWriterProperties class
-
FeatherReader
- FeatherReader class
-
CsvTableReader
JsonTableReader
- Arrow CSV and JSON table reader classes
-
CsvReadOptions
CsvWriteOptions
CsvParseOptions
TimestampParser
CsvConvertOptions
JsonReadOptions
JsonParseOptions
- File reader options
-
RecordBatchReader
RecordBatchStreamReader
RecordBatchFileReader
- RecordBatchReader classes
-
RecordBatchWriter
RecordBatchStreamWriter
RecordBatchFileWriter
- RecordBatchWriter classes
-
as_record_batch_reader()
as_record_batch_reader.RecordBatchReader
as_record_batch_reader.Table
as_record_batch_reader.RecordBatch
as_record_batch_reader.data.frame
as_record_batch_reader.Dataset
as_record_batch_reader.function
as_record_batch_reader.arrow_dplyr_query
as_record_batch_reader.Scanner
- Convert an object to an Arrow RecordBatchReader
Low-level C++ wrappers
Low-level R6 class representations of Arrow C++ objects intended for advanced users.
-
Buffer
- Buffer class
-
Scalar
- Arrow scalars
-
Array
DictionaryArray
StructArray
ListArray
LargeListArray
FixedSizeListArray
MapArray
- Array Classes
-
ChunkedArray
- ChunkedArray class
-
RecordBatch
- RecordBatch class
-
Schema
- Schema class
-
Field
- Field class
-
Table
- Table class
-
DataType
- DataType class
-
ArrayData
- ArrayData class
-
DictionaryType
- DictionaryType class
-
FixedWidthType
- FixedWidthType class
-
ExtensionType
- ExtensionType class
-
ExtensionArray
- ExtensionArray class
Dataset and Filesystem R6 classes and helper functions
R6 classes and helper functions useful when working with multi-file datasets in Arrow.
-
Dataset
FileSystemDataset
UnionDataset
InMemoryDataset
DatasetFactory
FileSystemDatasetFactory
- Multi-file datasets
-
dataset_factory()
- Create a DatasetFactory
-
Partitioning
DirectoryPartitioning
HivePartitioning
DirectoryPartitioningFactory
HivePartitioningFactory
- Define Partitioning for a Dataset
-
Expression
- Arrow expressions
-
Scanner
ScannerBuilder
- Scan the contents of a dataset
-
FileFormat
ParquetFileFormat
IpcFileFormat
- Dataset file formats
-
CsvFileFormat
- CSV dataset file format
-
JsonFileFormat
- JSON dataset file format
-
FileWriteOptions
- Format-specific write options
-
FragmentScanOptions
CsvFragmentScanOptions
ParquetFragmentScanOptions
JsonFragmentScanOptions
- Format-specific scan options
-
hive_partition()
- Construct Hive partitioning
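For example, declaring the types of Hive-style partition keys when opening a dataset (the path and key names are illustrative):

    ds <- open_dataset(
      "data/logs",
      partitioning = hive_partition(year = int16(), month = int8())
    )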
-
map_batches()
- Apply a function to a stream of RecordBatches
-
FileSystem
LocalFileSystem
S3FileSystem
GcsFileSystem
SubTreeFileSystem
- FileSystem classes
-
FileInfo
- FileSystem entry info
-
FileSelector
- FileSelector class