Function reference • Arrow R Package

Multi-file datasets

open_dataset(): Open a multi-file dataset

open_delim_dataset() open_csv_dataset() open_tsv_dataset(): Open a multi-file dataset of CSV or other delimiter-separated format

write_dataset(): Write a dataset

dataset_factory(): Create a DatasetFactory

hive_partition(): Construct Hive partitioning

Dataset FileSystemDataset UnionDataset InMemoryDataset DatasetFactory FileSystemDatasetFactory: Multi-file datasets

Partitioning DirectoryPartitioning HivePartitioning DirectoryPartitioningFactory HivePartitioningFactory: Define Partitioning for a Dataset

Expression: Arrow expressions

Scanner ScannerBuilder: Scan the contents of a dataset

FileFormat ParquetFileFormat IpcFileFormat: Dataset file formats

CsvFileFormat: CSV dataset file format

FileWriteOptions: Format-specific write options

FragmentScanOptions CsvFragmentScanOptions ParquetFragmentScanOptions: Format-specific scan options

map_batches(): Apply a function to a stream of RecordBatches

Reading and writing files

read_feather() read_ipc_file(): Read a Feather file (an Arrow IPC file)

read_ipc_stream(): Read Arrow IPC stream format

read_parquet(): Read a Parquet file

read_delim_arrow() read_csv_arrow() read_tsv_arrow(): Read a CSV or other delimited file with Arrow

read_json_arrow(): Read a JSON file

write_feather() write_ipc_file(): Write a Feather file (an Arrow IPC file)

write_ipc_stream(): Write Arrow IPC stream format

write_to_raw(): Write Arrow data to a raw vector

write_parquet(): Write Parquet file to disk

write_csv_arrow(): Write CSV file to disk

C++ reader/writer interface

ParquetFileReader: ParquetFileReader class

ParquetArrowReaderProperties: ParquetArrowReaderProperties class

ParquetFileWriter: ParquetFileWriter class

ParquetWriterProperties: ParquetWriterProperties class

FeatherReader: FeatherReader class

CsvTableReader JsonTableReader: Arrow CSV and JSON table reader classes

RecordBatchReader RecordBatchStreamReader RecordBatchFileReader: RecordBatchReader classes

RecordBatchWriter RecordBatchStreamWriter RecordBatchFileWriter: RecordBatchWriter classes

CsvReadOptions CsvWriteOptions CsvParseOptions TimestampParser CsvConvertOptions JsonReadOptions JsonParseOptions: File reader options

as_record_batch_reader(): Convert an object to an Arrow RecordBatchReader

Arrow data containers

array Array DictionaryArray StructArray ListArray LargeListArray FixedSizeListArray MapArray StructScalar: Arrow Arrays

chunked_array(): ChunkedArray class

Scalar: Arrow scalars

record_batch(): RecordBatch class

arrow_table(): Table class

ArrayData: ArrayData class

buffer(): Buffer class

read_message(): Read a Message from a stream

concat_arrays() c(<Array>): Concatenate zero or more Arrays

concat_tables(): Concatenate one or more Tables

ExtensionArray: class arrow::ExtensionArray

vctrs_extension_array() vctrs_extension_type(): Extension type for generic typed vectors

as_arrow_array(): Convert an object to an Arrow Array

as_chunked_array(): Convert an object to an Arrow ChunkedArray

as_record_batch(): Convert an object to an Arrow RecordBatch

as_arrow_table(): Convert an object to an Arrow Table

Arrow data types and schema

schema(): Schema class

unify_schemas(): Combine and harmonize schemas

infer_type() type(): Infer the arrow Array type from an R object

dictionary(): Create a dictionary type

field(): Field class

read_schema(): Read a Schema from a stream

int8() int16() int32() int64() uint8() uint16() uint32() uint64() float16() halffloat() float32() float() float64() boolean() bool() utf8() large_utf8() binary() large_binary() fixed_size_binary() string() date32() date64() time32() time64() duration() null() timestamp() decimal() decimal128() decimal256() struct() list_of() large_list_of() fixed_size_list_of() map_of(): Apache Arrow data types

DataType: class arrow::DataType

DictionaryType: class DictionaryType

FixedWidthType: class arrow::FixedWidthType

new_extension_type() new_extension_array() register_extension_type() reregister_extension_type() unregister_extension_type(): Extension types

vctrs_extension_array() vctrs_extension_type(): Extension type for generic typed vectors

ExtensionType: class arrow::ExtensionType

as_data_type(): Convert an object to an Arrow DataType

as_schema(): Convert an object to an Arrow Schema

Flight

load_flight_server(): Load a Python Flight server

flight_connect(): Connect to a Flight server

flight_disconnect(): Explicitly close a Flight client

flight_get(): Get data from a Flight server

flight_put(): Send data to a Flight server

list_flights() flight_path_exists(): See available resources on a Flight server

File systems

s3_bucket(): Connect to an AWS S3 bucket

gs_bucket(): Connect to a Google Cloud Storage (GCS) bucket

FileSystem LocalFileSystem S3FileSystem GcsFileSystem SubTreeFileSystem: FileSystem classes

FileInfo: FileSystem entry info

FileSelector: File selector

copy_files(): Copy files between FileSystems

Input/Output

InputStream RandomAccessFile MemoryMappedFile ReadableFile BufferReader: InputStream classes

mmap_open(): Open a memory mapped file

mmap_create(): Create a new read/write memory mapped file of a given size

OutputStream FileOutputStream BufferOutputStream: OutputStream classes

Message: class arrow::Message

MessageReader: class arrow::MessageReader

compression CompressedOutputStream CompressedInputStream: Compressed stream classes

Codec: Compression Codec class

codec_is_available(): Check whether a compression codec is available

Computation

acero: Functions available in Arrow dplyr queries

call_function(): Call an Arrow compute function

match_arrow() is_in(): Value matching (match() and %in%) for Arrow objects

value_counts(): Tabulate value counts (like base R table()) for Arrow objects

list_compute_functions(): List available Arrow C++ compute functions

register_scalar_function(): Register user-defined functions

show_exec_plan(): Show the details of an Arrow Execution Plan

Connections to other systems

to_arrow(): Create an Arrow object from others

to_duckdb(): Create a (virtual) DuckDB table from an Arrow object

Configuration

arrow_info() arrow_available() arrow_with_dataset() arrow_with_substrait() arrow_with_parquet() arrow_with_s3() arrow_with_gcs() arrow_with_json(): Report information on the package's capabilities

cpu_count() set_cpu_count(): Manage the global CPU thread pool in libarrow

io_thread_count() set_io_thread_count(): Manage the global I/O thread pool in libarrow

install_arrow(): Install or upgrade the Arrow library

install_pyarrow(): Install pyarrow for use with reticulate

create_package_with_all_dependencies(): Create a source bundle that includes all third-party dependencies