Multi-file datasets

open_dataset()
Open a multi-file dataset
open_delim_dataset() open_csv_dataset() open_tsv_dataset()
Open a multi-file dataset of CSV or other delimiter-separated format
write_dataset()
Write a dataset
dataset_factory()
Create a DatasetFactory
hive_partition()
Construct Hive partitioning
Dataset FileSystemDataset UnionDataset InMemoryDataset DatasetFactory FileSystemDatasetFactory
Multi-file datasets
Partitioning DirectoryPartitioning HivePartitioning DirectoryPartitioningFactory HivePartitioningFactory
Define Partitioning for a Dataset
Expression
Arrow expressions
Scanner ScannerBuilder
Scan the contents of a dataset
FileFormat ParquetFileFormat IpcFileFormat
Dataset file formats
CsvFileFormat
CSV dataset file format
FileWriteOptions
Format-specific write options
FragmentScanOptions CsvFragmentScanOptions ParquetFragmentScanOptions
Format-specific scan options
map_batches()
Apply a function to a stream of RecordBatches

Reading and writing files

read_feather() read_ipc_file()
Read a Feather file (an Arrow IPC file)
read_ipc_stream()
Read Arrow IPC stream format
read_parquet()
Read a Parquet file
read_delim_arrow() read_csv_arrow() read_tsv_arrow()
Read a CSV or other delimited file with Arrow
read_json_arrow()
Read a JSON file
write_feather() write_ipc_file()
Write a Feather file (an Arrow IPC file)
write_ipc_stream()
Write Arrow IPC stream format
write_to_raw()
Write Arrow data to a raw vector
write_parquet()
Write a Parquet file to disk
write_csv_arrow()
Write a CSV file to disk

C++ reader/writer interface

Arrow data containers

array Array DictionaryArray StructArray ListArray LargeListArray FixedSizeListArray MapArray StructScalar
Arrow Arrays
chunked_array()
ChunkedArray class
Scalar
Arrow scalars
record_batch()
RecordBatch class
arrow_table()
Table class
ArrayData
ArrayData class
buffer()
Buffer class
read_message()
Read a Message from a stream
concat_arrays() c(<Array>)
Concatenate zero or more Arrays
concat_tables()
Concatenate one or more Tables
ExtensionArray
class arrow::ExtensionArray
vctrs_extension_array() vctrs_extension_type()
Extension type for generic typed vectors
as_arrow_array()
Convert an object to an Arrow Array
as_chunked_array()
Convert an object to an Arrow ChunkedArray
as_record_batch()
Convert an object to an Arrow RecordBatch
as_arrow_table()
Convert an object to an Arrow Table

Arrow data types and schema

schema()
Schema class
unify_schemas()
Combine and harmonize schemas
infer_type() type()
Infer the arrow Array type from an R object
dictionary()
Create a dictionary type
field()
Field class
read_schema()
Read a Schema from a stream
int8() int16() int32() int64() uint8() uint16() uint32() uint64() float16() halffloat() float32() float() float64() boolean() bool() utf8() large_utf8() binary() large_binary() fixed_size_binary() string() date32() date64() time32() time64() duration() null() timestamp() decimal() decimal128() decimal256() struct() list_of() large_list_of() fixed_size_list_of() map_of()
Apache Arrow data types
DataType
class arrow::DataType
DictionaryType
class DictionaryType
FixedWidthType
class arrow::FixedWidthType
new_extension_type() new_extension_array() register_extension_type() reregister_extension_type() unregister_extension_type()
Extension types
vctrs_extension_array() vctrs_extension_type()
Extension type for generic typed vectors
ExtensionType
class arrow::ExtensionType
as_data_type()
Convert an object to an Arrow DataType
as_schema()
Convert an object to an Arrow Schema

Flight

load_flight_server()
Load a Python Flight server
flight_connect()
Connect to a Flight server
flight_disconnect()
Explicitly close a Flight client
flight_get()
Get data from a Flight server
flight_put()
Send data to a Flight server
list_flights() flight_path_exists()
See available resources on a Flight server

File systems

s3_bucket()
Connect to an AWS S3 bucket
gs_bucket()
Connect to a Google Cloud Storage (GCS) bucket
FileSystem LocalFileSystem S3FileSystem GcsFileSystem SubTreeFileSystem
FileSystem classes
FileInfo
FileSystem entry info
FileSelector
File selector
copy_files()
Copy files between FileSystems

Input/Output

InputStream RandomAccessFile MemoryMappedFile ReadableFile BufferReader
InputStream classes
mmap_open()
Open a memory mapped file
mmap_create()
Create a new read/write memory mapped file of a given size
OutputStream FileOutputStream BufferOutputStream
OutputStream classes
Message
class arrow::Message
MessageReader
class arrow::MessageReader
compression CompressedOutputStream CompressedInputStream
Compressed stream classes
Codec
Compression Codec class
codec_is_available()
Check whether a compression codec is available

Computation

acero
Functions available in Arrow dplyr queries
call_function()
Call an Arrow compute function
match_arrow() is_in()
match and %in% for Arrow objects
value_counts()
table for Arrow objects
list_compute_functions()
List available Arrow C++ compute functions
register_scalar_function()
Register user-defined functions
show_exec_plan()
Show the details of an Arrow Execution Plan

Connections to other systems

to_arrow()
Create an Arrow object from others
to_duckdb()
Create a (virtual) DuckDB table from an Arrow object

Configuration

arrow_info() arrow_available() arrow_with_dataset() arrow_with_substrait() arrow_with_parquet() arrow_with_s3() arrow_with_gcs() arrow_with_json()
Report information on the package's capabilities
cpu_count() set_cpu_count()
Manage the global CPU thread pool in libarrow
io_thread_count() set_io_thread_count()
Manage the global I/O thread pool in libarrow
install_arrow()
Install or upgrade the Arrow library
install_pyarrow()
Install pyarrow for use with reticulate
create_package_with_all_dependencies()
Create a source bundle that includes all third-party dependencies