Multi-file datasets

open_dataset()

Open a multi-file dataset

write_dataset()

Write a dataset

dataset_factory()

Create a DatasetFactory

hive_partition()

Construct Hive partitioning

Dataset FileSystemDataset UnionDataset InMemoryDataset DatasetFactory FileSystemDatasetFactory

Multi-file datasets

Partitioning DirectoryPartitioning HivePartitioning DirectoryPartitioningFactory HivePartitioningFactory

Define Partitioning for a Dataset

Expression

Arrow expressions

Scanner ScannerBuilder

Scan the contents of a dataset

FileFormat ParquetFileFormat IpcFileFormat CsvFileFormat

Dataset file formats

FileWriteOptions

Format-specific write options

FragmentScanOptions CsvFragmentScanOptions ParquetFragmentScanOptions

Format-specific scan options

map_batches()

Apply a function to a stream of RecordBatches

Reading and writing files

read_feather() read_ipc_file()

Read a Feather file (an Arrow IPC file)

read_ipc_stream()

Read Arrow IPC stream format

read_parquet()

Read a Parquet file

read_delim_arrow() read_csv_arrow() read_tsv_arrow()

Read a CSV or other delimited file with Arrow

read_json_arrow()

Read a JSON file

write_feather() write_ipc_file()

Write a Feather file (an Arrow IPC file)

write_ipc_stream()

Write Arrow IPC stream format

write_to_raw()

Write Arrow data to a raw vector

write_parquet()

Write a Parquet file to disk

write_csv_arrow()

Write a CSV file to disk

C++ reader/writer interface

ParquetFileReader

ParquetFileReader class

ParquetArrowReaderProperties

ParquetArrowReaderProperties class

ParquetFileWriter

ParquetFileWriter class

ParquetWriterProperties

ParquetWriterProperties class

FeatherReader

FeatherReader class

CsvTableReader JsonTableReader

Arrow CSV and JSON table reader classes

RecordBatchReader RecordBatchStreamReader RecordBatchFileReader

RecordBatchReader classes

RecordBatchWriter RecordBatchStreamWriter RecordBatchFileWriter

RecordBatchWriter classes

CsvReadOptions CsvWriteOptions CsvParseOptions TimestampParser CsvConvertOptions JsonReadOptions JsonParseOptions

File reader options

as_record_batch_reader as_record_batch_reader.RecordBatchReader as_record_batch_reader.Table as_record_batch_reader.RecordBatch as_record_batch_reader.data.frame as_record_batch_reader.Dataset as_record_batch_reader.function as_record_batch_reader.arrow_dplyr_query as_record_batch_reader.Scanner

Convert an object to an Arrow RecordBatchReader

Arrow data containers

array Array DictionaryArray StructArray ListArray LargeListArray FixedSizeListArray MapArray StructScalar

Arrow Arrays

chunked_array()

ChunkedArray class

Scalar

Arrow scalars

record_batch()

RecordBatch class

arrow_table()

Table class

ArrayData

ArrayData class

buffer()

Buffer class

read_message()

Read a Message from a stream

concat_arrays() c(<Array>)

Concatenate zero or more Arrays

concat_tables()

Concatenate one or more Tables

ExtensionArray

class arrow::ExtensionArray

vctrs_extension_array() vctrs_extension_type()

Extension type for generic typed vectors

as_arrow_array()

Convert an object to an Arrow Array

as_chunked_array()

Convert an object to an Arrow ChunkedArray

as_record_batch()

Convert an object to an Arrow RecordBatch

as_arrow_table()

Convert an object to an Arrow Table

Arrow data types and schema

schema()

Schema class

unify_schemas()

Combine and harmonize schemas

infer_type() type()

Infer the Arrow Array type from an R object

dictionary()

Create a dictionary type

field()

Field class

read_schema()

Read a Schema from a stream

int8() int16() int32() int64() uint8() uint16() uint32() uint64() float16() halffloat() float32() float() float64() boolean() bool() utf8() large_utf8() binary() large_binary() fixed_size_binary() string() date32() date64() time32() time64() duration() null() timestamp() decimal() decimal128() decimal256() struct() list_of() large_list_of() fixed_size_list_of() map_of()

Apache Arrow data types

DataType

class arrow::DataType

DictionaryType

class arrow::DictionaryType

FixedWidthType

class arrow::FixedWidthType

new_extension_type() new_extension_array() register_extension_type() reregister_extension_type() unregister_extension_type()

Extension types

vctrs_extension_array() vctrs_extension_type()

Extension type for generic typed vectors

ExtensionType

class arrow::ExtensionType

as_data_type()

Convert an object to an Arrow DataType

as_schema()

Convert an object to an Arrow Schema

Flight

load_flight_server()

Load a Python Flight server

flight_connect()

Connect to a Flight server

flight_disconnect()

Explicitly close a Flight client

flight_get()

Get data from a Flight server

flight_put()

Send data to a Flight server

list_flights() flight_path_exists()

See available resources on a Flight server

File systems

s3_bucket()

Connect to an AWS S3 bucket

gs_bucket()

Connect to a Google Cloud Storage (GCS) bucket

FileSystem LocalFileSystem S3FileSystem GcsFileSystem SubTreeFileSystem

FileSystem classes

FileInfo

FileSystem entry info

FileSelector

FileSelector class

copy_files()

Copy files between FileSystems

Input/Output

InputStream RandomAccessFile MemoryMappedFile ReadableFile BufferReader

InputStream classes

mmap_open()

Open a memory mapped file

mmap_create()

Create a new read/write memory mapped file of a given size

OutputStream FileOutputStream BufferOutputStream

OutputStream classes

Message

class arrow::Message

MessageReader

class arrow::MessageReader

compression CompressedOutputStream CompressedInputStream

Compressed stream classes

Codec

Compression Codec class

codec_is_available()

Check whether a compression codec is available

Computation

call_function()

Call an Arrow compute function

match_arrow() is_in()

match() and %in% for Arrow objects

value_counts()

Tabulate value counts for Arrow objects (analogous to table())

list_compute_functions()

List available Arrow C++ compute functions

register_scalar_function()

Register user-defined functions

show_exec_plan()

Show the details of an Arrow Execution Plan

Connections to other systems

to_arrow()

Create an Arrow object from others

to_duckdb()

Create a (virtual) DuckDB table from an Arrow object

Configuration

arrow_info() arrow_available() arrow_with_dataset() arrow_with_substrait() arrow_with_parquet() arrow_with_s3() arrow_with_gcs() arrow_with_json()

Report information on the package's capabilities

cpu_count() set_cpu_count()

Manage the global CPU thread pool in libarrow

io_thread_count() set_io_thread_count()

Manage the global I/O thread pool in libarrow

install_arrow()

Install or upgrade the Arrow library

install_pyarrow()

Install pyarrow for use with reticulate

create_package_with_all_dependencies()

Create a source bundle that includes all third-party dependencies