Multi-file datasets

open_dataset()

Open a multi-file dataset

write_dataset()

Write a dataset

dataset_factory()

Create a DatasetFactory

hive_partition()

Construct Hive partitioning

Dataset

Multi-file datasets

Partitioning

Define Partitioning for a Dataset

Expression

Arrow expressions

Scanner

Scan the contents of a dataset

FileFormat

Dataset file formats

FileWriteOptions

Format-specific write options

map_batches()

Apply a function to a stream of RecordBatches

Reading and writing files

read_feather()

Read a Feather file

read_arrow() read_ipc_stream()

Read Arrow IPC stream format

read_parquet()

Read a Parquet file

read_delim_arrow() read_csv_arrow() read_tsv_arrow()

Read a CSV or other delimited file with Arrow

read_json_arrow()

Read a JSON file

write_feather()

Write data in the Feather format

write_arrow() write_ipc_stream()

Write Arrow IPC stream format

write_to_raw()

Write Arrow data to a raw vector

write_parquet()

Write Parquet file to disk

C++ reader/writer interface

ParquetFileReader

ParquetFileReader class

ParquetReaderProperties

ParquetReaderProperties class

ParquetFileWriter

ParquetFileWriter class

ParquetWriterProperties

ParquetWriterProperties class

FeatherReader

FeatherReader class

CsvTableReader

Arrow CSV and JSON table reader classes

RecordBatchReader

RecordBatchReader classes

RecordBatchWriter

RecordBatchWriter classes

CsvReadOptions

File reader options

Arrow data containers

buffer()

Buffer class

array

Arrow Arrays

ArrayData

ArrayData class

chunked_array()

ChunkedArray class

record_batch()

RecordBatch class

Table

Table class

Scalar

Arrow scalars

read_message()

Read a Message from a stream

Arrow data types and schema

schema()

Schema class

unify_schemas()

Combine and harmonize schemas

type()

Infer the Arrow Array type from an R vector

dictionary()

Create a dictionary type

field()

Field class

read_schema()

Read a Schema from a stream

int8() int16() int32() int64() uint8() uint16() uint32() uint64() float16() halffloat() float32() float() float64() boolean() bool() utf8() large_utf8() binary() large_binary() fixed_size_binary() string() date32() date64() time32() time64() null() timestamp() decimal() list_of() large_list_of() fixed_size_list_of() struct()

Apache Arrow data types

DataType

class arrow::DataType

DictionaryType

class DictionaryType

FixedWidthType

class arrow::FixedWidthType

cast_options()

Cast options

Flight

load_flight_server()

Load a Python Flight server

flight_connect()

Connect to a Flight server

push_data()

Send data to a Flight server

flight_get()

Get data from a Flight server

File systems

s3_bucket()

Connect to an AWS S3 bucket

FileSystem

FileSystem classes

FileInfo

FileSystem entry info

FileSelector

File selector

copy_files()

Copy files between FileSystems

Input/Output

InputStream

InputStream classes

mmap_open()

Open a memory mapped file

mmap_create()

Create a new read/write memory mapped file of a given size

OutputStream

OutputStream classes

Message

class arrow::Message

MessageReader

class arrow::MessageReader

compression

Compressed stream classes

Codec

Compression Codec class

codec_is_available()

Check whether a compression codec is available

Configuration

cpu_count() set_cpu_count()

Manage the global CPU thread pool in libarrow

Installation helpers

arrow_available() arrow_with_s3()

Is the C++ Arrow library available?

install_arrow()

Install or upgrade the Arrow library

install_pyarrow()

Install pyarrow for use with reticulate