API Reference

Type and Schema Factory Functions

null()
bool_()
int8()
int16()
int32()
int64()
uint8()
uint16()
uint32()
uint64()
float16()
float32()
float64()
time32(unit_str)
time64(unit_str)
timestamp(unit_str[, tz])
date32()
date64()
binary(int length=-1) Binary (PyBytes-like) type; a non-negative length gives a fixed-size binary type
string() UTF8 string
decimal(int precision, int scale=0) Decimal type with fixed precision and scale
list_(DataType value_type)
struct(fields)
dictionary(DataType index_type, Array dictionary) Dictionary (categorical, or simply encoded) type
field(name, DataType type, ...) Create a pyarrow.Field instance
schema(fields) Construct pyarrow.Schema from collection of fields
from_numpy_dtype(dtype) Convert NumPy dtype to pyarrow.DataType

Tables and Record Batches

ChunkedArray Array backed via one or more memory chunks.
Column Named vector of elements of equal type.
RecordBatch Batch of rows of columns of equal length.
Table A collection of top-level named, equal length Arrow arrays.
get_record_batch_size(RecordBatch batch) Return total size of serialized RecordBatch including metadata and padding

Tensor Type and Functions

Tensor
write_tensor(Tensor tensor, NativeFile dest) Write pyarrow.Tensor to pyarrow.NativeFile object at its current position
get_tensor_size(Tensor tensor) Return total size of serialized Tensor including metadata and padding
read_tensor(NativeFile source) Read pyarrow.Tensor from pyarrow.NativeFile object from current position.

Input / Output and Shared Memory

Buffer
BufferReader Zero-copy reader from objects convertible to Arrow buffer
BufferOutputStream
NativeFile
MemoryMappedFile Supports ‘r’, ‘r+w’, ‘w’ modes
memory_map(path[, mode]) Open memory map at file path.
create_memory_map(path, size) Create memory map at indicated path of the given size, return an opened, writable file object
PythonFile

Interprocess Communication and Messaging

RecordBatchFileReader(source[, footer_offset]) Class for reading Arrow record batch data from the Arrow binary file format
RecordBatchFileWriter(sink, schema) Writer to create the Arrow binary file format
RecordBatchStreamReader(source) Reader for the Arrow streaming binary format
RecordBatchStreamWriter(sink, schema) Writer for the Arrow streaming binary format
open_file(source[, footer_offset]) Create reader for Arrow file format
open_stream(source) Create reader for Arrow streaming format

Memory Pools

MemoryPool
default_memory_pool()
jemalloc_memory_pool() Returns a jemalloc-based memory allocator, which can be passed to set_memory_pool
total_allocated_bytes()
set_memory_pool(MemoryPool pool)

Type Classes

DataType
DecimalType
DictionaryType
FixedSizeBinaryType
Time32Type
Time64Type
TimestampType
Field Represents a named field, with a data type, nullability, and optional metadata
Schema

Apache Parquet

ParquetDataset(path_or_paths[, filesystem, ...]) Encapsulates details of reading a complete Parquet dataset, possibly consisting of multiple files and partition subdirectories
ParquetFile(source[, metadata]) Reader interface for a single Parquet file
read_table(source[, columns, nthreads, metadata]) Read a Table from Parquet format
write_metadata(schema, where[, version]) Write metadata-only Parquet file from schema
write_table(table, where[, row_group_size, ...]) Write a Table to Parquet format