API Reference

Type and Schema Factory Functions

null() Create instance of null type
bool_() Create instance of boolean type
int8() Create instance of signed int8 type
int16() Create instance of signed int16 type
int32() Create instance of signed int32 type
int64() Create instance of signed int64 type
uint8() Create instance of unsigned uint8 type
uint16() Create instance of unsigned uint16 type
uint32() Create instance of unsigned uint32 type
uint64() Create instance of unsigned uint64 type
float16() Create half-precision floating point type
float32() Create single-precision floating point type
float64() Create double-precision floating point type
time32(unit) Create instance of 32-bit time (time of day) type with unit resolution
time64(unit) Create instance of 64-bit time (time of day) type with unit resolution
timestamp(unit[, tz]) Create instance of timestamp type with resolution and optional time zone
date32() Create instance of 32-bit date (days since UNIX epoch 1970-01-01)
date64() Create instance of 64-bit date (milliseconds since UNIX epoch 1970-01-01)
binary(int length=-1) Create variable-length binary type (or fixed-size binary type when length >= 0)
string() Create UTF8 variable-length string type
decimal(int precision, int scale=0) Create decimal type with precision and scale
list_(value_type) Create ListType instance from child data type or field
struct(fields) Create StructType instance from fields
dictionary(DataType index_type, ...) Dictionary (categorical, or simply encoded) type
field(name, DataType type, ...) Create a pyarrow.Field instance
schema(fields) Construct pyarrow.Schema from collection of fields
from_numpy_dtype(dtype) Convert NumPy dtype to pyarrow.DataType
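
For example, these factories compose into fields and a schema. A minimal sketch (column names are illustrative)::

    import pyarrow as pa

    # Build Field objects from factory-created types, then a Schema
    fields = [
        pa.field('id', pa.int64(), nullable=False),
        pa.field('name', pa.string()),
        pa.field('when', pa.timestamp('ms', tz='UTC')),
    ]
    sch = pa.schema(fields)
    print(sch)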

Tables and Record Batches

ChunkedArray Array backed via one or more memory chunks.
Column Named vector of elements of equal type.
RecordBatch Batch of rows of columns of equal length
Table A collection of top-level named, equal length Arrow arrays.
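
A short sketch of building a RecordBatch from equal-length arrays and assembling batches into a Table (names are illustrative)::

    import pyarrow as pa

    batch = pa.RecordBatch.from_arrays(
        [pa.array([1, 2, 3]), pa.array(['a', 'b', 'c'])],
        ['ints', 'strs'])
    table = pa.Table.from_batches([batch])
    print(table.num_rows, table.num_columns)  # 3 2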

Tensor Type and Functions

Tensor An n-dimensional array of values with a single data type, backed by contiguous memory
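
Tensors are typically created from NumPy arrays without copying; a minimal sketch::

    import numpy as np
    import pyarrow as pa

    arr = np.arange(12, dtype=np.float64).reshape(3, 4)
    tensor = pa.Tensor.from_numpy(arr)  # zero-copy wrapper
    print(tensor.shape)                 # (3, 4)
    back = tensor.to_numpy()            # zero-copy view of the same memory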

Input / Output and Shared Memory

Buffer Wrapper around a contiguous block of memory managed by Arrow
BufferReader Zero-copy reader from objects convertible to Arrow buffer
BufferOutputStream Output stream that writes to a resizable in-memory buffer
NativeFile Base class for Arrow file-like objects
MemoryMappedFile Memory-mapped file; supports ‘r’, ‘r+w’, ‘w’ modes
memory_map(path[, mode]) Open memory map at file path.
create_memory_map(path, size) Create memory map at indicated path of the given size, return open writable file object
PythonFile NativeFile wrapper around a Python file object
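
A sketch of round-tripping bytes through an in-memory buffer; getvalue() is assumed here and its spelling may differ across versions::

    import pyarrow as pa

    sink = pa.BufferOutputStream()
    sink.write(b'hello arrow')
    buf = sink.getvalue()          # pyarrow.Buffer

    reader = pa.BufferReader(buf)  # zero-copy reads from the buffer
    print(reader.read(5))          # b'hello'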

Interprocess Communication and Messaging

Message Container for an Arrow IPC message with metadata and optional body
MessageReader Interface for reading Message objects from some source (like an InputStream)
RecordBatchFileReader(source[, footer_offset]) Class for reading Arrow record batch data from the Arrow binary file format
RecordBatchFileWriter(sink, schema) Writer to create the Arrow binary file format
RecordBatchStreamReader(source) Reader for the Arrow streaming binary format
RecordBatchStreamWriter(sink, schema) Writer for the Arrow streaming binary format
open_file(source[, footer_offset]) Create reader for Arrow file format
open_stream(source) Create reader for Arrow streaming format
read_message(source) Read length-prefixed message from file or buffer-like object
read_record_batch(Message batch_message, ...) Read RecordBatch from message, given a known schema
get_record_batch_size(RecordBatch batch) Return total size of serialized RecordBatch including metadata and padding
read_tensor(NativeFile source) Read pyarrow.Tensor from pyarrow.NativeFile object, starting at its current position
write_tensor(Tensor tensor, NativeFile dest) Write pyarrow.Tensor to pyarrow.NativeFile object at its current position
get_tensor_size(Tensor tensor) Return total size of serialized Tensor including metadata and padding
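
A sketch of a streaming-format round trip in memory; the file format works the same way with RecordBatchFileWriter and open_file::

    import pyarrow as pa

    batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], ['x'])

    sink = pa.BufferOutputStream()
    writer = pa.RecordBatchStreamWriter(sink, batch.schema)
    writer.write_batch(batch)
    writer.close()

    reader = pa.open_stream(pa.BufferReader(sink.getvalue()))
    table = reader.read_all()  # reassemble all batches into a Table
    print(table.num_rows)      # 3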

Type Classes

DataType Base type for Apache Arrow data type instances.
Field Represents a named field, with a data type, nullability, and optional metadata
Schema Sequence of pyarrow.Field objects describing the columns of a table or record batch
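
These classes are what the factory functions above return; a sketch of inspecting them (field lookup spelling may vary by version)::

    import pyarrow as pa

    sch = pa.schema([pa.field('ts', pa.timestamp('ms', tz='UTC'))])
    f = sch.field_by_name('ts')             # pyarrow.Field
    print(f.name, f.type, f.nullable)
    print(isinstance(f.type, pa.DataType))  # True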

Apache Parquet

ParquetDataset(path_or_paths[, filesystem, ...]) Encapsulates details of reading a complete Parquet dataset possibly consisting of multiple files and partitions in subdirectories
ParquetFile(source[, metadata]) Reader interface for a single Parquet file
read_table(source[, columns, nthreads, metadata]) Read a Table from Parquet format
write_metadata(schema, where[, version]) Write metadata-only Parquet file from schema
write_table(table, where[, row_group_size, ...]) Write a Table to Parquet format
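
A minimal write/read round trip; the file path is illustrative::

    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_arrays([pa.array([1, 2, 3])], ['x'])
    pq.write_table(table, 'example.parquet')  # path is illustrative
    restored = pq.read_table('example.parquet', columns=['x'])
    print(restored.num_rows)  # 3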