API Reference

Type and Schema Factory Functions

null() Create instance of null type
bool_() Create instance of boolean type
int8() Create instance of signed int8 type
int16() Create instance of signed int16 type
int32() Create instance of signed int32 type
int64() Create instance of signed int64 type
uint8() Create instance of unsigned uint8 type
uint16() Create instance of unsigned uint16 type
uint32() Create instance of unsigned uint32 type
uint64() Create instance of unsigned uint64 type
float16() Create half-precision floating point type
float32() Create single-precision floating point type
float64() Create double-precision floating point type
time32(unit) Create instance of 32-bit time (time of day) type with unit resolution
time64(unit) Create instance of 64-bit time (time of day) type with unit resolution
timestamp(unit[, tz]) Create instance of timestamp type with resolution and optional time zone
date32() Create instance of 32-bit date (days since UNIX epoch 1970-01-01)
date64() Create instance of 64-bit date (milliseconds since UNIX epoch 1970-01-01)
binary(int length=-1) Create variable-length binary type (fixed-size binary type when length >= 0)
string() Create UTF8 variable-length string type
decimal(int precision, int scale=0) Create decimal type with precision and scale
list_(value_type) Create ListType instance from child data type or field
struct(fields) Create StructType instance from fields
dictionary(DataType index_type, …) Dictionary (categorical, or simply encoded) type
field(name, type, bool nullable=True, …) Create a pyarrow.Field instance
schema(fields, dict metadata=None) Construct pyarrow.Schema from collection of fields
from_numpy_dtype(dtype) Convert NumPy dtype to pyarrow.DataType

Type checking functions

is_boolean(t) Return True if value is an instance of a boolean type
is_integer(t) Return True if value is an instance of an integer type
is_signed_integer(t) Return True if value is an instance of a signed integer type
is_unsigned_integer(t) Return True if value is an instance of an unsigned integer type
is_floating(t) Return True if value is an instance of a floating point numeric type
is_decimal(t) Return True if value is an instance of a decimal type
is_list(t) Return True if value is an instance of a list type
is_struct(t) Return True if value is an instance of a struct type
is_union(t) Return True if value is an instance of a union type
is_nested(t) Return True if value is an instance of a nested type
is_temporal(t) Return True if value is an instance of a temporal (date, time, timestamp) type
is_timestamp(t) Return True if value is an instance of a timestamp type
is_date(t) Return True if value is an instance of a date type
is_time(t) Return True if value is an instance of a time type
is_null(t) Return True if value is an instance of a null type
is_binary(t) Return True if value is an instance of a variable-length binary type
is_unicode(t) Alias for is_string
is_string(t) Return True if value is an instance of string (utf8 unicode) type
is_fixed_size_binary(t) Return True if value is an instance of a fixed size binary type
is_map(t) Return True if value is an instance of a map logical type
is_dictionary(t) Return True if value is an instance of a dictionary-encoded type

Tables and Record Batches

column(field_or_name, arr) Create Column object from field/string and array-like data
chunked_array(arrays[, type]) Construct chunked array from list of array-like objects
ChunkedArray Array backed by one or more memory chunks.
Column Named vector of elements of equal type.
RecordBatch Batch of rows of columns of equal length
Table A collection of top-level named, equal length Arrow arrays.

Tensor Type and Functions

Tensor

Input / Output and Shared Memory

allocate_buffer(int64_t size, …) Allocate mutable fixed-size buffer
Buffer
BufferReader Zero-copy reader from objects convertible to Arrow buffer
BufferOutputStream
NativeFile
MemoryMappedFile Supports ‘r’, ‘r+w’, ‘w’ modes
memory_map(path[, mode]) Open memory map at file path.
create_memory_map(path, size) Create memory map at indicated path of the given size, return open writable file object
PythonFile

File Systems

hdfs.connect([host, port, user, …]) Connect to an HDFS cluster.
LocalFileSystem
HadoopFileSystem

Serialization and IPC

Message Container for an Arrow IPC message with metadata and optional body
MessageReader Interface for reading Message objects from some source (like an InputStream)
RecordBatchFileReader(source[, footer_offset]) Class for reading Arrow record batch data from the Arrow binary file format
RecordBatchFileWriter(sink, schema) Writer to create the Arrow binary file format
RecordBatchStreamReader(source) Reader for the Arrow streaming binary format
RecordBatchStreamWriter(sink, schema) Writer for the Arrow streaming binary format
open_file(source[, footer_offset]) Create reader for Arrow file format
open_stream(source) Create reader for Arrow streaming format
read_message(source) Read length-prefixed message from file or buffer-like object
read_record_batch(obj, Schema schema) Read RecordBatch from message, given a known schema
get_record_batch_size(RecordBatch batch) Return total size of serialized RecordBatch including metadata and padding
read_tensor(NativeFile source) Read pyarrow.Tensor from pyarrow.NativeFile object from current position.
write_tensor(Tensor tensor, NativeFile dest) Write pyarrow.Tensor to pyarrow.NativeFile object at its current position
get_tensor_size(Tensor tensor) Return total size of serialized Tensor including metadata and padding
serialize(value, …) EXPERIMENTAL: Serialize a Python sequence
serialize_to(value, sink, …) EXPERIMENTAL: Serialize a Python sequence to a file.
deserialize(obj, …) EXPERIMENTAL: Deserialize Python object from Buffer or other Python object
deserialize_from(source, base, …) EXPERIMENTAL: Deserialize a Python sequence from a file.
read_serialized(source[, base]) EXPERIMENTAL: Read serialized Python sequence from file-like object
SerializedPyObject Arrow-serialized representation of Python object
SerializationContext()

Feather Format

read_feather(source[, columns, nthreads]) Read a pandas.DataFrame from Feather format
write_feather(df, dest) Write a pandas.DataFrame to Feather format

Memory Pools

MemoryPool
default_memory_pool()
total_allocated_bytes()
set_memory_pool(MemoryPool pool)
log_memory_allocations([enable]) Enable or disable memory allocator logging for debugging purposes

Type Classes

DataType Base type for Apache Arrow data type instances.
Field Represents a named field, with a data type, nullability, and optional metadata
Schema

In-Memory Object Store

ObjectID An ObjectID represents a string of bytes used to identify Plasma objects.
PlasmaClient The PlasmaClient is used to interface with a plasma store and manager.
PlasmaBuffer This is the type returned by calls to get with a PlasmaClient.

Apache Parquet

ParquetDataset(path_or_paths[, filesystem, …]) Encapsulates details of reading a complete Parquet dataset possibly consisting of multiple files and partitions in subdirectories
ParquetFile(source[, metadata, common_metadata]) Reader interface for a single Parquet file
ParquetWriter(where, schema[, flavor, …]) Class for incrementally building a Parquet file for Arrow tables
read_table(source[, columns, nthreads, …]) Read a Table from Parquet format
read_metadata(where) Read FileMetadata from footer of a single Parquet file
read_pandas(source[, columns, nthreads, …]) Read a Table from Parquet format, also reading DataFrame index values if known in the file metadata
read_schema(where) Read effective Arrow schema from Parquet file metadata
write_metadata(schema, where[, version, …]) Write metadata-only Parquet file from schema
write_table(table, where[, row_group_size, …]) Write a Table to Parquet format