Apache Arrow

Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data quickly. It specifies a standardized, language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware.
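To make "columnar memory format" concrete, here is a minimal pure-Python sketch. It is not the pyarrow API. It only imitates two buffer layouts that the Arrow Columnar Format describes: a validity bitmap (one bit per slot, least-significant-bit order) alongside a values buffer, and an offsets buffer plus one contiguous data buffer for variable-length strings. The function names `build_int_column`, `build_string_column`, and `get_string` are hypothetical helpers for illustration.

```python
from array import array


def build_int_column(values):
    """Hypothetical helper: split optional ints into a validity bitmap and a values buffer."""
    n = len(values)
    validity = bytearray((n + 7) // 8)  # one bit per slot, padded to whole bytes
    data = array("i", [0] * n)  # C int (32-bit on mainstream platforms); null slots hold a placeholder
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)  # bit i set means slot i is valid (LSB numbering)
            data[i] = v
    return validity, data


def build_string_column(values):
    """Hypothetical helper: n + 1 offsets into one contiguous UTF-8 data buffer."""
    offsets = array("i", [0])
    data = bytearray()
    for s in values:
        data += s.encode("utf-8")
        offsets.append(len(data))  # end of string i == start of string i + 1
    return offsets, bytes(data)


def get_string(offsets, data, i):
    """Read string i back by slicing between consecutive offsets."""
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")
```

For example, `build_int_column([1, None, 3])` yields a first bitmap byte of `0b101`, and `build_string_column(["arrow", "", "columnar"])` yields offsets `[0, 5, 5, 13]`. Keeping all values of one column contiguous like this is what lets analytic kernels scan a column without touching unrelated fields.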

The project is developing a multi-language collection of libraries for solving systems problems related to in-memory analytical data processing. This includes such topics as:

Zero-copy shared memory and RPC-based data movement
Reading and writing file formats (like CSV, Apache ORC, and Apache Parquet)
In-memory analytics and query processing
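The zero-copy idea in the first topic can be sketched in a few lines. Arrow arrays can be sliced by recording an offset and length over the parent's buffers instead of copying them. The class below is a hypothetical, standard-library-only illustration of that principle using `memoryview`; it is not how any Arrow library is actually implemented.

```python
from array import array


class Int32Column:
    """Hypothetical column view: shares one buffer and records (offset, length)."""

    def __init__(self, buffer, offset=0, length=None):
        self.buffer = memoryview(buffer)  # a view onto existing memory; no bytes are copied
        self.offset = offset
        self.length = len(self.buffer) - offset if length is None else length

    def slice(self, start, length):
        # Zero-copy slice: same underlying buffer, only the bookkeeping changes.
        return Int32Column(self.buffer, self.offset + start, length)

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.buffer[self.offset + i]

    def to_list(self):
        return [self[i] for i in range(self.length)]
```

Because the slice shares memory with its parent, a write to the original buffer (for example `base[2] = 99` on `base = array("i", [10, 20, 30, 40, 50])`) is visible through `Int32Column(base).slice(1, 3)`, which then reads `[20, 99, 40]`.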

To learn how to use Arrow, refer to the documentation for your target environment.

Supported Environments

C/GLib
C++
C#
Go
Java
JavaScript
Julia
MATLAB
Python
R
Ruby
Rust
Implementation Status

Cookbooks

C++
Java
Python
R

Specifications and Protocols

Format Versioning and Stability
Arrow Columnar Format
Canonical Extension Types
- Introduction
- Official List
Arrow Flight RPC
Arrow Flight SQL
Integration Testing
The Arrow C data interface
The Arrow C stream interface
The Arrow C Device data interface
Device Stream Interface
ADBC: Arrow Database Connectivity
Other Data Structures
- Tensor (Multi-dimensional Array)
- Sparse Tensor
Changing the Apache Arrow Format Specification
Glossary

Development

Contributing to Apache Arrow
- Code of Conduct
- Language specific
Bug reports and feature requests
New Contributor’s Guide
Contributing Overview
Reviewing contributions
C++ Development
Java Development
- Building Arrow Java
- Development Guidelines
Python Development
Continuous Integration
Benchmarks
Building the Documentation
Release Management Guide
