Apache Arrow
Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware.
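Concretely, in the columnar format a column of fixed-width values is stored as a contiguous values buffer plus a validity bitmap. The bitmap holds one bit per slot, numbered least-significant bit first within each byte, and a 1 means the slot is non-null. A variable-length string column also has an offsets buffer with one more entry than there are slots. The following is a minimal pure-Python sketch of these two layouts using `struct`. The helper names are hypothetical and are not the pyarrow API. The sketch is not a complete implementation of the specification: for example, it ignores the buffer alignment and padding rules.

```python
import struct

# Hypothetical helpers sketching two Arrow-style layouts; not the pyarrow API.

def set_valid(bitmap, i):
    # Validity bitmaps number bits least-significant-first within each byte.
    bitmap[i // 8] |= 1 << (i % 8)

def is_valid(bitmap, i):
    return bool((bitmap[i // 8] >> (i % 8)) & 1)

def build_int32_column(values):
    """Nullable int32 column: validity bitmap + contiguous 4-byte values buffer."""
    bitmap = bytearray((len(values) + 7) // 8)
    data = bytearray(4 * len(values))  # null slots are simply left as zero bytes here
    for i, v in enumerate(values):
        if v is not None:
            set_valid(bitmap, i)
            struct.pack_into("<i", data, 4 * i, v)  # little-endian, the usual default
    return bitmap, data

def get_int32(bitmap, data, i):
    return struct.unpack_from("<i", data, 4 * i)[0] if is_valid(bitmap, i) else None

def build_utf8_column(values):
    """Nullable string column: validity bitmap + int32 offsets (n + 1) + UTF-8 bytes."""
    bitmap = bytearray((len(values) + 7) // 8)
    offsets = [0]
    chars = bytearray()
    for i, s in enumerate(values):
        if s is not None:
            set_valid(bitmap, i)
            chars += s.encode("utf-8")
        offsets.append(len(chars))  # a null slot gets a zero-length range in this sketch
    return bitmap, struct.pack(f"<{len(offsets)}i", *offsets), bytes(chars)

def get_utf8(bitmap, offsets, chars, i):
    if not is_valid(bitmap, i):
        return None
    start, end = struct.unpack_from("<2i", offsets, 4 * i)  # slot i spans offsets[i]..offsets[i+1]
    return chars[start:end].decode("utf-8")

bitmap, data = build_int32_column([1, None, 3])
print(bitmap, [get_int32(bitmap, data, i) for i in range(3)])  # bytearray(b'\x05') [1, None, 3]
```

Keeping the values of a column contiguous is what lets analytic code scan a single buffer instead of chasing per-row objects. The separate bitmap means nulls cost one bit each rather than a sentinel value.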
The project is developing a multi-language collection of libraries for solving systems problems related to in-memory analytical data processing. This includes such topics as:
- Zero-copy shared memory and RPC-based data movement
- Reading and writing file formats (like CSV, Apache ORC, and Apache Parquet)
- In-memory analytics and query processing
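Two of these ideas can be illustrated together in plain Python. Analytics over a columnar layout usually means scanning a contiguous values buffer while consulting a validity bitmap. Slicing such a column can be done without copying, because a slice only needs a new offset and length over the same memory. The sketch below uses `memoryview` to show both. `make_int64_column` and `column_sum` are hypothetical helpers, not Arrow APIs. Real Arrow libraries provide their own compute functions for aggregations like this.

```python
import struct

# Hypothetical helpers for illustration only; not part of any Arrow library.

def make_int64_column(values):
    """Pack values into a native-endian int64 buffer plus an LSB-first validity bitmap."""
    bitmap = bytearray((len(values) + 7) // 8)
    data = bytearray(8 * len(values))
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
            struct.pack_into("=q", data, 8 * i, v)
    return bitmap, data

def column_sum(bitmap, data, offset=0, length=None):
    """Sum the non-null slots in [offset, offset + length) without copying the buffer."""
    if length is None:
        length = len(data) // 8 - offset
    # Slicing a memoryview and casting it to int64 reuses the same memory: no copy is made.
    values = memoryview(data)[8 * offset : 8 * (offset + length)].cast("q")
    total = 0
    for j in range(length):
        i = offset + j  # bitmap positions stay relative to the start of the column
        if (bitmap[i // 8] >> (i % 8)) & 1:
            total += values[j]
    return total

bitmap, data = make_int64_column([10, None, 30, 40])
print(column_sum(bitmap, data))                      # 80
print(column_sum(bitmap, data, offset=1, length=2))  # 30
```

In Arrow itself, an array slice is likewise described by an offset and length over the original buffers, which is what keeps slicing zero-copy. The Python loop here is only for readability. It is not how an optimized kernel would be written.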
To learn how to use Arrow, refer to the documentation specific to your target environment.
- Format Versioning and Stability
- Arrow Columnar Format
- Canonical Extension Types
- Arrow Flight RPC
- Arrow Flight SQL
- Integration Testing
- The Arrow C data interface
- The Arrow C stream interface
- The Arrow C Device data interface
- Device Stream Interface
- ADBC: Arrow Database Connectivity
- Other Data Structures
- Changing the Apache Arrow Format Specification
- Glossary