Apache Arrow

Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware.
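As a rough illustration of what "columnar" means here, the sketch below stores each field of a small record set in its own contiguous, typed buffer. It uses only Python's standard library and is not the Arrow API or the Arrow memory format itself; the records and the field names `id` and `price` are hypothetical.

```python
from array import array

# Hypothetical records, used only for illustration (row-oriented view).
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]

# Columnar view: one contiguous, typed buffer per field.
ids = array("q", (r["id"] for r in rows))        # signed 64-bit integers
prices = array("d", (r["price"] for r in rows))  # 64-bit floats

# An analytic operation scans only the column it needs,
# without touching the other fields of each record.
total_price = sum(prices)
print(total_price)
```

Keeping values of one field adjacent in memory is what makes scans and aggregations over a single column cheap; the real format adds details (such as null handling and nested types) that this sketch leaves out.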

The project is developing a multi-language collection of libraries for solving systems problems related to in-memory analytical data processing. This includes such topics as:

  • Zero-copy shared memory and RPC-based data movement

  • Reading and writing file formats (like CSV, Apache ORC, and Apache Parquet)

  • In-memory analytics and query processing
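The idea behind zero-copy access in the first item can be sketched with Python's built-in `memoryview`, which exposes an existing buffer without duplicating it. This is a standard-library analogy under simplified assumptions, not Arrow's shared-memory or RPC machinery; the variable names are illustrative.

```python
from array import array

# A contiguous buffer of 64-bit floats standing in for one column.
values = array("d", [1.0, 2.0, 3.0, 4.0])

# A memoryview references the same memory; nothing is copied.
view = memoryview(values)

# Slicing the view yields another view over the same buffer.
window = view[1:3]

# A write through the original buffer is visible through the view,
# which shows that both names refer to the same memory.
values[1] = 20.0
print(window.tolist())
```

A consumer that receives such a view can read the data in place, so the cost of handing it over does not grow with the size of the buffer.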

To learn how to use Arrow, refer to the documentation specific to your target environment.

Specifications

Read about the Apache Arrow format specifications and protocols.

Development

Find documentation on contributing, code review, building the libraries and the documentation from source, continuous integration, benchmarks, and the release process.

Implementations

Examples