
Apache Arrow

Powering Columnar In-Memory Analytics



Fast

Apache Arrow™ enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations included in modern processors, allowing native vectorized optimization of analytical data processing. The columnar layout also makes better use of CPU caches by placing all data relevant to a column operation in as compact a format as possible.
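The difference between row and columnar layouts can be sketched in plain Python. This is a minimal illustration assuming a tiny made-up table of (id, price) records. The names `rows`, `prices`, `total_price_rows`, and `total_price_columnar` are hypothetical, and none of it is Arrow's API. The Python interpreter does not emit SIMD instructions, and a list of tuples holds pointers rather than packed fields, so the sketch shows only how the data is arranged, not the speedup.

```python
from array import array

# Hypothetical table of (id, price) records.
# Row-oriented model: one tuple per record, grouping all of its fields.
rows = [(1, 2.5), (2, 4.25), (3, 12.0)]

# Column-oriented model: each field stored in its own contiguous typed buffer.
ids = array("q", [record[0] for record in rows])     # signed 64-bit integers
prices = array("d", [record[1] for record in rows])  # 64-bit floats

def total_price_rows(records):
    # Unpacks every whole record even though only the price is needed.
    return sum(price for _id, price in records)

def total_price_columnar(price_column):
    # Reads only the price buffer: one dense run of 8-byte doubles.
    return sum(price_column)

column = memoryview(prices)
print(column.contiguous, column.itemsize, column.nbytes)     # True 8 24
print(total_price_rows(rows), total_price_columnar(prices))  # 18.75 18.75
```

In a compiled engine, a dense buffer like `prices` is what lets a loop load several values per SIMD instruction and fill each cache line with data the operation actually uses.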

Flexible

Arrow acts as a new high-performance interface between various systems. It also aims to support a wide variety of industry-standard programming languages: implementations in Java, C, C++, and Python are underway, and more languages are expected soon.

Standard

Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm, making it the de facto standard for columnar in-memory analytics.

Developer Mailing List

Developer Resources

Arrow is still early in development.

Source Code (http) (git)

Issue Tracker (JIRA)

Chat Room (Slack)

Latest release

The first release of Apache Arrow is out. Apache Arrow 0.1.0 is an early release, and the APIs are still evolving. The metadata and physical data representation should be fairly stable, as we have spent time finalizing the details.

Source release

Tag apache-arrow-0.1.0

Java artifacts on Maven Central

Performance Advantage of Columnar In-Memory

[Figure: SIMD]

Advantages of a Common Data Layer

Without a common data layer:
  • Each system has its own internal memory format
  • 70-80% of CPU time is wasted on serialization and deserialization
  • Similar functionality is implemented in multiple projects
With a common data layer:
  • All systems use the same memory format
  • No overhead for cross-system communication
  • Projects can share functionality (e.g., a Parquet-to-Arrow reader)
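The contrast between these two lists can be sketched with Python's standard library. This is a toy model under stated assumptions, not Arrow's API or wire format: a hypothetical producer and two hypothetical consumers (`read_shared`, `read_via_serialization`) live in one process, and the agreed "common format" is simply float64 values in native byte order packed back to back. One consumer wraps the producer's buffer directly; the other receives the data through a JSON encode/parse round trip, standing in for a system with its own private memory format.

```python
import json
import struct

# Hypothetical agreed format: float64 values in native byte order, packed back to back.
values = [1.5, 2.0, -3.25, 8.0]
shared = bytearray(struct.pack(f"={len(values)}d", *values))

def read_shared(buffer):
    """Consumer that understands the common format: wraps the bytes, copies nothing."""
    return memoryview(buffer).cast("d")

def read_via_serialization(buffer):
    """Consumer with a private format: the data is encoded to text and parsed back."""
    decoded = struct.unpack(f"={len(buffer) // 8}d", buffer)  # producer decodes its buffer
    payload = json.dumps(list(decoded))                      # ...and serializes it as text
    return json.loads(payload)                               # consumer builds its own copy

view = read_shared(shared)
copy = read_via_serialization(shared)
print(view.tolist() == copy == values)  # True

# Overwrite the first value in the producer's buffer.
struct.pack_into("=d", shared, 0, 42.0)
print(view[0], copy[0])  # 42.0 1.5
```

Because `read_shared` returns a view over the same memory, the later write is visible to it with no copy, while the JSON path produced an independent copy that had to be encoded and then parsed. The sketch does not measure cost; it only shows where the extra conversion work happens.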

Committers

Name Alias (email is <alias>@apache.org)
Jacques Nadeau jacques
Todd Lipcon todd
Ted Dunning tdunning
Michael Stack stack
P. Taylor Goetz ptgoetz
Julian Hyde jhyde
Reynold Xin rxin
James Taylor jamestaylor
Julien Le Dem julien
Jake Luciani jake
Jason Altekruse json
Alex Levenson alexlevenson
Parth Chandra parthc
Marcel Kornacker marcel
Steven Phillips smp
Hanifi Gunes hg
Abdelhakim Deneche adeneche
Wes McKinney wesm
David Alves dralves
Ippokratis Pandis ippokratis