Powering Columnar In-Memory Analytics
Apache Arrow™ enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations included in modern processors, for native vectorized optimization of analytical data processing. A columnar data layout also makes better use of CPU caches, because all data relevant to a column operation is stored contiguously in as compact a format as possible.
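To make the layout argument concrete, here is a minimal sketch in plain Python that contrasts a row-oriented collection with a column-oriented one. It does not use Arrow's API or its actual memory format; the field names (`id`, `price`, `qty`) and the helper functions are hypothetical, and the standard-library `array` module stands in for a contiguous typed buffer.

```python
from array import array

# Row-oriented: every record is its own object, so the values of a single
# field are scattered across memory.
rows = [
    {"id": 1, "price": 9.5, "qty": 3},
    {"id": 2, "price": 4.0, "qty": 10},
    {"id": 3, "price": 2.5, "qty": 4},
]


def to_columns(records):
    """Pivot records into one contiguous, fixed-width buffer per field."""
    return {
        "id": array("q", (r["id"] for r in records)),        # 64-bit integers
        "price": array("d", (r["price"] for r in records)),  # 64-bit floats
        "qty": array("q", (r["qty"] for r in records)),      # 64-bit integers
    }


def total_price_rows(records):
    """Aggregate one field by visiting every record object."""
    return sum(r["price"] for r in records)


def total_price_columns(cols):
    """Aggregate one field by scanning a single contiguous buffer."""
    return sum(cols["price"])


columns = to_columns(rows)
price_bytes = columns["price"].tobytes()  # the raw, densely packed values
```

Both aggregation functions return the same total, but the columnar version reads only the `price` buffer, which holds nothing except packed 8-byte values. That dense, uniform layout is the property that lets a native engine keep the working set in cache and apply SIMD instructions to many values at once.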
Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. Implementations in Java, C, C++, and Python are underway, and more languages are expected soon.
Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm, making it the de facto standard for columnar in-memory analytics.
Arrow is still early in development.
The first release of Apache Arrow is out. Apache Arrow 0.1.0 is an early release and the APIs are still evolving. The metadata and physical data representation should be fairly stable as we have spent time finalizing the details.
|Name|Alias (email is <alias>@apache.org)|
|P. Taylor Goetz|ptgoetz|
|Julien Le Dem|julien|