High-Level Overview

The Apache Arrow Java modules implement several Arrow specifications, including the columnar format and the IPC format. Most modules are native Java implementations, but some are JNI bindings to the Arrow C++ library.

Arrow Java Modules

| Module | Description | Implementation |
|---|---|---|
| arrow-format | Generated Java files from the IPC FlatBuffers definitions. | Native |
| arrow-memory-core | Core off-heap memory management libraries for Arrow ValueVectors. | Native |
| arrow-memory-unsafe | Memory management implementation based on sun.misc.Unsafe. | Native |
| arrow-memory-netty | Memory management implementation based on Netty. | Native |
| arrow-vector | An off-heap reference implementation of the Arrow columnar data format. | Native |
| arrow-tools | Java applications for working with Arrow ValueVectors. | Native |
| arrow-jdbc | (Experimental) A library for converting JDBC data to Arrow data. | Native |
| arrow-plasma | (Experimental) Java client for the Plasma object store. | Native |
| flight-core | (Experimental) An RPC mechanism for transferring ValueVectors. | Native |
| flight-grpc | (Experimental) Contains a utility class to expose the Flight gRPC service and client. | Native |
| flight-sql | (Experimental) Contains utility classes to expose Flight SQL semantics for clients and servers over Arrow Flight. | Native |
| flight-integration-tests | Integration tests for Flight RPC. | Native |
| arrow-performance | JMH benchmarks for the Arrow libraries. | Native |
| arrow-algorithm | (Experimental) A collection of algorithms for working with ValueVectors. | Native |
| arrow-avro | (Experimental) A library for converting Avro data to Arrow data. | Native |
| arrow-compression | (Experimental) A library for working with compression/decompression of Arrow data. | Native |
| arrow-c-data | Java implementation of the C Data Interface. | JNI |
| arrow-orc | (Experimental) A JNI wrapper for the C++ ORC reader implementation. | JNI |
| arrow-gandiva | Java wrappers around the native Gandiva SQL expression compiler. | JNI |
| arrow-dataset | Java bindings to the Arrow Datasets library. | JNI |

Together, the Arrow Java modules support working with data (1) in memory, (2) at rest, and (3) on the wire.
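
To make the terms "off-heap" and "columnar" in the module descriptions more concrete, the following is a minimal sketch that uses only the JDK's `java.nio.ByteBuffer`, not the Arrow API. It stores a nullable 32-bit integer column as two off-heap buffers: a validity bitmap (one bit per slot, least-significant bit first) and a little-endian values buffer, which mirrors the general layout the columnar format describes for fixed-width types. The class `IntColumnSketch` and all of its methods are hypothetical names invented for illustration. In real code, the vector classes in arrow-vector and the allocators in the arrow-memory modules would take this role and also handle reference counting, resizing, and buffer alignment, which this sketch ignores.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical illustration only: a nullable 32-bit integer column kept in
 * two off-heap buffers. This is NOT an Arrow class.
 */
final class IntColumnSketch {
    private final ByteBuffer validity; // 1 bit per slot; 1 means a value is present
    private final ByteBuffer values;   // 4 bytes per slot, little-endian
    private final int length;

    IntColumnSketch(int length) {
        this.length = length;
        // allocateDirect reserves memory outside the Java heap. The buffers
        // are zero-filled, so every slot starts out as null.
        this.validity = ByteBuffer.allocateDirect((length + 7) / 8);
        this.values = ByteBuffer.allocateDirect(length * Integer.BYTES)
                                .order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Stores a value and marks the slot as valid. */
    void set(int index, int value) {
        checkIndex(index);
        int byteIndex = index >> 3;
        // Least-significant-bit numbering within each bitmap byte.
        byte updated = (byte) (validity.get(byteIndex) | (1 << (index & 7)));
        validity.put(byteIndex, updated);
        values.putInt(index * Integer.BYTES, value);
    }

    boolean isNull(int index) {
        checkIndex(index);
        return (validity.get(index >> 3) & (1 << (index & 7))) == 0;
    }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("slot " + index + " is null");
        }
        return values.getInt(index * Integer.BYTES);
    }

    /** Counts slots whose validity bit is unset. */
    int nullCount() {
        int nulls = 0;
        for (int i = 0; i < length; i++) {
            if (isNull(i)) {
                nulls++;
            }
        }
        return nulls;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    public static void main(String[] args) {
        IntColumnSketch column = new IntColumnSketch(3);
        column.set(0, 10);
        column.set(2, 30); // slot 1 is left null
        for (int i = 0; i < 3; i++) {
            System.out.println(i + ": " + (column.isNull(i) ? "null" : column.get(i)));
        }
    }
}
```

Keeping the values in one contiguous buffer, separate from the null information, is what allows the same bytes to be handed to native code or written out without per-element conversion.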