Building Arrow Java

System Setup

Arrow Java uses the Maven build system.

Building requires:

  • JDK 8, 9, 10, 11, 17, or 18, but only JDK 11 is tested in CI.

  • Maven 3+
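Before building, you can confirm that the tools on your PATH meet these requirements. The following is a minimal POSIX sh sketch. The function names jdk_major, jdk_supported, and maven_supported are hypothetical helpers, not part of Arrow. The supported-version list simply mirrors the one above, and passing the check does not mean that JDK is exercised in CI.

```shell
# Hypothetical helpers (not part of Arrow) for checking build prerequisites.

# Print the major version from a Java version string such as
# "1.8.0_292" (legacy scheme) or "11.0.20" (JDK 9+ scheme).
jdk_major() {
  v=$1
  case "$v" in
    1.*) v=${v#1.} ;;            # legacy scheme: "1.8.0_292" -> "8.0_292"
  esac
  printf '%s\n' "${v%%[!0-9]*}"  # keep only the leading digits
}

# Succeed if the JDK major version is one of those listed above.
jdk_supported() {
  case "$(jdk_major "$1")" in
    8|9|10|11|17|18) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed if a Maven version string such as "3.8.6" is 3 or newer.
maven_supported() {
  major=${1%%.*}
  [ "$major" -ge 3 ] 2>/dev/null
}

# Example usage (requires java and mvn on PATH):
#   jv=$(java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p')
#   mv=$(mvn -v | sed -n '1s/^Apache Maven \([^ ]*\).*/\1/p')
#   jdk_supported "$jv" || echo "JDK $jv is not in the supported list" >&2
#   maven_supported "$mv" || echo "Maven $mv is older than 3" >&2
```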

Building

All the instructions below assume that you have cloned the Arrow git repository:

$ git clone https://github.com/apache/arrow.git
$ cd arrow
$ git submodule update --init --recursive
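The last command initializes the Git submodules used by the build. If you want to check that none were skipped, git submodule status marks an uninitialized submodule with a leading "-". The sketch below filters that output; the function name uninitialized_submodules is hypothetical.

```shell
# Hypothetical helper: read `git submodule status` output on stdin and print
# the path of every submodule that has not been initialized (lines that
# git prefixes with "-").
uninitialized_submodules() {
  awk '/^-/ { print $2 }'
}

# Example usage, from the root of the Arrow checkout:
#   missing=$(git submodule status --recursive | uninitialized_submodules)
#   [ -z "$missing" ] || echo "Run: git submodule update --init --recursive" >&2
```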

Basic Installation

To build the default modules, go to the java/ directory of the Arrow repository and run:

$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ mvn clean install
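The mvn launcher uses the JDK in JAVA_HOME when that variable is set, so a wrong value is a common cause of confusing build failures. As a minimal sketch, a check like the following can run before mvn. The function name java_home_ok is hypothetical, and it only verifies that the directory contains java and javac executables, not which version they are.

```shell
# Hypothetical helper: succeed if the given directory looks like a JDK home,
# i.e. it contains executable bin/java and bin/javac (a JRE lacks javac).
java_home_ok() {
  [ -n "$1" ] && [ -x "$1/bin/java" ] && [ -x "$1/bin/javac" ]
}

# Example usage inside arrow/java:
#   java_home_ok "$JAVA_HOME" || { echo "JAVA_HOME does not point to a JDK" >&2; exit 1; }
#   mvn clean install
```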

Building JNI Libraries on Linux

First, build the C++ shared libraries that the JNI bindings use. You can build them manually, or use Archery to build them inside a Docker container (this requires installing Docker, Docker Compose, and Archery).

$ cd arrow
$ archery docker run java-jni-manylinux-2014
$ ls -latr java-dist/
|__ libarrow_cdata_jni.so
|__ libarrow_dataset_jni.so
|__ libarrow_orc_jni.so
|__ libgandiva_jni.so

Building JNI Libraries on macOS

To build only the C Data Interface library:

$ cd arrow
$ brew bundle --file=cpp/Brewfile
Homebrew Bundle complete! 25 Brewfile dependencies now installed.
$ export JAVA_HOME=<absolute path to your java home>
$ mkdir -p java-dist java-native-c
$ cd java-native-c
$ cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DCMAKE_INSTALL_PREFIX=../java-dist \
    ../java/c
$ cmake --build . --target install
$ ls -latr ../java-dist/lib
|__ libarrow_cdata_jni.dylib
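The same library has a different file name on each platform: .so on Linux (as in the Archery output) and .dylib on macOS. If you script checks for both platforms, a small mapping like the following can help. It is a hypothetical helper that takes the OS name (as printed by uname -s) as an optional parameter so it can be tested on any machine; the names it produces match what Java's System.mapLibraryName returns on these two platforms.

```shell
# Hypothetical helper: map a JNI library base name to its file name for a
# given OS, following the naming seen in the listings above.
#   $1 = base name, e.g. arrow_cdata_jni
#   $2 = OS name as printed by `uname -s` (defaults to the current OS)
jni_lib_filename() {
  os=${2:-$(uname -s)}
  case "$os" in
    Darwin) printf 'lib%s.dylib\n' "$1" ;;
    Linux)  printf 'lib%s.so\n' "$1" ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
}

# Example usage:
#   ls -l "../java-dist/lib/$(jni_lib_filename arrow_cdata_jni)"
```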

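To build the other JNI libraries (such as Dataset, ORC, and Gandiva), configure the Arrow C++ project with JNI support enabled. The ARROW_<DEP>_USE_SHARED=OFF options make the build prefer static versions of those third-party libraries, and the <Dep>_SOURCE=BUNDLED options download and build the corresponding dependencies from source: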

$ cd arrow
$ brew bundle --file=cpp/Brewfile
Homebrew Bundle complete! 25 Brewfile dependencies now installed.
$ export JAVA_HOME=<absolute path to your java home>
$ mkdir -p java-dist java-native-cpp
$ cd java-native-cpp
$ cmake \
    -DARROW_BOOST_USE_SHARED=OFF \
    -DARROW_BROTLI_USE_SHARED=OFF \
    -DARROW_BZ2_USE_SHARED=OFF \
    -DARROW_GFLAGS_USE_SHARED=OFF \
    -DARROW_GRPC_USE_SHARED=OFF \
    -DARROW_LZ4_USE_SHARED=OFF \
    -DARROW_OPENSSL_USE_SHARED=OFF \
    -DARROW_PROTOBUF_USE_SHARED=OFF \
    -DARROW_SNAPPY_USE_SHARED=OFF \
    -DARROW_THRIFT_USE_SHARED=OFF \
    -DARROW_UTF8PROC_USE_SHARED=OFF \
    -DARROW_ZSTD_USE_SHARED=OFF \
    -DARROW_JNI=ON \
    -DARROW_PARQUET=ON \
    -DARROW_FILESYSTEM=ON \
    -DARROW_DATASET=ON \
    -DARROW_GANDIVA_JAVA=ON \
    -DARROW_GANDIVA_STATIC_LIBSTDCPP=ON \
    -DARROW_GANDIVA=ON \
    -DARROW_ORC=ON \
    -DARROW_PLASMA_JAVA_CLIENT=ON \
    -DARROW_PLASMA=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DCMAKE_INSTALL_PREFIX=../java-dist \
    -DCMAKE_UNITY_BUILD=ON \
    -Dre2_SOURCE=BUNDLED \
    -DBoost_SOURCE=BUNDLED \
    -Dutf8proc_SOURCE=BUNDLED \
    -DSnappy_SOURCE=BUNDLED \
    -DORC_SOURCE=BUNDLED \
    -DZLIB_SOURCE=BUNDLED \
    ../cpp
$ cmake --build . --target install
$ ls -latr ../java-dist/lib
|__ libarrow_dataset_jni.dylib
|__ libarrow_orc_jni.dylib
|__ libgandiva_jni.dylib
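The long run of ARROW_<DEP>_USE_SHARED=OFF options is repetitive. If you script this build, one option is to generate those flags from a list of dependency names, as in the sketch below. The helper name static_dep_flags is hypothetical; the dependency names in the usage comment are the ones used in the command above.

```shell
# Hypothetical helper: print one -DARROW_<DEP>_USE_SHARED=OFF option per
# dependency name given as an argument.
static_dep_flags() {
  for dep in "$@"; do
    printf '%s\n' "-DARROW_${dep}_USE_SHARED=OFF"
  done
}

# Example usage (the flags contain no spaces or glob characters, so the
# unquoted command substitution splits them safely):
#   cmake \
#     $(static_dep_flags BOOST BROTLI BZ2 GFLAGS GRPC LZ4 OPENSSL PROTOBUF \
#         SNAPPY THRIFT UTF8PROC ZSTD) \
#     -DARROW_JNI=ON \
#     <remaining options from the command above> \
#     ../cpp
```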

Building Arrow JNI Modules

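The Maven properties in the commands below must point to the directory that contains the native libraries built in the previous steps (../java-dist/lib in the macOS instructions); adjust the path if your libraries are elsewhere. To compile the C Data Interface JNI bindings, use the arrow-c-data Maven profile: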

$ cd arrow/java
$ mvn -Darrow.c.jni.dist.dir=../java-dist/lib -Parrow-c-data clean install

To compile the JNI bindings for ORC, Gandiva, and Dataset, use the arrow-jni Maven profile:

$ cd arrow/java
$ mvn -Darrow.cpp.build.dir=../java-dist/lib -Parrow-jni clean install
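Both Maven commands expect the JNI libraries to already exist in the directory passed through the property, so a quick pre-flight check can catch a wrong path before a long Maven build. This sketch succeeds if the directory contains at least one file named like lib*_jni.so or lib*_jni.dylib, the naming pattern of the libraries listed above. The function name has_jni_libs is hypothetical.

```shell
# Hypothetical helper: succeed if directory $1 contains at least one JNI
# library named like lib*_jni.so (Linux) or lib*_jni.dylib (macOS).
has_jni_libs() {
  for f in "$1"/lib*_jni.so "$1"/lib*_jni.dylib; do
    # An unmatched glob stays literal, so test that the file really exists.
    [ -f "$f" ] && return 0
  done
  return 1
}

# Example usage, from arrow/java:
#   has_jni_libs ../java-dist/lib || { echo "no JNI libraries in ../java-dist/lib" >&2; exit 1; }
#   mvn -Darrow.cpp.build.dir=../java-dist/lib -Parrow-jni clean install
```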

IDE Configuration

IntelliJ

To start working on Arrow in IntelliJ, open the java/ subdirectory of the Arrow repository.

  • For JDK 8, disable the error-prone profile to build the project successfully.

  • For JDK 11, the project should build successfully with the default profiles.

Common Errors

  1. If the build cannot find its dependencies and fails with errors like these:
    • Could NOT find Boost (missing: Boost_INCLUDE_DIR system filesystem)

    • Could NOT find Lz4 (missing: LZ4_LIB)

    • Could NOT find zstd (missing: ZSTD_LIB)

    Add the following options to the cmake command so that these dependencies are downloaded and built from source at build time (see Dependency Resolution for more details):

    -Dre2_SOURCE=BUNDLED \
    -DBoost_SOURCE=BUNDLED \
    -Dutf8proc_SOURCE=BUNDLED \
    -DSnappy_SOURCE=BUNDLED \
    -DORC_SOURCE=BUNDLED \
    -DZLIB_SOURCE=BUNDLED
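These are CMake options, so they go on the cmake command line, as in the macOS example above. If you need to bundle a different set of dependencies, a hypothetical helper like the following generates -D<Name>_SOURCE=BUNDLED options from a list of names. CMake variable names are case-sensitive, so each name must be spelled exactly as Arrow's build expects (for example Boost but re2); the names in the usage comment are only the ones shown in this section.

```shell
# Hypothetical helper: print one -D<Name>_SOURCE=BUNDLED option per
# dependency name. Names are passed through unchanged because CMake
# variable names are case-sensitive (e.g. Boost_SOURCE vs re2_SOURCE).
bundled_source_flags() {
  for name in "$@"; do
    printf '%s\n' "-D${name}_SOURCE=BUNDLED"
  done
}

# Example usage:
#   cmake $(bundled_source_flags re2 Boost utf8proc Snappy ORC ZLIB) \
#     <other options> ../cpp
```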