Building Arrow Java#
System Setup#
Arrow Java uses the Maven build system.
Building requires:
JDK 8, 9, 10, 11, 17, or 18, but only JDK 11 is tested in CI.
Maven 3+
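Because only some JDK releases are supported, it can help to confirm which JDK is active before building. The following is a minimal sketch and not part of the Arrow build. The helper name `jdk_major` is hypothetical. It reduces a Java version string to its major version, handling both the legacy `1.8.0_292` form and the newer `11.0.2` form.

```shell
# Hypothetical helper (not part of Arrow): print the major version
# for a Java version string such as "1.8.0_292" or "11.0.2".
jdk_major() {
  case "$1" in
    1.*) v="${1#1.}"; echo "${v%%.*}" ;;   # legacy scheme: 1.8.x -> 8
    *)   echo "${1%%[.+-]*}" ;;            # newer scheme: 11.0.2 -> 11
  esac
}

# Report the JDK found on PATH, if any. "java -version" writes to stderr.
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  echo "Detected JDK major version: $(jdk_major "$ver")"
fi
```

The printed number can then be compared against the list of supported JDK versions above.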
Building#
All the instructions below assume that you have cloned the Arrow git repository:
$ git clone https://github.com/apache/arrow.git
$ cd arrow
$ git submodule update --init --recursive
Basic Installation#
To build the default modules, change to the java directory of the repository, point JAVA_HOME at your JDK, and run Maven:
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ mvn clean install
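Since the build relies on JAVA_HOME, it may be worth validating the variable before invoking Maven. The sketch below is illustrative only. The function name `check_java_home` is hypothetical, and the check assumes that a usable Java home contains an executable at `bin/java`.

```shell
# Hypothetical helper: succeed only if the given directory looks like
# a Java home, i.e. it contains an executable bin/java.
check_java_home() {
  if [ -z "${1:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$1/bin/java" ]; then
    echo "no executable found at $1/bin/java" >&2
    return 1
  fi
  return 0
}

# Example: check_java_home "$JAVA_HOME" && mvn clean install
```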
Building JNI Libraries on Linux#
First, build the C++ shared libraries that the JNI bindings use. You can build them manually, or use Archery to build them in a Docker container (this requires Docker, Docker Compose, and Archery).
$ cd arrow
$ archery docker run java-jni-manylinux-2014
$ ls -latr java-dist/
|__ libarrow_cdata_jni.so
|__ libarrow_dataset_jni.so
|__ libarrow_orc_jni.so
|__ libgandiva_jni.so
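Rather than reading the directory listing by eye, the expected libraries can be checked with a small script. This is a sketch under assumptions: `missing_jni_libs` is a hypothetical helper, and it assumes libraries follow the `lib<name>.<extension>` naming shown in the listing above.

```shell
# Hypothetical helper: print each expected library that is absent.
# Usage: missing_jni_libs <dir> <extension> <name>...
missing_jni_libs() {
  dir=$1
  ext=$2
  shift 2
  for name in "$@"; do
    [ -f "$dir/lib$name.$ext" ] || echo "lib$name.$ext"
  done
}

# Example for the Linux output listed above:
# missing_jni_libs java-dist so arrow_cdata_jni arrow_dataset_jni arrow_orc_jni gandiva_jni
```

Empty output means every named library was found. Any printed name points at a library that was not produced by the build.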
Building JNI Libraries on macOS#
To build only the C Data Interface library:
$ cd arrow
$ brew bundle --file=cpp/Brewfile
Homebrew Bundle complete! 25 Brewfile dependencies now installed.
$ export JAVA_HOME=<absolute path to your java home>
$ mkdir -p java-dist java-native-c
$ cd java-native-c
$ cmake \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_INSTALL_PREFIX=../java-dist \
../java/c
$ cmake --build . --target install
$ ls -latr ../java-dist/lib
|__ libarrow_cdata_jni.dylib
To build other JNI libraries:
$ cd arrow
$ brew bundle --file=cpp/Brewfile
Homebrew Bundle complete! 25 Brewfile dependencies now installed.
$ export JAVA_HOME=<absolute path to your java home>
$ mkdir -p java-dist java-native-cpp
$ cd java-native-cpp
$ cmake \
-DARROW_BOOST_USE_SHARED=OFF \
-DARROW_BROTLI_USE_SHARED=OFF \
-DARROW_BZ2_USE_SHARED=OFF \
-DARROW_GFLAGS_USE_SHARED=OFF \
-DARROW_GRPC_USE_SHARED=OFF \
-DARROW_LZ4_USE_SHARED=OFF \
-DARROW_OPENSSL_USE_SHARED=OFF \
-DARROW_PROTOBUF_USE_SHARED=OFF \
-DARROW_SNAPPY_USE_SHARED=OFF \
-DARROW_THRIFT_USE_SHARED=OFF \
-DARROW_UTF8PROC_USE_SHARED=OFF \
-DARROW_ZSTD_USE_SHARED=OFF \
-DARROW_JNI=ON \
-DARROW_PARQUET=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_DATASET=ON \
-DARROW_GANDIVA_JAVA=ON \
-DARROW_GANDIVA_STATIC_LIBSTDCPP=ON \
-DARROW_GANDIVA=ON \
-DARROW_ORC=ON \
-DARROW_PLASMA_JAVA_CLIENT=ON \
-DARROW_PLASMA=ON \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_INSTALL_PREFIX=../java-dist \
-DCMAKE_UNITY_BUILD=ON \
-Dre2_SOURCE=BUNDLED \
-DBoost_SOURCE=BUNDLED \
-Dutf8proc_SOURCE=BUNDLED \
-DSnappy_SOURCE=BUNDLED \
-DORC_SOURCE=BUNDLED \
-DZLIB_SOURCE=BUNDLED \
../cpp
$ cmake --build . --target install
$ ls -latr ../java-dist/lib
|__ libarrow_dataset_jni.dylib
|__ libarrow_orc_jni.dylib
|__ libgandiva_jni.dylib
Building Arrow JNI Modules#
To compile the C Data Interface JNI bindings, use the arrow-c-data Maven profile:
$ cd arrow/java
$ mvn -Darrow.c.jni.dist.dir=../java-dist/lib -Parrow-c-data clean install
To compile the JNI bindings for ORC / Gandiva / Dataset, use the arrow-jni Maven profile:
$ cd arrow/java
$ mvn -Darrow.cpp.build.dir=../java-dist/lib -Parrow-jni clean install
IDE Configuration#
IntelliJ#
To start working on Arrow in IntelliJ, just open the java/ subdirectory of the Arrow repository.
For JDK 8, disable the error-prone profile to build the project successfully. For JDK 11, the project should build successfully with the default profiles.
Common Errors#
- If the build cannot find dependencies and fails with errors like these:
Could NOT find Boost (missing: Boost_INCLUDE_DIR system filesystem)
Could NOT find Lz4 (missing: LZ4_LIB)
Could NOT find zstd (missing: ZSTD_LIB)
then have CMake download and build the dependencies at build time by adding the following options to the cmake command (see the Dependency Resolution section for details):
-Dre2_SOURCE=BUNDLED \
-DBoost_SOURCE=BUNDLED \
-Dutf8proc_SOURCE=BUNDLED \
-DSnappy_SOURCE=BUNDLED \
-DORC_SOURCE=BUNDLED \
-DZLIB_SOURCE=BUNDLED
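When the CMake output is long, the missing packages can be pulled out of a saved log. The following is a rough sketch. `missing_cmake_packages` is a hypothetical helper, and it assumes the errors follow the "Could NOT find <name>" wording shown above. The names it prints are as CMake reports them, which may differ in spelling or case from the corresponding `<name>_SOURCE` option, so check each one against the Dependency Resolution documentation before adding it.

```shell
# Hypothetical helper: read CMake output on stdin and print the name of
# each package reported as "Could NOT find", once, in order of appearance.
missing_cmake_packages() {
  sed -n 's/.*Could NOT find \([A-Za-z0-9_]*\).*/\1/p' | awk '!seen[$0]++'
}

# Example: cmake ... 2>&1 | tee cmake.log; missing_cmake_packages < cmake.log
```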