Building Arrow Java#
System Setup#
Arrow Java uses the Maven build system.
Building requires:
JDK 8+
Maven 3+
Note
CI will test all supported JDK LTS versions, plus the latest non-LTS version.
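Before starting a build, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, not part of the Arrow tooling; the helper name check_tool is an assumption made for illustration, and it only checks that a command can be found, not which version it is.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the two tools named in the requirements above.
for tool in java mvn; do
  check_tool "$tool" || true
done
```

A "missing" line for either tool suggests fixing the PATH (or JAVA_HOME) before running any of the build commands below.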
Building#
All the instructions below assume that you have cloned the Arrow git repository:
$ git clone https://github.com/apache/arrow.git
$ cd arrow
$ git submodule update --init --recursive
The following options are available for compiling the Arrow Java modules:
Maven build tool.
Docker Compose.
Archery.
Building Java Modules#
To build the default modules, go to the project root and execute:
Maven#
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ mvn clean install
Docker compose#
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ docker-compose run java
Archery#
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ archery docker run java
Building JNI Libraries (*.dylib / *.so / *.dll)#
First, we need to build the C++ shared libraries that the JNI bindings will use. We can build these manually, or use Archery to build them in a Docker container (this requires installing Docker, Docker Compose, and Archery).
Note
If you are building on Apple Silicon, be sure to use a JDK version that was compiled for that architecture. See, for example, the Azul JDK.
If you are building on Windows OS, see Developing on Windows.
Maven#
To build only the JNI C Data Interface library (macOS / Linux):
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
$ ls -latr ../java-dist/lib
|__ arrow_cdata_jni/
To build only the JNI C Data Interface library (Windows):
$ cd arrow/java
$ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
$ dir "../java-dist/bin"
|__ arrow_cdata_jni/
To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ mvn generate-resources -Pgenerate-libs-jni-macos-linux -N
$ ls -latr ../java-dist/lib
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
To build all JNI libraries (Windows) except the JNI C Data Interface library:
$ cd arrow/java
$ mvn generate-resources -Pgenerate-libs-jni-windows -N
$ dir "../java-dist/bin"
|__ arrow_dataset_jni/
CMake#
To build only the JNI C Data Interface library (macOS / Linux):
$ cd arrow
$ mkdir -p java-dist java-cdata
$ cmake \
    -S java \
    -B java-cdata \
    -DARROW_JAVA_JNI_ENABLE_C=ON \
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF \
    -DBUILD_TESTING=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=java-dist
$ cmake --build java-cdata --target install --config Release
$ ls -latr java-dist/lib
|__ arrow_cdata_jni/
To build only the JNI C Data Interface library (Windows):
$ cd arrow
$ mkdir java-dist, java-cdata
$ cmake ^
    -S java ^
    -B java-cdata ^
    -DARROW_JAVA_JNI_ENABLE_C=ON ^
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF ^
    -DBUILD_TESTING=OFF ^
    -DCMAKE_BUILD_TYPE=Release ^
    -DCMAKE_INSTALL_PREFIX=java-dist
$ cmake --build java-cdata --target install --config Release
$ dir "java-dist/bin"
|__ arrow_cdata_jni/
To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:
$ cd arrow
$ brew bundle --file=cpp/Brewfile
# Homebrew Bundle complete! 25 Brewfile dependencies now installed.
$ brew uninstall aws-sdk-cpp
# (We can't use aws-sdk-cpp installed by Homebrew because it has
# an issue: https://github.com/aws/aws-sdk-cpp/issues/1809 )
$ export JAVA_HOME=<absolute path to your java home>
$ mkdir -p java-dist cpp-jni
$ cmake \
    -S cpp \
    -B cpp-jni \
    -DARROW_BUILD_SHARED=OFF \
    -DARROW_CSV=ON \
    -DARROW_DATASET=ON \
    -DARROW_DEPENDENCY_SOURCE=BUNDLED \
    -DARROW_DEPENDENCY_USE_SHARED=OFF \
    -DARROW_FILESYSTEM=ON \
    -DARROW_GANDIVA=ON \
    -DARROW_GANDIVA_STATIC_LIBSTDCPP=ON \
    -DARROW_JSON=ON \
    -DARROW_ORC=ON \
    -DARROW_PARQUET=ON \
    -DARROW_S3=ON \
    -DARROW_SUBSTRAIT=ON \
    -DARROW_USE_CCACHE=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=java-dist \
    -DCMAKE_UNITY_BUILD=ON
$ cmake --build cpp-jni --target install --config Release
$ cmake \
    -S java \
    -B java-jni \
    -DARROW_JAVA_JNI_ENABLE_C=OFF \
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=ON \
    -DBUILD_TESTING=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=java-dist \
    -DCMAKE_PREFIX_PATH=$PWD/java-dist \
    -DProtobuf_ROOT=$PWD/../cpp-jni/protobuf_ep-install \
    -DProtobuf_USE_STATIC_LIBS=ON
$ cmake --build java-jni --target install --config Release
$ ls -latr java-dist/lib/
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
To build all JNI libraries (Windows) except the JNI C Data Interface library:
$ cd arrow
$ mkdir java-dist, cpp-jni
$ cmake ^
    -S cpp ^
    -B cpp-jni ^
    -DARROW_BUILD_SHARED=OFF ^
    -DARROW_CSV=ON ^
    -DARROW_DATASET=ON ^
    -DARROW_DEPENDENCY_USE_SHARED=OFF ^
    -DARROW_FILESYSTEM=ON ^
    -DARROW_GANDIVA=OFF ^
    -DARROW_JSON=ON ^
    -DARROW_ORC=ON ^
    -DARROW_PARQUET=ON ^
    -DARROW_S3=ON ^
    -DARROW_SUBSTRAIT=ON ^
    -DARROW_USE_CCACHE=ON ^
    -DARROW_WITH_BROTLI=ON ^
    -DARROW_WITH_LZ4=ON ^
    -DARROW_WITH_SNAPPY=ON ^
    -DARROW_WITH_ZLIB=ON ^
    -DARROW_WITH_ZSTD=ON ^
    -DCMAKE_BUILD_TYPE=Release ^
    -DCMAKE_INSTALL_PREFIX=java-dist ^
    -DCMAKE_UNITY_BUILD=ON ^
    -GNinja
$ cd cpp-jni
$ ninja install
$ cd ../
$ cmake ^
    -S java ^
    -B java-jni ^
    -DARROW_JAVA_JNI_ENABLE_C=OFF ^
    -DARROW_JAVA_JNI_ENABLE_DATASET=ON ^
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=ON ^
    -DARROW_JAVA_JNI_ENABLE_GANDIVA=OFF ^
    -DARROW_JAVA_JNI_ENABLE_ORC=ON ^
    -DBUILD_TESTING=OFF ^
    -DCMAKE_BUILD_TYPE=Release ^
    -DCMAKE_INSTALL_PREFIX=java-dist ^
    -DCMAKE_PREFIX_PATH=$PWD/java-dist
$ cmake --build java-jni --target install --config Release
$ dir "java-dist/bin"
|__ arrow_orc_jni/
|__ arrow_dataset_jni/
Archery#
$ cd arrow
$ archery docker run java-jni-manylinux-2014
$ ls -latr java-dist
|__ arrow_cdata_jni/
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
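Whichever build route is used, the result can be checked by looking for the expected JNI directories. The sketch below assumes the directory names shown in the listings above. The helper name missing_jni_dirs is hypothetical, and the root directory is passed as an argument because the libraries may sit under java-dist, java-dist/lib, or java-dist/bin depending on the route and platform.

```shell
#!/bin/sh
# Hypothetical helper: print each expected directory that is NOT present
# under the given root. Prints nothing when everything is in place.
missing_jni_dirs() {
  root=$1
  shift
  for name in "$@"; do
    if [ ! -d "$root/$name" ]; then
      echo "$name"
    fi
  done
}

# Example call for the Archery layout listed above (root assumed to be java-dist).
missing_jni_dirs java-dist arrow_cdata_jni arrow_dataset_jni arrow_orc_jni gandiva_jni
```

Empty output means all four directories were found; any name printed points at a library that was not built or was installed elsewhere.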
Building Java JNI Modules#
To compile the JNI bindings, use the arrow-c-data Maven profile:
$ cd arrow/java
$ mvn -Darrow.c.jni.dist.dir=<absolute path to your arrow folder>/java-dist/lib -Parrow-c-data clean install
To compile the JNI bindings for ORC / Gandiva / Dataset, use the arrow-jni Maven profile:
$ cd arrow/java
$ mvn \
    -Darrow.cpp.build.dir=<absolute path to your arrow folder>/java-dist/lib/ \
    -Darrow.c.jni.dist.dir=<absolute path to your arrow folder>/java-dist/lib/ \
    -Parrow-jni clean install
IDE Configuration#
IntelliJ#
To start working on Arrow in IntelliJ: build the project once from the command line using mvn clean install. Then open the java/ subdirectory of the Arrow repository, and update the following settings:
In the Files tool window, find the path vector/target/generated-sources, right click the directory, and select Mark Directory as > Generated Sources Root. There is no need to mark other generated sources directories, as only the vector module generates sources.
For JDK 8, disable the error-prone profile to build the project successfully.
For JDK 11, due to an IntelliJ bug, you must go into Settings > Build, Execution, Deployment > Compiler > Java Compiler and disable "Use '--release' option for cross-compilation (Java 9 and later)". Otherwise you will get an error like "package sun.misc does not exist".
You may want to disable error-prone entirely if it gives spurious warnings (disable both error-prone profiles in the Maven tool window and "Reload All Maven Projects").
If using IntelliJ's Maven integration to build, you may need to change <fork> to false in the pom.xml files due to an IntelliJ bug.
To enable debugging JNI-based modules like dataset, activate specific profiles in the Maven tab under "Profiles". Ensure the profiles arrow-c-data, arrow-jni, generate-libs-cdata-all-os, generate-libs-jni-macos-linux, and jdk11+ are enabled, so that the IDE can build them and enable debugging.
You may not need to update all of these settings if you build/test with the IntelliJ Maven integration instead of with IntelliJ directly.
Common Errors#
When working with the JNI code: if the C++ build cannot find dependencies, with errors like these:
Could NOT find Boost (missing: Boost_INCLUDE_DIR system filesystem)
Could NOT find Lz4 (missing: LZ4_LIB)
Could NOT find zstd (missing: ZSTD_LIB)
Specify that the dependencies should be downloaded at build time (more details at Dependency Resolution):
-Dre2_SOURCE=BUNDLED \
-DBoost_SOURCE=BUNDLED \
-Dutf8proc_SOURCE=BUNDLED \
-DSnappy_SOURCE=BUNDLED \
-DORC_SOURCE=BUNDLED \
-DZLIB_SOURCE=BUNDLED
Installing Nightly Packages#
Warning
These packages are not official releases. Use them at your own risk.
Arrow nightly builds are posted on the mailing list at builds@arrow.apache.org. The artifacts are uploaded to GitHub. For example, for 2022/07/30, they can be found at GitHub Nightly.
Installing from Apache Nightlies#
Look up the nightly version number for the Arrow libraries used. For example, for arrow-memory, visit https://nightlies.apache.org/arrow/java/org/apache/arrow/arrow-memory/ and see what versions are available (e.g. 9.0.0.dev501).
Add Apache Nightlies Repository to the Maven/Gradle project.
<properties>
    <arrow.version>9.0.0.dev501</arrow.version>
</properties>
...
<repositories>
    <repository>
        <id>arrow-apache-nightlies</id>
        <url>https://nightlies.apache.org/arrow/java</url>
    </repository>
</repositories>
...
<dependencies>
    <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>arrow-vector</artifactId>
        <version>${arrow.version}</version>
    </dependency>
</dependencies>
...
Installing Manually#
Decide which nightly packages repository to use, for example: ursacomputing/crossbow
Add packages to your pom.xml, for example: flight-core (it depends on: arrow-format, arrow-vector, arrow-memory-core and arrow-memory-netty).
<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <arrow.version>9.0.0.dev501</arrow.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>flight-core</artifactId>
        <version>${arrow.version}</version>
    </dependency>
</dependencies>
Download the necessary pom and jar files to a temporary directory:
$ mkdir nightly-packaging-2022-07-30-0-github-java-jars
$ cd nightly-packaging-2022-07-30-0-github-java-jars
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-java-root-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-flight-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.jar
$ tree
.
├── arrow-flight-9.0.0.dev501.pom
├── arrow-format-9.0.0.dev501.jar
├── arrow-format-9.0.0.dev501.pom
├── arrow-java-root-9.0.0.dev501.pom
├── arrow-memory-9.0.0.dev501.pom
├── arrow-memory-core-9.0.0.dev501.jar
├── arrow-memory-core-9.0.0.dev501.pom
├── arrow-memory-netty-9.0.0.dev501.jar
├── arrow-memory-netty-9.0.0.dev501.pom
├── arrow-vector-9.0.0.dev501.jar
├── arrow-vector-9.0.0.dev501.pom
├── flight-core-9.0.0.dev501.jar
└── flight-core-9.0.0.dev501.pom
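Since every download URL follows the same pattern, the list can be generated instead of typed out. This is a minimal sketch for the 2022-07-30 example only. The variable names and the nightly_url helper are assumptions made for illustration, and the script prints the URLs rather than downloading them.

```shell
#!/bin/sh
# Values taken from the example above; adjust for another nightly.
release=nightly-packaging-2022-07-30-0-github-java-jars
version=9.0.0.dev501
base_url=https://github.com/ursacomputing/crossbow/releases/download

# Hypothetical helper: build the download URL for one file name.
nightly_url() {
  echo "$base_url/$release/$1"
}

# Parent artifacts are published as a pom only.
for artifact in arrow-java-root arrow-memory arrow-flight; do
  nightly_url "$artifact-$version.pom"
done

# Library artifacts need both the pom and the jar.
for artifact in arrow-format arrow-vector arrow-memory-core arrow-memory-netty flight-core; do
  for ext in pom jar; do
    nightly_url "$artifact-$version.$ext"
  done
done
```

Saved as a script, the output can be piped into wget -i - to fetch the same 13 files listed above.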
Install the artifacts to the local Maven repository with mvn install:install-file:
$ mvn install:install-file -Dfile="$(pwd)/arrow-java-root-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-java-root -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-flight-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-flight -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=jar
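The artifact ID and packaging in each install command can be derived from the file name, so the commands can be generated in a loop. The sketch below assumes every file is named <artifactId>-<version>.<packaging>, as in the example above. The helper name install_command is hypothetical, and it prints each command for review instead of running it.

```shell
#!/bin/sh
# Version from the example above; adjust for another nightly.
version=9.0.0.dev501

# Hypothetical helper: print the install command for one downloaded file.
install_command() {
  file=$1
  packaging=${file##*.}           # text after the last dot: pom or jar
  base=${file%.*}                 # file name without the extension
  artifact=${base%-"$version"}    # drop the trailing -<version>
  echo "mvn install:install-file -Dfile=\"$PWD/$file\" -DgroupId=org.apache.arrow -DartifactId=$artifact -Dversion=$version -Dpackaging=$packaging"
}

# Print one command per downloaded file in the current directory.
for f in *.pom *.jar; do
  if [ -e "$f" ]; then
    install_command "$f"
  fi
done
```

After checking the printed commands, the output can be piped to sh to run them.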
Validate that the packages were installed:
$ tree ~/.m2/repository/org/apache/arrow
.
├── arrow-flight
│   ├── 9.0.0.dev501
│   │   └── arrow-flight-9.0.0.dev501.pom
├── arrow-format
│   ├── 9.0.0.dev501
│   │   ├── arrow-format-9.0.0.dev501.jar
│   │   └── arrow-format-9.0.0.dev501.pom
├── arrow-java-root
│   ├── 9.0.0.dev501
│   │   └── arrow-java-root-9.0.0.dev501.pom
├── arrow-memory
│   ├── 9.0.0.dev501
│   │   └── arrow-memory-9.0.0.dev501.pom
├── arrow-memory-core
│   ├── 9.0.0.dev501
│   │   ├── arrow-memory-core-9.0.0.dev501.jar
│   │   └── arrow-memory-core-9.0.0.dev501.pom
├── arrow-memory-netty
│   ├── 9.0.0.dev501
│   │   ├── arrow-memory-netty-9.0.0.dev501.jar
│   │   └── arrow-memory-netty-9.0.0.dev501.pom
├── arrow-vector
│   ├── 9.0.0.dev501
│   │   ├── _remote.repositories
│   │   ├── arrow-vector-9.0.0.dev501.jar
│   │   └── arrow-vector-9.0.0.dev501.pom
└── flight-core
    ├── 9.0.0.dev501
    │   ├── flight-core-9.0.0.dev501.jar
    │   └── flight-core-9.0.0.dev501.pom
Compile your project as usual with mvn clean install.