Building Arrow Java
System Setup
Arrow Java uses the Maven build system.
Building requires:
JDK 8, 9, 10, 11, 17, or 18, but only JDK 8, 11 and 17 are tested in CI.
Maven 3+
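Before building, it can help to confirm which JDK is active. The sketch below shows one way to extract the major version from a Java version string. The function name java_major is hypothetical and is not part of Arrow's tooling; the parsing assumes the usual "1.8.0_292" (JDK 8) and "17.0.2" (JDK 9 and later) formats.

```shell
#!/bin/sh
# Hypothetical helper (not part of Arrow): print the major Java version
# for a version string such as "1.8.0_292" (JDK 8) or "17.0.2" (JDK 17).
java_major() {
  case "$1" in
    1.*) v=${1#1.}; echo "${v%%[!0-9]*}" ;;  # legacy scheme: 1.8.0_292 -> 8
    *)   echo "${1%%[!0-9]*}" ;;             # modern scheme: 17.0.2 -> 17
  esac
}

# If a JDK is on the PATH, read the quoted version from `java -version`
# (which prints to stderr) and report its major version.
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  echo "Detected JDK major version: $(java_major "$ver")"
fi
```

Comparing the result against 8, 11, and 17 tells you whether you are on a JDK that is tested in CI.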
Building
All the instructions below assume that you have cloned the Arrow git repository:
$ git clone https://github.com/apache/arrow.git
$ cd arrow
$ git submodule update --init --recursive
These are the options available to compile Arrow Java modules with:
Maven build tool.
Docker Compose.
Archery.
Building Java Modules
To build the default modules, go to the project root and execute:
Maven
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ mvn clean install
Docker Compose
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ docker-compose run debian-java
Archery
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ archery docker run debian-java
Building JNI Libraries (*.dylib / *.so / *.dll)
First, we need to build the C++ shared libraries that the JNI bindings will use. We can build these manually, or we can use Archery to build them in a Docker container (this requires installing Docker, Docker Compose, and Archery).
Note
If you are building on Apple Silicon, be sure to use a JDK version that was compiled for that architecture. See, for example, the Azul JDK.
If you are building on Windows OS, see Developing on Windows.
Maven
To build only the JNI C Data Interface library (macOS / Linux):
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
$ ls -latr ../java-dist/lib/<your system's architecture>
|__ libarrow_cdata_jni.dylib
|__ libarrow_cdata_jni.so
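The <your system's architecture> placeholder in the listing above has to be filled in by hand. As a rough sketch, the directory name can be derived from the output of uname -m. The helper name normalize_arch is hypothetical, and the mapping to x86_64 and aarch_64 is an assumption; check it against the directory the build actually creates under java-dist/lib.

```shell
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`) to the
# directory name expected under java-dist/lib. The mapping is an assumption.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo "x86_64" ;;
    arm64|aarch64) echo "aarch_64" ;;
    *)             echo "$1" ;;   # unknown machine name: pass through unchanged
  esac
}

arch_dir=$(normalize_arch "$(uname -m)")
echo "Expecting libraries under ../java-dist/lib/${arch_dir}"
```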
To build only the JNI C Data Interface library (Windows):
$ cd arrow/java
$ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
$ dir "../java-dist/bin/x86_64"
|__ arrow_cdata_jni.dll
To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java -version
$ mvn generate-resources \
    -Pgenerate-libs-jni-macos-linux \
    -DARROW_GANDIVA=ON \
    -DARROW_JAVA_JNI_ENABLE_GANDIVA=ON \
    -N
$ ls -latr ../java-dist/lib/<your system's architecture>/*_{jni,java}.*
|__ libarrow_dataset_jni.dylib
|__ libarrow_orc_jni.dylib
|__ libgandiva_jni.dylib
To build all JNI libraries (Windows) except the JNI C Data Interface library:
$ cd arrow/java
$ mvn generate-resources -Pgenerate-libs-jni-windows -N
$ dir "../java-dist/bin/x86_64"
|__ arrow_dataset_jni.dll
CMake
To build only the JNI C Data Interface library (macOS / Linux):
$ cd arrow
$ mkdir -p java-dist java-cdata
$ cmake \
    -S java \
    -B java-cdata \
    -DARROW_JAVA_JNI_ENABLE_C=ON \
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF \
    -DBUILD_TESTING=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_LIBDIR=lib/<your system's architecture> \
    -DCMAKE_INSTALL_PREFIX=java-dist
$ cmake --build java-cdata --target install --config Release
$ ls -latr java-dist/lib/<your system's architecture>
|__ libarrow_cdata_jni.dylib
|__ libarrow_cdata_jni.so
To build only the JNI C Data Interface library (Windows):
$ cd arrow
$ mkdir java-dist, java-cdata
$ cmake ^
    -S java ^
    -B java-cdata ^
    -DARROW_JAVA_JNI_ENABLE_C=ON ^
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF ^
    -DBUILD_TESTING=OFF ^
    -DCMAKE_BUILD_TYPE=Release ^
    -DCMAKE_INSTALL_LIBDIR=lib/x86_64 ^
    -DCMAKE_INSTALL_PREFIX=java-dist
$ cmake --build java-cdata --target install --config Release
$ dir "java-dist/bin"
|__ arrow_cdata_jni.dll
To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:
$ cd arrow
$ brew bundle --file=cpp/Brewfile
Homebrew Bundle complete! 25 Brewfile dependencies now installed.
$ brew uninstall aws-sdk-cpp
(We can't use aws-sdk-cpp installed by Homebrew because it has an issue: https://github.com/aws/aws-sdk-cpp/issues/1809)
$ export JAVA_HOME=<absolute path to your java home>
$ mkdir -p java-dist cpp-jni
$ cmake \
    -S cpp \
    -B cpp-jni \
    -DARROW_BUILD_SHARED=OFF \
    -DARROW_CSV=ON \
    -DARROW_DATASET=ON \
    -DARROW_DEPENDENCY_SOURCE=BUNDLED \
    -DARROW_DEPENDENCY_USE_SHARED=OFF \
    -DARROW_FILESYSTEM=ON \
    -DARROW_GANDIVA=ON \
    -DARROW_GANDIVA_STATIC_LIBSTDCPP=ON \
    -DARROW_ORC=ON \
    -DARROW_PARQUET=ON \
    -DARROW_S3=ON \
    -DARROW_USE_CCACHE=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_LIBDIR=lib/<your system's architecture> \
    -DCMAKE_INSTALL_PREFIX=java-dist \
    -DCMAKE_UNITY_BUILD=ON
$ cmake --build cpp-jni --target install --config Release
$ cmake \
    -S java \
    -B java-jni \
    -DARROW_JAVA_JNI_ENABLE_C=OFF \
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=ON \
    -DBUILD_TESTING=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_LIBDIR=lib/<your system's architecture> \
    -DCMAKE_INSTALL_PREFIX=java-dist \
    -DCMAKE_PREFIX_PATH=$PWD/java-dist
$ cmake --build java-jni --target install --config Release
$ ls -latr java-dist/lib/<your system's architecture>/*_{jni,java}.*
|__ libarrow_dataset_jni.dylib
|__ libarrow_orc_jni.dylib
|__ libgandiva_jni.dylib
To build all JNI libraries (Windows) except the JNI C Data Interface library:
$ cd arrow
$ mkdir java-dist, cpp-jni
$ cmake ^
    -S cpp ^
    -B cpp-jni ^
    -DARROW_BUILD_SHARED=OFF ^
    -DARROW_CSV=ON ^
    -DARROW_DATASET=ON ^
    -DARROW_DEPENDENCY_USE_SHARED=OFF ^
    -DARROW_FILESYSTEM=ON ^
    -DARROW_ORC=OFF ^
    -DARROW_PARQUET=ON ^
    -DARROW_S3=ON ^
    -DARROW_USE_CCACHE=ON ^
    -DARROW_WITH_BROTLI=ON ^
    -DARROW_WITH_LZ4=ON ^
    -DARROW_WITH_SNAPPY=ON ^
    -DARROW_WITH_ZLIB=ON ^
    -DARROW_WITH_ZSTD=ON ^
    -DCMAKE_BUILD_TYPE=Release ^
    -DCMAKE_INSTALL_LIBDIR=lib/x86_64 ^
    -DCMAKE_INSTALL_PREFIX=java-dist ^
    -DCMAKE_UNITY_BUILD=ON ^
    -GNinja
$ cd cpp-jni
$ ninja install
$ cd ../
$ cmake ^
    -S java ^
    -B java-jni ^
    -DARROW_JAVA_JNI_ENABLE_C=OFF ^
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=ON ^
    -DARROW_JAVA_JNI_ENABLE_GANDIVA=OFF ^
    -DARROW_JAVA_JNI_ENABLE_ORC=OFF ^
    -DBUILD_TESTING=OFF ^
    -DCMAKE_BUILD_TYPE=Release ^
    -DCMAKE_INSTALL_LIBDIR=lib/x86_64 ^
    -DCMAKE_INSTALL_PREFIX=java-dist ^
    -DCMAKE_PREFIX_PATH=$PWD/java-dist
$ cmake --build java-jni --target install --config Release
$ dir "java-dist/bin"
|__ arrow_dataset_jni.dll
Archery
$ cd arrow
$ archery docker run java-jni-manylinux-2014
$ ls -latr java-dist/<your system's architecture>/
|__ libarrow_cdata_jni.so
|__ libarrow_dataset_jni.so
|__ libarrow_orc_jni.so
|__ libgandiva_jni.so
Building Java JNI Modules
To compile the JNI bindings, use the arrow-c-data Maven profile:
$ cd arrow/java
$ mvn -Darrow.c.jni.dist.dir=<absolute path to your arrow folder>/java-dist/lib -Parrow-c-data clean install
To compile the JNI bindings for ORC / Gandiva / Dataset, use the arrow-jni Maven profile:
$ cd arrow/java
$ mvn \
    -Darrow.cpp.build.dir=<absolute path to your arrow folder>/java-dist/lib/ \
    -Darrow.c.jni.dist.dir=<absolute path to your arrow folder>/java-dist/lib/ \
    -Parrow-jni clean install
IDE Configuration
IntelliJ
To start working on Arrow in IntelliJ: build the project once from the command line using mvn clean install. Then open the java/ subdirectory of the Arrow repository, and update the following settings:
In the Files tool window, find the path vector/target/generated-sources, right click the directory, and select Mark Directory as > Generated Sources Root. There is no need to mark other generated sources directories, as only the vector module generates sources.
For JDK 8, disable the error-prone profile to build the project successfully.
For JDK 11, due to an IntelliJ bug, you must go into Settings > Build, Execution, Deployment > Compiler > Java Compiler and disable "Use '--release' option for cross-compilation (Java 9 and later)". Otherwise you will get an error like "package sun.misc does not exist".
You may want to disable error-prone entirely if it gives spurious warnings (disable both error-prone profiles in the Maven tool window and "Reload All Maven Projects").
If using IntelliJ's Maven integration to build, you may need to change <fork> to false in the pom.xml files due to an IntelliJ bug.
You may not need to update all of these settings if you build/test with the IntelliJ Maven integration instead of with IntelliJ directly.
Common Errors
When working with the JNI code: if the C++ build cannot find dependencies, with errors like these:
Could NOT find Boost (missing: Boost_INCLUDE_DIR system filesystem)
Could NOT find Lz4 (missing: LZ4_LIB)
Could NOT find zstd (missing: ZSTD_LIB)
Specify that the dependencies should be downloaded at build time (more details at Dependency Resolution):
-Dre2_SOURCE=BUNDLED \
-DBoost_SOURCE=BUNDLED \
-Dutf8proc_SOURCE=BUNDLED \
-DSnappy_SOURCE=BUNDLED \
-DORC_SOURCE=BUNDLED \
-DZLIB_SOURCE=BUNDLED
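These flags belong on the C++ configure step (the cmake invocation with -S cpp). The following dry-run sketch only prints such a command rather than running it. The helper name configure_cmd is hypothetical, and the remaining options from the configure commands shown earlier are omitted for brevity.

```shell
#!/bin/sh
# Flags that tell the Arrow C++ build to download and build these
# dependencies itself instead of looking for system copies.
bundled_flags="-Dre2_SOURCE=BUNDLED -DBoost_SOURCE=BUNDLED -Dutf8proc_SOURCE=BUNDLED -DSnappy_SOURCE=BUNDLED -DORC_SOURCE=BUNDLED -DZLIB_SOURCE=BUNDLED"

# Hypothetical helper: print (not run) a configure command.
# $1 = source directory, $2 = build directory
configure_cmd() {
  echo "cmake -S $1 -B $2 $bundled_flags"
}

configure_cmd cpp cpp-jni
```

Printing the command first makes it easy to review the full flag list before starting a long C++ build.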
Installing Nightly Packages
Warning
These packages are not official releases. Use them at your own risk.
Arrow nightly builds are posted on the mailing list at builds@arrow.apache.org. The artifacts are uploaded to GitHub. For example, for 2022/07/30, they can be found at GitHub Nightly.
Installing from Apache Nightlies
Look up the nightly version number for the Arrow libraries used.
For example, for arrow-memory, visit https://nightlies.apache.org/arrow/java/org/apache/arrow/arrow-memory/ and see what versions are available (e.g. 9.0.0.dev501).
Add Apache Nightlies Repository to the Maven/Gradle project.
<properties>
    <arrow.version>9.0.0.dev501</arrow.version>
</properties>
...
<repositories>
    <repository>
        <id>arrow-apache-nightlies</id>
        <url>https://nightlies.apache.org/arrow/java</url>
    </repository>
</repositories>
...
<dependencies>
    <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>arrow-vector</artifactId>
        <version>${arrow.version}</version>
    </dependency>
</dependencies>
...
Installing Manually
Decide which nightly packages repository to use, for example: https://github.com/ursacomputing/crossbow/releases/tag/nightly-packaging-2022-07-30-0-github-java-jars
Add packages to your pom.xml, for example: flight-core (it depends on: arrow-format, arrow-vector, arrow-memory-core and arrow-memory-netty).
<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <arrow.version>9.0.0.dev501</arrow.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>flight-core</artifactId>
        <version>${arrow.version}</version>
    </dependency>
</dependencies>
Download the necessary pom and jar files to a temporary directory:
$ mkdir nightly-packaging-2022-07-30-0-github-java-jars
$ cd nightly-packaging-2022-07-30-0-github-java-jars
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-java-root-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-flight-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.jar
$ tree
.
├── arrow-flight-9.0.0.dev501.pom
├── arrow-format-9.0.0.dev501.jar
├── arrow-format-9.0.0.dev501.pom
├── arrow-java-root-9.0.0.dev501.pom
├── arrow-memory-9.0.0.dev501.pom
├── arrow-memory-core-9.0.0.dev501.jar
├── arrow-memory-core-9.0.0.dev501.pom
├── arrow-memory-netty-9.0.0.dev501.jar
├── arrow-memory-netty-9.0.0.dev501.pom
├── arrow-vector-9.0.0.dev501.jar
├── arrow-vector-9.0.0.dev501.pom
├── flight-core-9.0.0.dev501.jar
└── flight-core-9.0.0.dev501.pom
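The wget commands above all share one URL prefix, so the list of downloads can also be generated. The sketch below prints the same 13 URLs; the helper name nightly_url is hypothetical, and the release tag and version are the example values used in this section.

```shell
#!/bin/sh
# Example release tag and version from this section; adjust for other nightlies.
TAG=nightly-packaging-2022-07-30-0-github-java-jars
BASE=https://github.com/ursacomputing/crossbow/releases/download/$TAG
VERSION=9.0.0.dev501

# Hypothetical helper: print the download URL for one file name.
nightly_url() {
  echo "$BASE/$1"
}

# Parent projects are published as a pom only.
for name in arrow-java-root arrow-memory arrow-flight; do
  nightly_url "$name-$VERSION.pom"
done

# Library modules are published as a pom plus a jar.
for name in arrow-format arrow-vector arrow-memory-core arrow-memory-netty flight-core; do
  nightly_url "$name-$VERSION.pom"
  nightly_url "$name-$VERSION.jar"
done
```

If the script is saved as, say, urls.sh (a hypothetical file name), its output can be piped to wget -i - to download every file in one step.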
Install the artifacts to the local Maven repository with mvn install:install-file:
$ mvn install:install-file -Dfile="$(pwd)/arrow-java-root-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-java-root -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-flight-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-flight -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=jar
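Each install command above differs only in the file name, the artifact ID, and the packaging, all of which can be read off the file name. As a rough sketch, the commands could be generated as follows; the helper name install_cmd is hypothetical, and the script only prints the commands instead of running them.

```shell
#!/bin/sh
VERSION=9.0.0.dev501

# Hypothetical helper: print the `mvn install:install-file` command for one
# downloaded file, deriving artifactId and packaging from the file name.
install_cmd() {
  file=$1
  packaging=${file##*.}                      # extension: "pom" or "jar"
  artifact=${file%-"$VERSION"."$packaging"}  # strip "-<version>.<extension>"
  echo "mvn install:install-file -Dfile=$PWD/$file -DgroupId=org.apache.arrow -DartifactId=$artifact -Dversion=$VERSION -Dpackaging=$packaging"
}

# Print one command per downloaded file, poms first and then jars.
for f in *-"$VERSION".pom *-"$VERSION".jar; do
  [ -e "$f" ] || continue   # skip a glob pattern that matched nothing
  install_cmd "$f"
done
```

Run it from the download directory and review the printed commands before executing them.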
Validate that the packages were installed:
$ tree ~/.m2/repository/org/apache/arrow
.
├── arrow-flight
│   ├── 9.0.0.dev501
│   │   └── arrow-flight-9.0.0.dev501.pom
├── arrow-format
│   ├── 9.0.0.dev501
│   │   ├── arrow-format-9.0.0.dev501.jar
│   │   └── arrow-format-9.0.0.dev501.pom
├── arrow-java-root
│   ├── 9.0.0.dev501
│   │   └── arrow-java-root-9.0.0.dev501.pom
├── arrow-memory
│   ├── 9.0.0.dev501
│   │   └── arrow-memory-9.0.0.dev501.pom
├── arrow-memory-core
│   ├── 9.0.0.dev501
│   │   ├── arrow-memory-core-9.0.0.dev501.jar
│   │   └── arrow-memory-core-9.0.0.dev501.pom
├── arrow-memory-netty
│   ├── 9.0.0.dev501
│   │   ├── arrow-memory-netty-9.0.0.dev501.jar
│   │   └── arrow-memory-netty-9.0.0.dev501.pom
├── arrow-vector
│   ├── 9.0.0.dev501
│   │   ├── _remote.repositories
│   │   ├── arrow-vector-9.0.0.dev501.jar
│   │   └── arrow-vector-9.0.0.dev501.pom
└── flight-core
    ├── 9.0.0.dev501
    │   ├── flight-core-9.0.0.dev501.jar
    │   └── flight-core-9.0.0.dev501.pom
Compile your project as usual with mvn clean install.