Apache Arrow (C++)
A columnar in-memory analytics layer designed to accelerate big data.

## Using Arrow's HDFS (Apache Hadoop Distributed File System) interface

### Build requirements

To build the integration, pass the following option to CMake:
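The option below follows Arrow's `ARROW_*` CMake naming convention as of this era of the codebase; verify the exact name against your Arrow version's CMake configuration:

```shell
-DARROW_HDFS=on
```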


For convenience, we have bundled `hdfs.h` for libhdfs from Apache Hadoop in Arrow's thirdparty. If you wish to build against the `hdfs.h` in your installed Hadoop distribution, set the `HADOOP_HOME` environment variable.

### Runtime requirements

By default, the HDFS client C++ class in `libarrow_io` uses the libhdfs JNI interface to the Java Hadoop client. This library is loaded at runtime (rather than at link / library load time, since the library may not be in your `LD_LIBRARY_PATH`), and relies on some environment variables.

`CLASSPATH` must contain the Hadoop jars; one way to set it:

```shell
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath --glob)
```

* `ARROW_LIBHDFS_DIR` (optional): explicit location of `libhdfs.so` if it is
  installed somewhere other than `$HADOOP_HOME/lib/native`.

To accommodate distribution-specific nuances, the `JAVA_HOME` variable may be
set to the root path for the Java SDK, the JRE path itself, or to the directory
containing the `libjvm` library.
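Putting these variables together, a typical environment setup might look like the following sketch; all paths here are placeholders for your own installation:

```shell
# Placeholder paths -- adjust to your installation
export HADOOP_HOME=/opt/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# Hadoop jars needed by the libhdfs JNI interface
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath --glob)
# Only needed if libhdfs.so is not under $HADOOP_HOME/lib/native
export ARROW_LIBHDFS_DIR=$HADOOP_HOME/lib/native
```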
### Mac Specifics
The installed location of Java on OS X can vary; however, the following snippet
will set `JAVA_HOME` automatically for you:

```shell
export JAVA_HOME=$(/usr/libexec/java_home)
```

Homebrew's Hadoop does not include the native libraries, and Apache does not publish prebuilt native binaries, so users must build Hadoop from source to get them. See this Stack Overflow answer for details.
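For reference, Hadoop's native libraries are typically built from a source checkout with Maven, per Hadoop's `BUILDING.txt`; the exact profiles may vary by Hadoop version:

```shell
mvn package -Pdist,native -DskipTests -Dtar
```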


Be sure to include the path to the native libs in `JAVA_LIBRARY_PATH`:
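For example (the Hadoop path below is a placeholder; point it at wherever your native-library build landed):

```shell
# Placeholder path -- replace with your Hadoop native-library directory
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/path/to/hadoop/lib/native
```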


If you get an error about needing to install Java 6, add `BundledApp` and `JNI` to the `JVMCapabilities` in `$JAVA_HOME/../Info.plist`.
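As a sketch, the resulting `Info.plist` fragment looks like the following (the array typically already contains `CommandLine`; check your installed plist before editing):

```xml
<key>JVMCapabilities</key>
<array>
    <string>CommandLine</string>
    <string>BundledApp</string>
    <string>JNI</string>
</array>
```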