# Apache Arrow (C++)
A columnar in-memory analytics layer designed to accelerate big data.
## HDFS

This section describes how to use Arrow's HDFS (Apache Hadoop Distributed File System) interface.

### Build requirements

To build the integration, pass the following option to CMake:

```
-DARROW_HDFS=on
```

For convenience, we have bundled `hdfs.h` for libhdfs from Apache Hadoop in Arrow's thirdparty. If you wish to build against the `hdfs.h` in your installed Hadoop distribution, set the `$HADOOP_HOME` environment variable.

### Runtime requirements

By default, the HDFS client C++ class in `libarrow_io` uses the libhdfs JNI interface to the Java Hadoop client. This library is loaded at runtime (rather than at link / library load time, since the library may not be in your `LD_LIBRARY_PATH`), and relies on some environment variables.

* `HADOOP_HOME`: the root of your installed Hadoop distribution.
* `JAVA_HOME`: the location of your Java SDK installation.
* `CLASSPATH`: must contain the Hadoop jars. You can set these using:

```shell
export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
```

* `ARROW_LIBHDFS_DIR` (optional): explicit location of `libhdfs.so` if it is
  installed somewhere other than `$HADOOP_HOME/lib/native`.

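Putting these variables together, a typical setup might look like the following. This is only a sketch: `/opt/hadoop` and the `JAVA_HOME` fallback path are illustrative examples, not required locations — substitute your own installation paths.

```shell
# Illustrative environment for a program using libarrow_io's HDFS client.
# /opt/hadoop is an example path, not a requirement.
export HADOOP_HOME=/opt/hadoop

# Keep an existing JAVA_HOME if set; otherwise try java_home (macOS),
# falling back to an example Linux path.
export JAVA_HOME=${JAVA_HOME:-$(/usr/libexec/java_home 2>/dev/null || echo /usr/lib/jvm/default-java)}

# Hadoop jars for the JVM loaded by libhdfs:
export CLASSPATH=$("$HADOOP_HOME"/bin/hadoop classpath --glob 2>/dev/null)

# Only needed if libhdfs.so is not under $HADOOP_HOME/lib/native:
export ARROW_LIBHDFS_DIR=$HADOOP_HOME/lib/native
```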
### Mac Specifics

The installed location of Java on OS X can vary; however, the following snippet
will set it automatically for you:

```shell
export JAVA_HOME=$(/usr/libexec/java_home)
```

Homebrew's Hadoop does not include the native libs, and Apache does not distribute prebuilt binaries for them, so you must build Hadoop yourself to get them. See this Stack Overflow answer for details:

http://stackoverflow.com/a/40051353/478288

Be sure to include the path to the native libs in `JAVA_LIBRARY_PATH`:

```shell
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
```
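As a quick sanity check, you can verify that `libhdfs` actually exists in the native-libs directory the client will search. This snippet is only a sketch — it assumes the `ARROW_LIBHDFS_DIR` / `HADOOP_HOME` variables described above, and checks both the `.so` (Linux) and `.dylib` (OS X) names:

```shell
# Check whether libhdfs is present in the native-libs directory
# Arrow's HDFS client will look in at runtime.
libdir="${ARROW_LIBHDFS_DIR:-$HADOOP_HOME/lib/native}"
if [ -e "$libdir/libhdfs.so" ] || [ -e "$libdir/libhdfs.dylib" ]; then
  status=found
else
  status=missing
fi
echo "libhdfs in '$libdir': $status"
```

If this reports `missing`, revisit `HADOOP_HOME` (and whether your Hadoop build actually produced the native libs) before debugging anything else.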

If you get an error about needing to install Java 6, then add `BundledApp` and `JNI` to the `JVMCapabilities` in `$JAVA_HOME/../Info.plist`. See:

https://oliverdowling.com.au/2015/10/09/oracles-jre-8-on-mac-os-x-el-capitan/

https://derflounder.wordpress.com/2015/08/08/modifying-oracles-java-sdk-to-run-java-applications-on-os-x/