Developing with conda

Linux and macOS

System Requirements

On macOS, any modern Xcode (6.4 or higher; the current version at the time of writing is 8.3.1) is sufficient.

On Linux, for this guide, we recommend using gcc 4.8 or 4.9, or clang 3.7 or higher. You can check your version by running:

$ gcc --version
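If you would rather check the version in a script than read it by eye, the following is a minimal sketch. The helper name version_ge is hypothetical, and it assumes a sort that supports -V (version sort), as GNU coreutils does. It only checks the lower bound of 4.8, not the upper end of the recommended range.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query gcc if it is installed; -dumpversion prints just the version number.
if command -v gcc >/dev/null 2>&1; then
    gcc_version="$(gcc -dumpversion)"
    if version_ge "$gcc_version" "4.8"; then
        echo "gcc $gcc_version is at least 4.8"
    else
        echo "gcc $gcc_version is older than 4.8"
    fi
else
    echo "gcc not found on PATH"
fi
```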

On Ubuntu 16.04 and higher, you can obtain gcc 4.9 with:

$ sudo apt-get install g++-4.9

Finally, set gcc 4.9 as the active compiler using:

export CC=gcc-4.9
export CXX=g++-4.9
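Before starting a long build, it can help to confirm that the compilers named by CC and CXX are actually on the PATH. The sketch below uses a hypothetical helper, check_compiler, which is not part of Arrow's build system.

```shell
# Hypothetical helper: report whether the compiler named by a variable is on PATH.
#   $1 = variable name (for example CC), $2 = its value (for example gcc-4.9)
check_compiler() {
    name="$1"
    value="$2"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if command -v "$value" >/dev/null 2>&1; then
        echo "$name=$value found at $(command -v "$value")"
    else
        echo "$name=$value not found on PATH"
        return 1
    fi
}

# Report on both variables without aborting the shell if one is missing.
check_compiler CC "${CC:-}" || true
check_compiler CXX "${CXX:-}" || true
```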

Environment Setup and Build

First, let’s create a conda environment with all the C++ build and Python dependencies from conda-forge:

conda create -y -q -n pyarrow-dev \
      python=3.6 numpy six setuptools cython pandas pytest \
      cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib \
      brotli jemalloc -c conda-forge
source activate pyarrow-dev
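To confirm that the activation took effect, we can compare the environment name with CONDA_DEFAULT_ENV, which conda sets when an environment is activated. This is only a sketch; the helper name env_status is hypothetical.

```shell
# Hypothetical helper: compare the expected environment name ($1)
# with the currently active one ($2).
env_status() {
    expected="$1"
    active="${2:-}"
    if [ "$active" = "$expected" ]; then
        echo "active environment: $expected"
    else
        echo "expected $expected, found: ${active:-none}"
        return 1
    fi
}

# CONDA_DEFAULT_ENV is empty or unset when no environment is active.
env_status pyarrow-dev "${CONDA_DEFAULT_ENV:-}" || true
```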

Now, let’s clone the Arrow and Parquet git repositories:

mkdir repos
cd repos
git clone
git clone

You should now see:

$ ls -l
total 8
drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 arrow/
drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 parquet-cpp/

We need to set some environment variables to let Arrow’s build system know about our build toolchain:

export ARROW_BUILD_TYPE=release


Now build and install the Arrow C++ libraries:

mkdir arrow/cpp/build
pushd arrow/cpp/build

cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
      -DARROW_PYTHON=on \
      ..
make -j4
make install
popd

Now, optionally build and install the Apache Parquet libraries in your toolchain:

mkdir parquet-cpp/build
pushd parquet-cpp/build


make -j4
make install
popd

Now, build pyarrow:

cd arrow/python
python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \
       --with-parquet --with-jemalloc --inplace

If you did not build parquet-cpp, you can omit --with-parquet.
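As a quick sanity check before running the full test suite, we can try importing the freshly built package. This is a minimal sketch with a hypothetical helper name, check_pyarrow_import. Because the build used --inplace, the compiled extension modules live in the source tree, so the check is meant to be run from arrow/python.

```shell
# Hypothetical helper: try to import pyarrow with the active interpreter.
# Run from arrow/python, since --inplace leaves the extensions in the source tree.
check_pyarrow_import() {
    if python -c "import pyarrow; print(pyarrow.__version__)" 2>/dev/null; then
        echo "pyarrow imported successfully"
    else
        echo "pyarrow is not importable from $(pwd)"
        return 1
    fi
}

check_pyarrow_import || true
```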

You should be able to run the unit tests with:

$ py.test pyarrow
================================ test session starts ====================
platform linux -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /home/wesm/arrow-clone/python, inifile:
collected 198 items

pyarrow/tests/ ...........
pyarrow/tests/ .....................
pyarrow/tests/ .............................
pyarrow/tests/ ..........................
pyarrow/tests/ sssssssssssssss
pyarrow/tests/ ..................
pyarrow/tests/ ........
pyarrow/tests/ ss
pyarrow/tests/ ....................
pyarrow/tests/ ..........
pyarrow/tests/ .........
pyarrow/tests/ .............
pyarrow/tests/ ................

====================== 181 passed, 17 skipped in 0.98 seconds ===========


Windows

First, we bootstrap a conda environment similar to the one in the C++ build instructions. It includes all the dependencies for the Arrow and Apache Parquet C++ libraries.

Starting from fresh clones of Apache Arrow and parquet-cpp, create and activate the environment:

git clone
git clone
conda create -n arrow-dev cmake git boost-cpp ^
      flatbuffers snappy zlib brotli thrift-cpp rapidjson
activate arrow-dev

As one git housekeeping item, we must run this command in our Arrow clone:

cd arrow
git config core.symlinks true
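To see how the setting can be read back, here is a small sketch that uses a throwaway repository in a temporary directory (written for a POSIX shell; the git commands themselves are the same at a Windows prompt). In the real clone, running git config --get core.symlinks from the arrow directory is enough.

```shell
# Demonstrate setting and reading core.symlinks in a scratch repository.
symlinks_value=""
if command -v git >/dev/null 2>&1; then
    repo="$(mktemp -d)"
    git init -q "$repo"
    git -C "$repo" config core.symlinks true
    # --get prints the stored value for the key.
    symlinks_value="$(git -C "$repo" config --get core.symlinks)"
    echo "core.symlinks=$symlinks_value"
    rm -rf "$repo"
else
    echo "git not found on PATH"
fi
```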

Now, we build and install the Arrow C++ libraries:

mkdir cpp\build
cd cpp\build
set ARROW_HOME=C:\thirdparty
cmake -G "Visual Studio 14 2015 Win64" ^
      -DCMAKE_BUILD_TYPE=Release ^
      -DARROW_PYTHON=on ..
cmake --build . --target INSTALL --config Release
cd ..\..

Now, we build parquet-cpp and install the result in the same place:

mkdir ..\parquet-cpp\build
pushd ..\parquet-cpp\build
set PARQUET_HOME=C:\thirdparty
cmake -G "Visual Studio 14 2015 Win64" ^
      -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --target INSTALL --config Release
popd

After that, we must put the install directory’s bin path in our %PATH%:

set PATH=%ARROW_HOME%\bin;%PATH%


Now, we can build pyarrow:

cd python
python setup.py build_ext --inplace --with-parquet

Then run the unit tests with:

py.test pyarrow -v

Running C++ unit tests with Python

Getting python-test.exe to run is a bit tricky because your %PYTHONPATH% must be configured for the active conda environment:

set CONDA_ENV=C:\Users\wesm\Miniconda\envs\arrow-test

Now python-test.exe or simply ctest (to run all tests) should work.