Cross compiling for WebAssembly with Emscripten

Prerequisites

You need CMake, a compiler toolchain, and the other tools listed in the normal build instructions. Before building with Emscripten, you also need to install the Emscripten SDK (emsdk) and activate it using the commands below (see https://emscripten.org/docs/getting_started/downloads.html for details).

git clone https://github.com/emscripten-core/emsdk.git
cd emsdk
# replace <version> with the desired EMSDK version.
# e.g. for Pyodide 0.24, you need EMSDK version 3.1.45
./emsdk install <version>
./emsdk activate <version>
source ./emsdk_env.sh

If you want to build PyArrow for Pyodide, you need pyodide-build installed via pip. You must also run the same version of Python that Pyodide is built for, and use the same version of the emsdk tools.

# install Pyodide build tools.
# e.g. for version 0.24 of Pyodide:
pip install pyodide-build==0.24
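
Because the Emscripten version has to match the one Pyodide was built with, it can help to compare the two before building. The following is a minimal sketch, not part of the Arrow build itself. The check_version helper is hypothetical, and the sketch assumes that the installed pyodide-build provides the pyodide config get emscripten_version command and that emcc accepts -dumpversion; if either is unavailable in your versions, compare the output of emcc --version by hand instead.

```shell
# Hypothetical helper: compare an expected and an actual version string.
# $1 = tool name, $2 = expected version, $3 = version actually found.
check_version() {
  if [ "$2" = "$3" ]; then
    echo "$1 OK: $3"
  else
    echo "$1 mismatch: expected $2, found $3" >&2
    return 1
  fi
}

# Only query the tools when both are on PATH (i.e. after sourcing
# emsdk_env.sh and installing pyodide-build).
if command -v pyodide >/dev/null 2>&1 && command -v emcc >/dev/null 2>&1; then
  # Assumption: `pyodide config get emscripten_version` reports the
  # Emscripten version the installed pyodide-build expects.
  check_version "Emscripten" \
    "$(pyodide config get emscripten_version)" \
    "$(emcc -dumpversion)" \
    || echo "Install and activate the matching emsdk version first." >&2
fi
```

If the versions differ, re-run the emsdk install and activate steps with the expected version before continuing.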

Then build with the ninja-release-emscripten CMake preset, as shown below:

emcmake cmake --preset "ninja-release-emscripten"
ninja install

This installs a static library build of libarrow into the Emscripten sysroot cache, so other projects that depend on libarrow can find it when you build them with Emscripten.
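
To confirm that the install step put the library where dependent builds will look, you can search the sysroot for it. This is a rough sketch under two assumptions: that emsdk_env.sh has exported the EMSDK variable, and that your emsdk checkout keeps its sysroot cache under upstream/emscripten/cache/sysroot. The find_libarrow helper is hypothetical.

```shell
# Hypothetical helper: list static Arrow libraries below a directory.
# $1 = directory to search (normally the Emscripten sysroot).
find_libarrow() {
  find "$1" -name 'libarrow*.a'
}

# Assumption: a standard emsdk layout. Adjust the path if your
# Emscripten cache lives elsewhere.
sysroot="${EMSDK:-}/upstream/emscripten/cache/sysroot"
if [ -n "${EMSDK:-}" ] && [ -d "$sysroot" ]; then
  find_libarrow "$sysroot"
fi
```

If nothing is printed, either the install step did not run or the sysroot is in a different location.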

For example, to build PyArrow for Pyodide, run the emcmake and ninja commands above, then go to arrow/python and run:

pyodide build

This should produce a wheel in the dist subdirectory that targets the currently enabled version of Pyodide (i.e. the version corresponding to the installed pyodide-build).

Manual Build

If you want to manually build for Emscripten, take a look at the CMakePresets.json file in the arrow/cpp directory for a list of things you will need to override. In particular you will need:

  1. Build dependencies set to BUNDLED, so that the build uses properly cross-compiled dependencies.

  2. CMAKE_TOOLCHAIN_FILE set by using emcmake cmake instead of just cmake.

  3. You will quite likely need to set ARROW_ENABLE_THREADING to OFF for builds targeting single-threaded Emscripten environments such as Pyodide.

  4. ARROW_FLIGHT and anything else that uses the network probably won’t work.

  5. ARROW_JEMALLOC and ARROW_MIMALLOC probably need to be OFF as well.

  6. ARROW_BUILD_STATIC set to ON and ARROW_BUILD_SHARED set to OFF is most likely to work.
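
The overrides listed above can be sketched as a single configure command. This is an illustration only, not a complete or authoritative option set; CMakePresets.json remains the reference. The configure_arrow_wasm function and the DRY_RUN variable are hypothetical names, the Ninja generator and Release build type are assumptions that mirror the preset name, and ARROW_DEPENDENCY_SOURCE is assumed to be the option meant by "build dependencies" in item 1.

```shell
# Hypothetical wrapper around the manual configure step.
# $1 = path to the arrow/cpp source directory (defaults to ..).
# When DRY_RUN is set to a non-empty value, the command is printed
# instead of executed.
configure_arrow_wasm() {
  # Items 1-6 from the list above, in order: bundled dependencies,
  # toolchain file via emcmake, threading off, Flight off,
  # jemalloc and mimalloc off, static library only.
  ${DRY_RUN:+echo} emcmake cmake -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DARROW_DEPENDENCY_SOURCE=BUNDLED \
    -DARROW_ENABLE_THREADING=OFF \
    -DARROW_FLIGHT=OFF \
    -DARROW_JEMALLOC=OFF \
    -DARROW_MIMALLOC=OFF \
    -DARROW_BUILD_STATIC=ON \
    -DARROW_BUILD_SHARED=OFF \
    "${1:-..}"
}

# Preview the command from a build directory inside arrow/cpp.
DRY_RUN=1 configure_arrow_wasm ..
```

To configure for real, call configure_arrow_wasm with DRY_RUN unset, then run ninja install as in the preset-based build.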