Starting a Ballista cluster using Docker

Build Docker image

There is no officially published Docker image, so the image must currently be built from source.

Run the following commands to clone the source repository and build the Docker image.

git clone git@github.com:apache/arrow-datafusion.git -b 5.1.0
cd arrow-datafusion
./dev/build-ballista-docker.sh

This will create an image with the tag ballista:0.6.0.

Start a Scheduler

Start a scheduler with the following command:

docker run --network=host \
  -d ballista:0.6.0 \
  /scheduler --bind-port 50050

Run docker ps to check that the process is running:

$ docker ps
CONTAINER ID   IMAGE            COMMAND                  CREATED         STATUS        PORTS     NAMES
1f3f8b5ed93a   ballista:0.6.0   "/scheduler --bind-p…"   2 seconds ago   Up 1 second             tender_archimedes

Run docker logs CONTAINER_ID to check the output from the process:

$ docker logs 1f3f8b5ed93a
[2021-08-28T15:45:11Z INFO  ballista_scheduler] Ballista v0.6.0 Scheduler listening on 0.0.0.0:50050
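
Reading the log works for a manual check. For a script, one option is to poll the scheduler port until it accepts connections before starting any executors. The following is a minimal sketch, assuming nc (netcat) is installed on the host; wait_for_port is a hypothetical helper and not part of Ballista.

```shell
# Hypothetical helper (not part of Ballista): poll a TCP port until it accepts
# connections. Assumes nc (netcat) with the -z option is available on the host.
wait_for_port() {
  host="$1"
  port="$2"
  tries="${3:-10}"   # number of attempts, one second apart
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if nc -z "$host" "$port" >/dev/null 2>&1; then
      echo "port $port on $host is accepting connections"
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$attempt" -lt "$tries" ]; then
      sleep 1
    fi
    attempt=$((attempt + 1))
  done
  echo "gave up waiting for $host:$port after $tries attempts" >&2
  return 1
}
```

For the scheduler started above, a call such as wait_for_port localhost 50050 10 would wait up to roughly ten seconds for the port to open.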

Start executors

Start one or more executor processes. Because the containers share the host network, each executor process must listen on a different port.

docker run --network=host \
  -d ballista:0.6.0 \
  /executor --external-host localhost --bind-port 50051
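
To start several executors at once, the same command can be repeated with a different port for each. The sketch below assumes consecutive ports starting at 50051; NUM_EXECUTORS, BASE_PORT and DRY_RUN are illustrative shell variables, not Ballista options. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: launch several executors, each on its own port.
# NUM_EXECUTORS, BASE_PORT and DRY_RUN are illustrative variables, not Ballista options.
NUM_EXECUTORS="${NUM_EXECUTORS:-3}"
BASE_PORT="${BASE_PORT:-50051}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

start_executors() {
  i=0
  while [ "$i" -lt "$NUM_EXECUTORS" ]; do
    port=$((BASE_PORT + i))
    # Same executor command as above, with only the port changed.
    set -- docker run --network=host -d ballista:0.6.0 \
      /executor --external-host localhost --bind-port "$port"
    if [ "$DRY_RUN" = "1" ]; then
      echo "$*"
    else
      "$@"
    fi
    i=$((i + 1))
  done
}

start_executors
```

Set DRY_RUN=0 to launch the containers once the printed commands look right.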

Use docker ps to check that both the scheduler and executor(s) are now running:

$ docker ps
CONTAINER ID   IMAGE            COMMAND                  CREATED          STATUS          PORTS     NAMES
7c6941bb8dc0   ballista:0.6.0   "/executor --externa…"   3 seconds ago    Up 2 seconds              tender_goldberg
1f3f8b5ed93a   ballista:0.6.0   "/scheduler --bind-p…"   50 seconds ago   Up 49 seconds             tender_archimedes

Use docker logs CONTAINER_ID to check the output from the executor(s):

$ docker logs 7c6941bb8dc0
[2021-08-28T15:45:58Z INFO  ballista_executor] Running with config:
[2021-08-28T15:45:58Z INFO  ballista_executor] work_dir: /tmp/.tmpeyEM76
[2021-08-28T15:45:58Z INFO  ballista_executor] concurrent_tasks: 4
[2021-08-28T15:45:58Z INFO  ballista_executor] Ballista v0.6.0 Rust Executor listening on 0.0.0.0:50051

Using etcd as backing store

NOTE: This functionality is currently experimental.

Ballista can optionally use etcd as a backing store for the scheduler. Use the following commands to launch the scheduler with this option enabled.

docker run --network=host \
  -d ballista:0.6.0 \
  /scheduler --bind-port 50050 \
  --config-backend etcd \
  --etcd-urls etcd:2379

Please refer to the etcd website for installation instructions. etcd version 3.4.9 or later is recommended.
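
Before starting the scheduler with the etcd backend, it may help to confirm that etcd is reachable. The following is a rough sketch, assuming the etcdctl client (v3 API) is installed on the host and that the hostname in ETCD_URLS resolves from the host; ETCD_URLS and check_etcd are illustrative names, not Ballista options.

```shell
#!/bin/sh
# Sketch: start the scheduler only if etcd answers a health check.
# ETCD_URLS and check_etcd are illustrative names; etcdctl (v3 API) is assumed
# to be installed on the host.
ETCD_URLS="${ETCD_URLS:-etcd:2379}"

check_etcd() {
  # Exits with a non-zero status if any endpoint is unhealthy or unreachable.
  etcdctl --endpoints="$1" endpoint health
}

if check_etcd "$ETCD_URLS"; then
  # Same scheduler command as above.
  docker run --network=host \
    -d ballista:0.6.0 \
    /scheduler --bind-port 50050 \
    --config-backend etcd \
    --etcd-urls "$ETCD_URLS"
else
  echo "etcd is not reachable at $ETCD_URLS; scheduler not started" >&2
fi
```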