Starting a Ballista cluster using Docker Compose

Docker Compose is a convenient way to launch a cluster when testing locally.

Build Docker image

There is no officially published Docker image, so the image must currently be built from source.

Run the following commands to clone the source repository and build the Docker image.

git clone git@github.com:apache/arrow-datafusion.git -b 5.1.0
cd arrow-datafusion
./dev/build-ballista-docker.sh

This will create an image with the tag ballista:0.6.0.

Start a cluster

The following Docker Compose example starts a cluster with one scheduler process and one executor process, with the scheduler using etcd as a backing store. The host directory ./data is mounted into each Ballista container at /data so that Ballista can access files on the host file system.

version: "2.2"
services:
  etcd:
    image: quay.io/coreos/etcd:v3.4.9
    command: "etcd -advertise-client-urls http://etcd:2379 -listen-client-urls http://0.0.0.0:2379"
  ballista-scheduler:
    image: ballista:0.6.0
    command: "/scheduler --config-backend etcd --etcd-urls etcd:2379 --bind-host 0.0.0.0 --bind-port 50050"
    ports:
      - "50050:50050"
    environment:
      - RUST_LOG=info
    volumes:
      - ./data:/data
    depends_on:
      - etcd
  ballista-executor:
    image: ballista:0.6.0
    command: "/executor --bind-host 0.0.0.0 --bind-port 50051 --scheduler-host ballista-scheduler"
    ports:
      - "50051:50051"
    environment:
      - RUST_LOG=info
    volumes:
      - ./data:/data
    depends_on:
      - ballista-scheduler

With the above content saved to a docker-compose.yaml file, run the following command from the same directory to start the single-node cluster.

docker-compose up

This should show output similar to the following:

$ docker-compose up
Creating network "ballista-benchmarks_default" with the default driver
Creating ballista-benchmarks_etcd_1 ... done
Creating ballista-benchmarks_ballista-scheduler_1 ... done
Creating ballista-benchmarks_ballista-executor_1  ... done
Attaching to ballista-benchmarks_etcd_1, ballista-benchmarks_ballista-scheduler_1, ballista-benchmarks_ballista-executor_1
ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] Running with config:
ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] work_dir: /tmp/.tmpLVx39c
ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] concurrent_tasks: 4
ballista-scheduler_1  | [2021-08-28T15:55:22Z INFO  ballista_scheduler] Ballista v0.6.0 Scheduler listening on 0.0.0.0:50050
ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] Ballista v0.6.0 Rust Executor listening on 0.0.0.0:50051

The scheduler listens on port 50050, which is the port that clients need to connect to.
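Before pointing a client at the cluster, it can be useful to confirm that the scheduler port is actually reachable from the host. The following is a minimal sketch using only the Python standard library. The function name wait_for_port is hypothetical and not part of Ballista, and the check assumes the "50050:50050" port mapping from the Compose file above, so the scheduler is reachable at localhost:50050.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it is not part of Ballista.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful TCP connect only shows that something is listening;
            # it does not verify that the listener is a Ballista scheduler.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Calling wait_for_port("localhost", 50050) after docker-compose up should return True once the scheduler has logged that it is listening. A False result suggests that the container is not running or that the port mapping differs from the one assumed here.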