Starting a Ballista Cluster using Docker
Build Docker Images
There are no officially published Docker images, so it is currently necessary to build the images from source.
Run the following commands to clone the source repository and build the Docker image.
git clone git@github.com:apache/arrow-ballista.git -b 0.12.0
cd arrow-ballista
./dev/build-ballista-docker.sh
This will create the following images:
apache/arrow-ballista-benchmarks:0.12.0
apache/arrow-ballista-cli:0.12.0
apache/arrow-ballista-executor:0.12.0
apache/arrow-ballista-scheduler:0.12.0
apache/arrow-ballista-standalone:0.12.0
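To confirm that the build produced every image listed above, a quick check along the following lines may help. This is a minimal sketch, not part of the Ballista tooling: the function name check_images is hypothetical, and it assumes the 0.12.0 tags shown above and that docker image inspect exits with a non-zero status for an image that is not present locally.

```shell
# Hypothetical helper: report which of the expected images exist locally.
check_images() {
    missing=0
    for name in benchmarks cli executor scheduler standalone; do
        image="apache/arrow-ballista-$name:0.12.0"
        # docker image inspect fails when the image is not present locally
        if docker image inspect "$image" >/dev/null 2>&1; then
            echo "found   $image"
        else
            echo "missing $image"
            missing=1
        fi
    done
    # Non-zero return status if at least one image was missing
    return "$missing"
}
```

Calling check_images after the build script finishes prints one line per image and returns a non-zero status if any of them is missing.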
Start a Cluster
Start a Scheduler
Start a scheduler with the following command:
docker run --network=host \
-d apache/arrow-ballista-scheduler:0.12.0 \
--bind-port 50050
Run docker ps to check that the process is running:
$ docker ps
CONTAINER ID   IMAGE                                    COMMAND                  CREATED         STATUS         PORTS     NAMES
a756055576f3   apache/arrow-ballista-scheduler:0.12.0   "/root/scheduler-ent…"   8 seconds ago   Up 8 seconds             xenodochial_carson
Run docker logs CONTAINER_ID to check the output from the process:
$ docker logs a756055576f3
Starting nginx to serve Ballista Scheduler web UI on port 80
2024-02-03T14:49:47.904571Z INFO main ThreadId(01) ballista_scheduler::cluster: Initializing Sled database in temp directory
nginx: [warn] duplicate value "error" in /etc/nginx/sites-enabled/default:49
nginx: [warn] duplicate value "non_idempotent" in /etc/nginx/sites-enabled/default:49
2024-02-03T14:49:47.924679Z INFO main ThreadId(01) ballista_scheduler::scheduler_process: Ballista v0.12.0 Scheduler listening on 0.0.0.0:50050
2024-02-03T14:49:47.924709Z INFO main ThreadId(01) ballista_scheduler::scheduler_process: Starting Scheduler grpc server with task scheduling policy of PullStaged
2024-02-03T14:49:47.925261Z INFO main ThreadId(01) ballista_scheduler::cluster::kv: Initializing heartbeat listener
2024-02-03T14:49:47.925476Z INFO main ThreadId(01) ballista_scheduler::scheduler_server::query_stage_scheduler: Starting QueryStageScheduler
2024-02-03T14:49:47.925587Z INFO tokio-runtime-worker ThreadId(47) ballista_core::event_loop: Starting the event loop query_stage
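When scripting the startup, it can be useful to wait until the scheduler reports that it is listening before launching executors. The following is a minimal sketch under stated assumptions: the function name wait_for_log is hypothetical, and the text it waits for ("Scheduler listening on") is taken from the sample log output above and may differ in other Ballista versions.

```shell
# Hypothetical helper: poll a container's logs until a pattern appears.
# Usage: wait_for_log CONTAINER_ID PATTERN [ATTEMPTS]
wait_for_log() {
    container="$1"
    pattern="$2"
    attempts="${3:-30}"   # number of one-second polls before giving up
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # docker logs writes to both stdout and stderr, so merge them
        if docker logs "$container" 2>&1 | grep -q "$pattern"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

For example, wait_for_log a756055576f3 "Scheduler listening on" returns successfully once the corresponding log line has been written, and returns a non-zero status if it has not appeared after 30 attempts.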
Start Executors
Start one or more executor processes. Because the containers use host networking, each executor process must listen on a different port.
docker run --network=host \
-d apache/arrow-ballista-executor:0.12.0 \
--external-host localhost --bind-port 50051
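To start several executors on one machine, the command above can be repeated with a different port each time. The following is a minimal sketch: the function name start_executors and the port numbers in the usage note are illustrative assumptions, and only the flags shown above are used. If an executor opens additional ports beyond the one set by --bind-port, those would also need to be distinct, which this sketch does not handle.

```shell
# Hypothetical helper: start one executor container per port given as an argument.
start_executors() {
    for port in "$@"; do
        # Same command as above, with only the bind port varying
        docker run --network=host \
            -d apache/arrow-ballista-executor:0.12.0 \
            --external-host localhost --bind-port "$port"
    done
}
```

For example, start_executors 50051 50061 would launch two executor containers, one bound to each of the listed ports.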
Use docker ps to check that both the scheduler and executor(s) are now running:
$ docker ps
CONTAINER ID   IMAGE                                    COMMAND                  CREATED         STATUS         PORTS     NAMES
fb8b530cee6d   apache/arrow-ballista-executor:0.12.0    "/root/executor-entr…"   2 seconds ago   Up 1 second              gallant_galois
a756055576f3   apache/arrow-ballista-scheduler:0.12.0   "/root/scheduler-ent…"   8 seconds ago   Up 8 seconds             xenodochial_carson
Use docker logs CONTAINER_ID to check the output from the executor(s):
$ docker logs fb8b530cee6d
2024-02-03T14:50:24.061607Z INFO main ThreadId(01) ballista_executor::executor_process: Running with config:
2024-02-03T14:50:24.061649Z INFO main ThreadId(01) ballista_executor::executor_process: work_dir: /tmp/.tmpAkP3pZ
2024-02-03T14:50:24.061655Z INFO main ThreadId(01) ballista_executor::executor_process: concurrent_tasks: 48
2024-02-03T14:50:24.063256Z INFO tokio-runtime-worker ThreadId(44) ballista_executor::executor_process: Ballista v0.12.0 Rust Executor Flight Server listening on 0.0.0.0:50051
2024-02-03T14:50:24.063281Z INFO tokio-runtime-worker ThreadId(47) ballista_executor::execution_loop: Starting poll work loop with scheduler
Using etcd as a Backing Store
NOTE: This functionality is currently experimental.
Ballista can optionally use etcd as a backing store for the scheduler. Use the following command to launch the scheduler with this option enabled.
docker run --network=host \
-d apache/arrow-ballista-scheduler:0.12.0 \
--bind-port 50050 \
--config-backend etcd \
--etcd-urls etcd:2379
Please refer to the etcd website for installation instructions. Etcd version 3.4.9 or later is recommended.
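Since the scheduler depends on etcd being reachable, it may be worth checking the endpoint before starting the container. The following is a minimal sketch under several assumptions: the function name start_scheduler_with_etcd is hypothetical, etcdctl (with the v3 API, the default in etcd 3.4 and later) is installed on the host, and the address passed in (etcd:2379 by default, as in the command above) resolves to your etcd server.

```shell
# Hypothetical helper: start the scheduler only if etcd answers a health check.
# Usage: start_scheduler_with_etcd [ETCD_URL]
start_scheduler_with_etcd() {
    etcd_url="${1:-etcd:2379}"
    # etcdctl endpoint health exits non-zero when the endpoint is unhealthy or unreachable
    if ! etcdctl --endpoints="$etcd_url" endpoint health; then
        echo "etcd at $etcd_url is not reachable; not starting the scheduler" >&2
        return 1
    fi
    # Same scheduler command as above, with the etcd URL passed through
    docker run --network=host \
        -d apache/arrow-ballista-scheduler:0.12.0 \
        --bind-port 50050 \
        --config-backend etcd \
        --etcd-urls "$etcd_url"
}
```

Running the health check first means a wrong or unreachable etcd address is reported immediately rather than surfacing later in the scheduler logs.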
Connect from the CLI
Run the CLI image interactively, pointing it at the scheduler's host and port:
docker run --network=host -it apache/arrow-ballista-cli:0.12.0 --host localhost --port 50050