Using FlightSQL to Connect to Ballista

One of the easiest ways to get started with Ballista is to plug it into your existing data infrastructure through its support for Arrow Flight SQL, using the Flight SQL JDBC driver.

Getting started involves these main steps:

  1. Install prerequisites

  2. Build the Ballista Rust code

  3. Build and run the Ballista docker containers

  4. Build the Arrow Flight SQL JDBC Driver

  5. Install the driver into your favorite JDBC tool

  6. Run a “hello, world!” query

  7. Register a table and run more complicated queries

Prerequisites

Ubuntu

sudo apt-get update
sudo apt-get install -y docker.io docker-compose

macOS

brew install docker docker-compose

Windows

choco install docker-desktop docker-compose

Building Ballista

To build in Docker (non-Linux systems):

git clone https://github.com/apache/arrow-ballista.git
cd arrow-ballista
dev/build-ballista-rust.sh

Alternatively, on Linux systems with the Rust toolchain and other build dependencies installed, you can build directly:

cargo build --release --all --features flight-sql

Run Docker Containers

docker-compose up --build

Build the FlightSQL JDBC Driver

Note: This step will no longer be necessary once Arrow v10 is released (expected around 2022-10-31).

Note: A full explanation of the Arrow Java build is out of scope for this document. Please refer to that project for more detailed instructions.

git clone https://github.com/apache/arrow.git
cd arrow/java
mvn install -DskipTests -Dcheckstyle.skip -Drat.skip=true -pl :flight-sql-jdbc-driver -am
find . -name "*.jar"
...
./flight/flight-sql-jdbc-driver/target/flight-sql-jdbc-driver-10.0.0-SNAPSHOT.jar
...
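Before configuring a data tool, it can help to confirm that the jar you just built really contains the driver class. The following is a minimal sketch: `DriverCheck` and `isOnClasspath` are hypothetical names, and it assumes you run it with the jar on the classpath, for example `java -cp flight/flight-sql-jdbc-driver/target/flight-sql-jdbc-driver-10.0.0-SNAPSHOT.jar DriverCheck.java` from the `arrow/java` directory (Java 11 or later can launch a single source file directly).

```java
// Hypothetical helper: checks whether a class, such as the Flight SQL JDBC
// driver, can be loaded from the current classpath.
class DriverCheck {
    static final String DRIVER_CLASS = "org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver";

    // Returns true if the named class can be loaded, false otherwise.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class.forName(className, false, DriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(DRIVER_CLASS)) {
            System.out.println("Found " + DRIVER_CLASS);
        } else {
            System.out.println("Driver class not found; check the -cp argument");
        }
    }
}
```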

Use the Driver in your Favorite Data Tool

Configure your tool with the following settings:

Key               Value
Driver file       flight-sql-jdbc-driver-10.0.0-SNAPSHOT.jar
Class Name        org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver
Authentication    User & Password
Username          admin
Password          password
Advanced Options  useEncryption=false
URL               jdbc:arrow-flight://127.0.0.1:50050
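If you would rather connect from your own Java code than from a GUI tool, the same settings map onto a plain JDBC connection. This is a sketch under a few assumptions: `BallistaConnect` and `connectionProperties` are hypothetical names, the scheduler is reachable at the URL listed above, and the driver accepts `user`, `password` and `useEncryption` as connection properties. It also assumes the driver jar is on the classpath and registers itself with `DriverManager` through the standard JDBC service mechanism; if it does not, calling `Class.forName("org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver")` first should load it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Sketch: open a JDBC connection to Ballista's Flight SQL endpoint.
// BallistaConnect is a hypothetical class name.
class BallistaConnect {
    // URL from the settings listed above (assumes a local scheduler on port 50050)
    static final String URL = "jdbc:arrow-flight://127.0.0.1:50050";

    // Builds the connection properties: credentials plus the useEncryption option.
    static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Matches the useEncryption=false advanced option (no TLS)
        props.setProperty("useEncryption", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = connectionProperties("admin", "password");
        // try-with-resources closes the connection when the block exits
        try (Connection conn = DriverManager.getConnection(URL, props)) {
            System.out.println("Connected: " + !conn.isClosed());
        } catch (SQLException e) {
            // Raised if the driver is not on the classpath or the scheduler is unreachable
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```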

Run a “Hello, World!” Query

select 'Hello from Arrow Ballista!' as greeting;
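The same query can be issued from Java over JDBC. The sketch below uses the connection settings from the table above; `HelloBallista` is a hypothetical class name, and running it assumes the driver jar is on the classpath and the containers are up. The trailing semicolon is dropped because JDBC statements are normally sent without one.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Sketch: run the "Hello, World!" query over JDBC and print the greeting column.
// HelloBallista is a hypothetical class name.
class HelloBallista {
    static final String URL = "jdbc:arrow-flight://127.0.0.1:50050";
    static final String HELLO_SQL = "select 'Hello from Arrow Ballista!' as greeting";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("user", "admin");
        props.setProperty("password", "password");
        props.setProperty("useEncryption", "false");

        // Connection, statement and result set are all closed automatically
        try (Connection conn = DriverManager.getConnection(URL, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(HELLO_SQL)) {
            while (rs.next()) {
                // Read the aliased column by name
                System.out.println(rs.getString("greeting"));
            }
        } catch (SQLException e) {
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```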

Run a Complex Query

Before you can query data, it must be “registered” as a table in the current session. Registrations do not carry over between connections, so tables must be re-registered each time you connect.

To register a table, pick a .csv, .json, or .parquet file to test with and use the syntax below:

create external table customer stored as CSV with header row
    location '/path/to/customer.csv';
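If you generate these statements from code (for example, to register several files at once), a small helper keeps the syntax consistent. This sketch is hypothetical: `ExternalTableDdl` and its `build` method are illustrative names, only the CSV form appears in this guide, and the `JSON` and `PARQUET` keywords after `stored as` are assumptions that may differ between DataFusion versions. Single quotes in the path are doubled, as in standard SQL string literals.

```java
// Hypothetical helper that builds a CREATE EXTERNAL TABLE statement in the
// form shown above. JSON and PARQUET keywords are assumptions; only CSV
// appears in this guide.
class ExternalTableDdl {
    enum Format { CSV, JSON, PARQUET }

    // Assumes `table` is a plain identifier; it is not quoted or validated here.
    static String build(String table, Format format, String location) {
        // Escape single quotes in the path by doubling them
        String quoted = "'" + location.replace("'", "''") + "'";
        StringBuilder sql = new StringBuilder()
            .append("create external table ").append(table)
            .append(" stored as ").append(format.name());
        if (format == Format.CSV) {
            // The header-row clause applies to CSV files only
            sql.append(" with header row");
        }
        return sql.append(" location ").append(quoted).toString();
    }

    public static void main(String[] args) {
        System.out.println(build("customer", Format.CSV, "/path/to/customer.csv"));
    }
}
```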

Once the table has been registered, you can run ordinary SQL queries against it:

select * from customer;
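Because registration only lasts for the current session, a program has to register the table and query it over the same connection. This sketch does both and prints each row tab-separated; `QueryCustomers` and `joinRow` are hypothetical names, the connection settings are the ones listed above, and running the DDL through `Statement.execute` is an assumption that mirrors what a JDBC tool does with the statement. The path is resolved by the Ballista processes, not by the client, so with the Docker setup it likely has to be visible inside the containers.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch: register the customer table and query it over ONE connection.
// QueryCustomers and joinRow are hypothetical names.
class QueryCustomers {
    static final String URL = "jdbc:arrow-flight://127.0.0.1:50050";

    // Joins one row's values with tabs; SQL NULLs are printed as "NULL"
    static String joinRow(List<String> values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            out.add(v == null ? "NULL" : v);
        }
        return String.join("\t", out);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("user", "admin");
        props.setProperty("password", "password");
        props.setProperty("useEncryption", "false");

        try (Connection conn = DriverManager.getConnection(URL, props);
             Statement stmt = conn.createStatement()) {
            // Register first; the table exists only for this connection's session
            stmt.execute("create external table customer stored as CSV with header row "
                + "location '/path/to/customer.csv'");
            try (ResultSet rs = stmt.executeQuery("select * from customer")) {
                ResultSetMetaData meta = rs.getMetaData();
                int cols = meta.getColumnCount();
                while (rs.next()) {
                    List<String> values = new ArrayList<>();
                    for (int i = 1; i <= cols; i++) { // JDBC columns are 1-indexed
                        values.add(rs.getString(i));
                    }
                    System.out.println(joinRow(values));
                }
            }
        } catch (SQLException e) {
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```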

🎉 Happy querying! 🎉