Using FlightSQL to Connect to Ballista
One of the easiest ways to get started with Ballista is to plug it into your existing data infrastructure through its support for the Arrow Flight SQL JDBC driver.
Getting started involves these main steps:
Install the prerequisites
Build the Ballista Rust code
Build and run the Ballista docker containers
Build the Arrow Flight SQL JDBC Driver
Install the driver into your favorite JDBC tool
Run a “hello, world!” query
Register a table and run more complicated queries
Prerequisites
Ubuntu
sudo apt-get update
sudo apt-get install -y docker.io docker-compose
macOS
brew install docker docker-compose
Windows
choco install docker-desktop docker-compose
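After installing, it may be worth confirming that both tools are actually on your PATH before moving on. The following is a minimal sketch in POSIX shell; the helper name missing_tools is hypothetical and not part of Ballista or Docker.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every tool given as an argument
# that cannot be found on PATH. Returns non-zero if any tool is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Check the two prerequisites used in this guide.
if missing=$(missing_tools docker docker-compose); then
  echo "docker and docker-compose found"
else
  echo "missing: $missing"
fi
```

On Windows this sketch assumes a POSIX-style shell such as Git Bash or WSL.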
Building Ballista
To build in Docker (for non-Linux systems):
git clone https://github.com/apache/arrow-ballista.git
cd arrow-ballista
dev/build-ballista-rust.sh
Or, on Linux-based systems with the correct dependencies installed, simply run:
cargo build --release --all --features flight-sql
Run Docker Containers
docker-compose up --build
Build the FlightSQL JDBC Driver
Note: this step will no longer be necessary once Arrow v10 is released (expected approximately 2022-10-31).
Note: A full explanation of the Arrow Java build is out of scope for this document. Please refer to that project for more detailed instructions.
git clone https://github.com/apache/arrow.git
cd arrow/java
mvn install -DskipTests -Dcheckstyle.skip -Drat.skip=true -pl :flight-sql-jdbc-driver -am
find . -name "*.jar"
...
./flight/flight-sql-jdbc-driver/target/flight-sql-jdbc-driver-10.0.0-SNAPSHOT.jar
...
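Since the find output lists every jar in the build tree, a narrower search can help pick out the driver jar to hand to your JDBC tool. This is a sketch only: the helper name find_driver_jar is hypothetical, and the filters for -sources and -javadoc jars are an assumption about what the Maven build may produce.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first driver jar found under
# the given directory (default: current directory).
# Assumption: jars ending in -sources.jar or -javadoc.jar are not the driver.
find_driver_jar() {
  find "${1:-.}" -type f -name 'flight-sql-jdbc-driver-*.jar' \
    ! -name '*-sources.jar' ! -name '*-javadoc.jar' | sort | head -n 1
}

# Run from the Arrow java directory after the Maven build finishes.
find_driver_jar .
```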
Use the Driver in Your Favorite Data Tool
The important pieces of information:
| Key | Value |
|---|---|
| Driver file | flight-sql-jdbc-driver-10.0.0-SNAPSHOT.jar |
| Class Name | org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver |
| Authentication | User & Password |
| Username | admin |
| Password | password |
| Advanced Options | useEncryption=false |
| URL | jdbc:arrow-flight://127.0.0.1:50050 |
Run a “Hello, World!” Query
select 'Hello from Arrow Ballista!' as greeting;
Run a Complex Query
In order to run queries against data, tables need to be “registered” with the current session (and re-registered upon each new connection).
To register a table, find a .csv, .json, or .parquet file for testing, and use the syntax below:
create external table customer stored as CSV with header row
location '/path/to/customer.csv';
Once the table has been registered, all the normal SQL queries can be performed:
select * from customer;
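Because tables must be re-registered on every new connection, it can be convenient to generate the registration statements rather than retype them. The sketch below prints one statement per .csv file in a directory, using the same syntax as the example above. The helper name register_csv_sql is hypothetical, and it assumes every file has a header row and that each file name (minus the .csv extension) is already a valid table name.

```shell
#!/bin/sh
# Hypothetical helper: print a "create external table" statement for each
# .csv file in the given directory.
# Assumptions: every file has a header row, and each file name without its
# .csv extension is a valid SQL identifier.
register_csv_sql() {
  dir=$(cd "$1" && pwd) || return 1      # resolve to an absolute path
  for file in "$dir"/*.csv; do
    [ -e "$file" ] || continue           # no .csv files: glob stays unexpanded
    table=$(basename "$file" .csv)
    printf "create external table %s stored as CSV with header row\n" "$table"
    printf "location '%s';\n" "$file"
  done
}

# Example (not run here): save the statements, then paste them into your tool.
# register_csv_sql /path/to/data > register.sql
```

The generated paths are absolute on the machine where the script runs; they presumably also need to be readable from wherever the Ballista processes run, for example through a volume mounted into the containers.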
🎉 Happy querying! 🎉