Flight SQL Driver#
The Flight SQL Driver provides access to any database implementing a Arrow Flight SQL compatible endpoint.
Installation#
The Flight SQL driver is shipped as part of the Arrow C++ libraries and PyArrow. See the main Arrow project website for instructions. Version >= 11 is required.
Usage#
To connect to a database, supply the “uri” parameter when constructing
the AdbcDatabase
.
#include "adbc.h"
// Ignoring error handling
struct AdbcDatabase database;
AdbcDatabaseNew(&database, nullptr);
AdbcDatabaseSetOption(&database, "uri", "grpc://localhost:8080", nullptr);
AdbcDatabaseInit(&database, nullptr);
import pyarrow.flight_sql
with pyarrow.flight_sql.connect("grpc://localhost:8080") as conn:
pass
Supported Features#
The Flight SQL driver generally supports features defined in the ADBC API specification 1.0.0, as well as some additional, custom options.
Authentication#
The driver does no authentication by default.
The driver implements one optional authentication scheme that mimics
the Arrow Flight SQL JDBC driver. This can be enabled by setting the
option arrow.flight.sql.authorization_header
on the
AdbcDatabase
. The client provides credentials by setting
the option value to the value of the authorization
header sent
from client to server. The server then responds with an
authorization
header on the first request. The value of this
header will then be sent back as the authorization
header on all
future requests.
Bulk Ingestion#
Flight SQL does not have a dedicated API for bulk ingestion of Arrow data into a given table. The driver instead constructs SQL statements to create and insert into the table.
Warning
The driver does not escape or validate the names of tables or columns. As a precaution, it instead limits identifier names to letters, numbers, and underscores. Bulk ingestion should not be used with untrusted user input.
The driver binds a batch of data at a time for efficiency. Also, the
generated SQL statements hardcode ?
as the parameter identifier.
Client Options#
The options used for creating the Flight RPC client can be customized. These options map 1:1 with the options in FlightClientOptions:
arrow.flight.sql.client_option.tls_root_certs
Override the root certificates used to validate the server’s TLS certificate.
arrow.flight.sql.client_option.override_hostname
Override the hostname used to verify the server’s TLS certificate.
arrow.flight.sql.client_option.cert_chain
The certificate chain to use for mTLS.
arrow.flight.sql.client_option.private_key
The private key to use for mTLS.
arrow.flight.sql.client_option.generic_int_option.<OPTION_NAME>
Option prefixes used to specify generic transport-layer options.
arrow.flight.sql.client_option.generic_string_option.<OPTION_NAME>
Option prefixes used to specify generic transport-layer options.
arrow.flight.sql.client_option.disable_server_verification
Disable verification of the server’s TLS certificate. Value should be
true
orfalse
.
Custom Call Headers#
Custom HTTP headers can be attached to requests via options that apply
to AdbcDatabase
, AdbcConnection
, and
AdbcStatement
.
arrow.flight.sql.rpc.call_header.<HEADER NAME>
Add the header
<HEADER NAME>
to outgoing requests with the given value.Warning
Header names must be in all lowercase.
Distributed Result Sets#
The driver will fetch all partitions (FlightEndpoints) returned by the server, in an unspecified order (note that Flight SQL itself does not define an ordering on these partitions). If an endpoint has no locations, the data will be fetched using the original server connection. Else, the driver will try each location given, in order, until a request succeeds. If the connection or request fails, it will try the next location.
The driver does not currently cache or pool these secondary connections. It also does not retry connections or requests. Requests are made sequentially, one at a time—the driver does not parallelize requests or perform readahead.
Metadata#
The driver currently will not populate column constraint info (foreign
keys, primary keys, etc.) in AdbcConnectionGetObjects()
.
Also, catalog filters are evaluated as simple string matches, not
LIKE
-style patterns.
Partitioned Result Sets#
The Flight SQL driver supports ADBC’s partitioned result sets. When requested, each partition of a result set contains a serialized FlightInfo, containing one of the FlightEndpoints of the original response. Clients who may wish to introspect the partition can do so by deserializing the contained FlightInfo from the ADBC partitions. (For example, a client that wishes to distribute work across multiple workers or machines may want to try to take advantage of locality information that ADBC does not have.)
Timeouts#
By default, timeouts are not used for RPC calls. They can be set via
special options on AdbcConnection
. In general, it is
best practice to set timeouts to avoid unexpectedly getting stuck.
The options are as follows:
arrow.flight.sql.rpc.timeout_seconds.fetch
A timeout (in floating-point seconds) for any API calls that fetch data. This corresponds to Flight
DoGet
calls.For example, this controls the timeout of the underlying Flight calls that fetch more data as a result set is consumed.
arrow.flight.sql.rpc.timeout_seconds.query
A timeout (in floating-point seconds) for any API calls that execute a query. This corresponds to Flight
GetFlightInfo
calls.For example, this controls the timeout of the underlying Flight calls that implement
AdbcStatementExecuteQuery()
.arrow.flight.sql.rpc.timeout_seconds.update
A timeout (in floating-point seconds) for any API calls that upload data or perform other updates.
For example, this controls the timeout of the underlying Flight calls that implement bulk ingestion, or transaction support.
Transactions#
The driver will issue transaction RPCs, but the driver will not check the server’s SqlInfo to determine whether this is supported first.
Type Mapping#
When executing a bulk ingestion operation, the driver needs to be able
to construct appropriate SQL queries for the database. (The driver
does not currently support using Substrait plans instead.) In
particular, a mapping from Arrow types to SQL type names is required.
While a default mapping is provided, the client may wish to override
this mapping, which can be done by setting special options on
AdbcDatabase
. (The driver does not currently inspect
Flight SQL metadata to construct this mapping.)
All such options begin with arrow.flight.sql.quirks.ingest_type.
and are followed by a type name below.
Warning
The driver does not escape or validate the values here. They should not come from untrusted user input, or a SQL injection vulnerability may result.
Arrow Type Name |
Default SQL Type Name |
---|---|
binary |
BLOB |
bool |
BOOLEAN |
date32 |
DATE |
date64 |
DATE |
decimal128 |
NUMERIC |
decimal256 |
NUMERIC |
double |
DOUBLE PRECISION |
float |
REAL |
int16 |
SMALLINT |
int32 |
INT |
int64 |
BIGINT |
large_binary |
BLOB |
large_string |
TEXT |
string |
TEXT |
time32 |
TIME |
time64 |
TIME |
timestamp |
TIMESTAMP |