ADBC API Standard¶
This document summarizes the general feature set.
For C/C++ details, see adbc.h.
For Go details, see the source.
For Java details, see the source, particularly the package org.apache.arrow.adbc.core.
Databases¶
Databases hold state shared by multiple connections. Generally, this means common configuration and caches. For in-memory databases, it provides a place to hold ownership of the in-memory database.
C/C++: AdbcDatabase
Go: Driver
Connections¶
A connection is a single, logical connection to a database.
C/C++: AdbcConnection
Go: Connection
Autocommit¶
By default, connections are expected to operate in autocommit mode; that is, queries take effect immediately upon execution. This can be disabled in favor of manual commit/rollback calls, but not all implementations will support this.
C/C++: ADBC_CONNECTION_OPTION_AUTOCOMMIT
Go: OptionKeyAutoCommit
Java: org.apache.arrow.adbc.core.AdbcConnection#setAutoCommit(boolean)
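The difference between autocommit and manual commit/rollback can be illustrated with a toy model. The sketch below does not use any ADBC driver: `toyConn` and its methods are hypothetical names invented for this example, and the "database" is just a pair of slices.

```go
package main

import "fmt"

// toyConn is a hypothetical in-memory stand-in for a connection.
// It is not part of any ADBC binding.
type toyConn struct {
	autocommit bool
	committed  []string // rows that have taken effect
	pending    []string // rows written since the last commit/rollback
}

// newToyConn starts in autocommit mode, matching the default described above.
func newToyConn() *toyConn { return &toyConn{autocommit: true} }

// Insert takes effect immediately in autocommit mode; otherwise the row
// is held back until Commit is called.
func (c *toyConn) Insert(row string) {
	if c.autocommit {
		c.committed = append(c.committed, row)
		return
	}
	c.pending = append(c.pending, row)
}

// Commit makes all pending rows take effect.
func (c *toyConn) Commit() {
	c.committed = append(c.committed, c.pending...)
	c.pending = nil
}

// Rollback discards all pending rows.
func (c *toyConn) Rollback() { c.pending = nil }

func main() {
	conn := newToyConn()
	conn.Insert("a") // visible right away

	conn.autocommit = false
	conn.Insert("b")
	conn.Rollback() // "b" is discarded
	conn.Insert("c")
	conn.Commit() // "c" takes effect

	fmt.Println(conn.committed)
}
```

With a real driver the mode would be switched through the option or method listed above, and a driver that lacks transaction support may reject the change.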
Metadata¶
ADBC exposes a variety of metadata about the database, such as what catalogs, schemas, and tables exist, the Arrow schema of tables, and so on.
Statistics¶
Note
Since API revision 1.1.0
ADBC exposes table/column statistics, such as the (unique) row count, min/max values, and so on. The goal here is to make ADBC work better in federation scenarios, where one query engine wants to read Arrow data from another database. Having statistics available lets the “outer” query planner make better choices about things like join order, or even decide to skip reading data entirely.
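As a rough illustration of how an outer planner might use such statistics, the sketch below picks a join build side from row counts and skips a scan when a column's maximum rules out a filter. The `tableStats` type and its field names are hypothetical and do not reflect the schema ADBC actually returns; the skip decision also assumes the reported maximum is exact rather than approximate.

```go
package main

import "fmt"

// tableStats is a hypothetical, simplified view of statistics for one table
// and one column of interest. The field names are illustrative only.
type tableStats struct {
	name     string
	rowCount int64
	minValue int64 // minimum of the filtered column
	maxValue int64 // maximum of the filtered column
}

// canSkip reports whether the filter "column > threshold" cannot match any
// row, judging only by the column's maximum. Assumes exact statistics.
func canSkip(s tableStats, threshold int64) bool {
	return s.maxValue <= threshold
}

// buildSide picks the table with fewer rows, for example as the build side
// of a hash join. Ties go to the first argument.
func buildSide(a, b tableStats) tableStats {
	if b.rowCount < a.rowCount {
		return b
	}
	return a
}

func main() {
	orders := tableStats{name: "orders", rowCount: 5000000, minValue: 1, maxValue: 900}
	regions := tableStats{name: "regions", rowCount: 40, minValue: 1, maxValue: 40}

	fmt.Println("build side:", buildSide(orders, regions).name)
	fmt.Println("skip regions for value > 100:", canSkip(regions, 100))
	fmt.Println("skip orders for value > 100:", canSkip(orders, 100))
}
```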
Statements¶
Statements hold state related to query execution. They represent both one-off queries and prepared statements. They can be reused, though doing so will invalidate prior result sets from that statement. (See Concurrency and Thread Safety.)
C/C++: AdbcStatement
Go: Statement
Java: org.apache.arrow.adbc.core.AdbcStatement
Bulk Ingestion¶
ADBC provides explicit facilities to ingest batches of Arrow data into a database table. For databases which support it, this can avoid overheads from the typical bind-insert loop. Also, this (mostly) frees the user from knowing the right SQL syntax for their database.
C/C++: ADBC_INGEST_OPTION_TARGET_TABLE and related options.
Go: OptionKeyIngestTargetTable
Java: org.apache.arrow.adbc.core.AdbcConnection#bulkIngest(String, org.apache.arrow.adbc.core.BulkIngestMode)
Cancellation¶
Note
Since API revision 1.1.0
Queries (and operations that implicitly represent queries, like fetching Statistics) can be cancelled.
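To show the general shape of cancelling in-flight work, here is a minimal sketch that models cancellation with Go's standard `context.Context`. It does not call any ADBC driver: `slowQuery` and `runAndCancel` are hypothetical helpers, and how a particular binding exposes cancellation is not covered here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowQuery is a hypothetical stand-in for a long-running query. It checks
// the context between units of work and stops once it is cancelled,
// returning how many steps it completed.
func slowQuery(ctx context.Context, steps int) (int, error) {
	for i := 0; i < steps; i++ {
		select {
		case <-ctx.Done():
			return i, ctx.Err()
		case <-time.After(10 * time.Millisecond):
			// One unit of simulated work finished.
		}
	}
	return steps, nil
}

// runAndCancel starts slowQuery and cancels it after afterMillis milliseconds.
func runAndCancel(steps, afterMillis int) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go func() {
		time.Sleep(time.Duration(afterMillis) * time.Millisecond)
		cancel()
	}()
	return slowQuery(ctx, steps)
}

func main() {
	done, err := runAndCancel(1000, 35)
	fmt.Println("steps completed:", done)
	fmt.Println("cancelled:", errors.Is(err, context.Canceled))
}
```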
Partitioned Result Sets¶
ADBC lets a driver explicitly expose partitioned and/or distributed result sets to clients. (This is similar to functionality in Flight RPC/Flight SQL.) Clients may take advantage of this to distribute computations on a result set across multiple threads, processes, or machines.
C/C++: AdbcStatementExecutePartitions()
Go: Statement.ExecutePartitions
Java: org.apache.arrow.adbc.core.AdbcStatement#executePartitioned()
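The idea of spreading work on a partitioned result set across threads can be sketched as follows. Nothing here touches a real driver: `partition` is a hypothetical opaque descriptor and `readPartition` is a placeholder that fakes a row count, standing in for whatever call a binding provides to open and consume one partition.

```go
package main

import (
	"fmt"
	"sync"
)

// partition is a hypothetical opaque descriptor, standing in for whatever a
// driver hands back from partitioned execution.
type partition []byte

// readPartition is a placeholder for opening a partition and consuming its
// rows. Here it simply reports the descriptor length as a fake row count.
func readPartition(p partition) int { return len(p) }

// countRows reads every partition in its own goroutine and sums the results.
// Each goroutine writes to a distinct slice element, so no lock is needed.
func countRows(parts []partition) int {
	var wg sync.WaitGroup
	counts := make([]int, len(parts))
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p partition) {
			defer wg.Done()
			counts[i] = readPartition(p)
		}(i, p)
	}
	wg.Wait()

	total := 0
	for _, c := range counts {
		total += c
	}
	return total
}

func main() {
	parts := []partition{partition("abc"), partition("de"), partition("f")}
	fmt.Println("total rows:", countRows(parts))
}
```

The same descriptors could equally be shipped to other processes or machines; goroutines are used here only because they keep the example self-contained.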
In principle, a vendor could return the results of partitioned execution as they are available, instead of all at once. Incremental execution allows drivers to expose this. When enabled, each call to ExecutePartitions will return available endpoints to read instead of blocking to retrieve all endpoints.
Note
Since API revision 1.1.0
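A client using incremental execution ends up in a polling loop. The sketch below models that loop with a hypothetical `incrementalSource` whose `next` method plays the role of a repeated ExecutePartitions call. It assumes that an empty result signals that the query has finished; check the driver documentation for the actual completion convention.

```go
package main

import "fmt"

// incrementalSource is a hypothetical stand-in for a statement running with
// incremental execution enabled. Each call to next returns only the
// endpoints that became available since the previous call.
type incrementalSource struct {
	batches [][]string
}

// next returns the next group of available endpoints.
// Assumption: an empty result means there is nothing more to read.
func (s *incrementalSource) next() []string {
	if len(s.batches) == 0 {
		return nil
	}
	b := s.batches[0]
	s.batches = s.batches[1:]
	return b
}

// drain polls the source until it reports no more endpoints and returns
// everything it saw, in arrival order. A real client would typically start
// reading each endpoint as soon as it arrives rather than collecting them.
func drain(s *incrementalSource) []string {
	var all []string
	for {
		eps := s.next()
		if len(eps) == 0 {
			return all
		}
		all = append(all, eps...)
	}
}

func main() {
	src := &incrementalSource{batches: [][]string{
		{"endpoint-0"},
		{"endpoint-1", "endpoint-2"},
	}}
	fmt.Println(drain(src))
}
```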
Lifecycle & Usage¶
Basic Usage¶
Consuming Result Sets¶
Bulk Data Ingestion¶
Update-only Queries (No Result Set)¶
Partitioned Execution¶
Error Handling¶
The error handling strategy varies by language.
In C, most methods take an AdbcError. In Go, most methods return an error that can be cast to an AdbcError. In Java, most methods raise an AdbcException.
In all cases, an error contains:
A status code,
An error message,
An optional vendor code (a vendor-specific status code),
An optional 5-character “SQLSTATE” code (a SQL-like vendor-specific code).
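The four fields listed above, and the Go pattern of recovering a structured error from a plain `error`, can be sketched like this. The `queryError` type is a hypothetical stand-in defined for this example, not the error type of any ADBC binding; the status is shown as a string for readability, and all field values are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// queryError is a hypothetical type mirroring the four fields listed above.
type queryError struct {
	Status     string  // status code (a string here for readability)
	Message    string  // human-readable error message
	VendorCode int32   // optional vendor-specific code; 0 if absent
	SQLState   [5]byte // optional 5-character SQLSTATE; all zero if absent
}

func (e *queryError) Error() string {
	return fmt.Sprintf("%s: %s", e.Status, e.Message)
}

// runQuery is a placeholder for a failing call; it wraps the structured
// error the way intermediate layers often do.
func runQuery() error {
	return fmt.Errorf("executing query: %w", &queryError{
		Status:     "NOT_FOUND",
		Message:    "table does not exist",
		VendorCode: 7, // illustrative value only
		SQLState:   [5]byte{'4', '2', 'S', '0', '2'},
	})
}

// describe recovers the structured error, if there is one, with errors.As.
func describe(err error) string {
	var qe *queryError
	if !errors.As(err, &qe) {
		return "unstructured error: " + err.Error()
	}
	return fmt.Sprintf("status=%s vendor=%d sqlstate=%s message=%s",
		qe.Status, qe.VendorCode, string(qe.SQLState[:]), qe.Message)
}

func main() {
	fmt.Println(describe(runQuery()))
}
```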
Rich Error Metadata¶
Note
Since API revision 1.1.0
Drivers can expose additional rich error metadata. This can be used to return structured error information. For example, a driver could use something like the Googleapis ErrorDetails.
In C, Go and Java, AdbcError, AdbcError, and AdbcException respectively expose a list of additional metadata. For C, see the documentation of AdbcError to learn how the struct was expanded while preserving ABI.
Changelog¶
Version 1.1.0¶
The info key ADBC_INFO_DRIVER_ADBC_VERSION can be used to retrieve the driver’s supported ADBC version.
The canonical options “uri”, “username”, and “password” were added to make configuration consistent between drivers.
Cancellation and the ability to both get and set options of different types were added. (Previously, you could set string options but could not get option values or get/set values of other types.) This can be used to get and set the current active catalog and/or schema through a pair of new canonical options.
Bulk Ingestion supports two additional modes:
“adbc.ingest.mode.replace” will drop existing data, then behave like “create”.
“adbc.ingest.mode.create_append” will behave like “create”, except if the table already exists, it will not error.
Rich Error Metadata has been added, allowing clients to get additional error metadata.
The ability to retrieve table/column statistics was added. The goal here is to make ADBC work better in federation scenarios, where one query engine wants to read Arrow data from another database.
Incremental execution allows streaming partitions of a result set as they are available instead of blocking and waiting for query execution to finish before reading results.
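The semantics of the two bulk ingestion modes added in 1.1.0 can be sketched against a toy in-memory store. The `store` type and its `ingest` method are hypothetical and exist only for this example. The mode strings for "replace" and "create_append" come from the changelog above; the "create" string follows the same naming pattern, and the append behavior of "create_append" on an existing table is inferred from its name.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	modeCreate       = "adbc.ingest.mode.create"
	modeReplace      = "adbc.ingest.mode.replace"
	modeCreateAppend = "adbc.ingest.mode.create_append"
)

var errTableExists = errors.New("table already exists")

// store is a hypothetical in-memory "database": table name -> rows.
type store map[string][]string

// ingest writes rows into table according to mode.
func (s store) ingest(table, mode string, rows []string) error {
	_, exists := s[table]
	switch mode {
	case modeCreate:
		if exists {
			return errTableExists
		}
		s[table] = append([]string(nil), rows...)
	case modeReplace:
		// Drop existing data, then behave like "create".
		delete(s, table)
		s[table] = append([]string(nil), rows...)
	case modeCreateAppend:
		// Like "create", but an existing table is not an error.
		// Assumption: rows are appended to the existing table.
		s[table] = append(s[table], rows...)
	default:
		return fmt.Errorf("unsupported mode %q", mode)
	}
	return nil
}

func main() {
	s := store{}
	fmt.Println(s.ingest("t", modeCreate, []string{"a"}))       // creates t
	fmt.Println(s.ingest("t", modeCreate, []string{"b"}))       // fails: t exists
	fmt.Println(s.ingest("t", modeCreateAppend, []string{"b"})) // appends to t
	fmt.Println(s.ingest("t", modeReplace, []string{"z"}))      // replaces t
	fmt.Println(s["t"])
}
```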