ADBC API Standard

This document summarizes the general feature set.

  • For C/C++ details, see adbc.h.

  • For Go details, see the source.

  • For Java details, see the source.

Databases

Databases hold state shared by multiple connections. Generally, this means common configuration and caches. For in-memory databases, the database object is also what owns the in-memory data.

  • C/C++: AdbcDatabase

  • Go: Driver

  • Java: org.apache.arrow.adbc.core.AdbcDatabase

Connections

A connection is a single, logical connection to a database.

  • C/C++: AdbcConnection

  • Go: Connection

  • Java: org.apache.arrow.adbc.core.AdbcConnection

Autocommit

By default, connections are expected to operate in autocommit mode; that is, queries take effect immediately upon execution. Autocommit can be disabled in favor of explicit commit/rollback calls, but not all drivers support this.

  • C/C++: ADBC_CONNECTION_OPTION_AUTOCOMMIT

  • Go: OptionKeyAutoCommit

  • Java: org.apache.arrow.adbc.core.AdbcConnection#setAutoCommit(boolean)
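To make these semantics concrete, here is a toy in-memory model in Go. It is a sketch, not the ADBC API: `toyConn` and its methods are hypothetical. With autocommit on, each insert is visible immediately; with it off, inserts are held until `Commit` or discarded by `Rollback`. Returning an error from `Commit`/`Rollback` while autocommit is enabled, and committing pending rows when autocommit is switched back on, are choices of this model.

```go
package main

import (
	"errors"
	"fmt"
)

// toyConn is a hypothetical stand-in for a connection, used only to
// illustrate autocommit semantics; it is not part of any ADBC API.
type toyConn struct {
	autocommit bool
	committed  []string // rows visible to everyone
	pending    []string // rows written in the current, uncommitted transaction
}

// newToyConn returns a connection in autocommit mode, the default.
func newToyConn() *toyConn { return &toyConn{autocommit: true} }

// errAutocommit is returned by Commit/Rollback when no transaction is open.
var errAutocommit = errors.New("autocommit is enabled; nothing to commit or roll back")

// SetAutocommit toggles autocommit. In this model, re-enabling it commits
// any pending rows first (an assumption of the toy, not a stated ADBC rule).
func (c *toyConn) SetAutocommit(on bool) {
	if on && !c.autocommit {
		c.committed = append(c.committed, c.pending...)
		c.pending = nil
	}
	c.autocommit = on
}

// Insert writes a row, either immediately or into the open transaction.
func (c *toyConn) Insert(row string) {
	if c.autocommit {
		c.committed = append(c.committed, row)
	} else {
		c.pending = append(c.pending, row)
	}
}

// Commit makes pending rows visible.
func (c *toyConn) Commit() error {
	if c.autocommit {
		return errAutocommit
	}
	c.committed = append(c.committed, c.pending...)
	c.pending = nil
	return nil
}

// Rollback discards pending rows.
func (c *toyConn) Rollback() error {
	if c.autocommit {
		return errAutocommit
	}
	c.pending = nil
	return nil
}

func main() {
	c := newToyConn()
	c.Insert("a") // autocommit: visible at once
	c.SetAutocommit(false)
	c.Insert("b")
	_ = c.Rollback() // "b" is discarded
	c.Insert("c")
	_ = c.Commit()           // "c" becomes visible
	fmt.Println(c.committed) // [a c]
}
```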

Metadata

ADBC exposes a variety of metadata about the database, such as what catalogs, schemas, and tables exist, the Arrow schema of tables, and so on.

Statistics

Note

Since API revision 1.1.0

ADBC exposes table/column statistics, such as the row count, the number of distinct values, min/max values, and so on. The goal is to make ADBC work better in federation scenarios, where one query engine wants to read Arrow data from another database. Having statistics available lets the “outer” query planner make better choices about things like join order, or even decide to skip reading data entirely.
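As a rough illustration of the planner side, the Go sketch below uses a row count to pick the smaller input as the build side of a hash join, and a column maximum to skip a table that cannot match a filter. The `tableStats` type and both decision rules are hypothetical simplifications; real ADBC statistics are fetched from the driver as Arrow data.

```go
package main

import "fmt"

// tableStats is a hypothetical summary of statistics fetched from a remote database.
type tableStats struct {
	Name     string
	RowCount int64
	MaxValue int64 // maximum of the column used in the filter below
}

// buildSide returns the table to use as the hash-join build side: the one
// with fewer rows, so the in-memory hash table stays small.
func buildSide(a, b tableStats) tableStats {
	if a.RowCount <= b.RowCount {
		return a
	}
	return b
}

// canSkip reports whether a filter "column > threshold" can match no rows,
// in which case the table need not be read at all.
func canSkip(t tableStats, threshold int64) bool {
	return t.MaxValue <= threshold
}

func main() {
	orders := tableStats{Name: "orders", RowCount: 5_000_000, MaxValue: 900}
	customers := tableStats{Name: "customers", RowCount: 20_000, MaxValue: 75}
	fmt.Println(buildSide(orders, customers).Name) // customers
	fmt.Println(canSkip(customers, 100))           // true
}
```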

Statements

Statements hold state related to query execution. They represent both one-off queries and prepared statements. They can be reused, though doing so will invalidate prior result sets from that statement. (See Concurrency and Thread Safety.)

  • C/C++: AdbcStatement

  • Go: Statement

  • Java: org.apache.arrow.adbc.core.AdbcStatement

Bulk Ingestion

ADBC provides explicit facilities to ingest batches of Arrow data into a database table. For databases which support it, this can avoid overheads from the typical bind-insert loop. Also, this (mostly) frees the user from knowing the right SQL syntax for their database.

  • C/C++: ADBC_INGEST_OPTION_TARGET_TABLE and related options.

  • Go: OptionKeyIngestTargetTable

  • Java: org.apache.arrow.adbc.core.AdbcConnection#bulkIngest(String, org.apache.arrow.adbc.core.BulkIngestMode)

Cancellation

Note

Since API revision 1.1.0

Queries (and operations that implicitly represent queries, like fetching Statistics) can be cancelled.

Partitioned Result Sets

ADBC lets a driver explicitly expose partitioned and/or distributed result sets to clients. (This is similar to functionality in Flight RPC/Flight SQL.) Clients may take advantage of this to distribute computations on a result set across multiple threads, processes, or machines.

  • C/C++: AdbcStatementExecutePartitions()

  • Go: Statement.ExecutePartitions

  • Java: org.apache.arrow.adbc.core.AdbcStatement#executePartitioned()

In principle, a vendor could return the results of partitioned execution as they become available, instead of all at once. Incremental execution allows drivers to expose this. When it is enabled, each call to ExecutePartitions returns the endpoints that are currently available to read instead of blocking until all endpoints have been retrieved.

Note

Since API revision 1.1.0
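A toy model of incremental execution follows. The hypothetical `incrementalQuery` type discovers endpoints in waves, and each call to its `ExecutePartitions` method returns only the endpoints found since the previous call, plus a flag saying whether the query is finished. The `done` flag is a simplification for this sketch; consult adbc.h for how a real driver signals that no more partitions remain.

```go
package main

import "fmt"

// incrementalQuery is a hypothetical query whose endpoints become available
// in waves; waves[i] lists the endpoints discovered by the i-th poll.
type incrementalQuery struct {
	waves [][]string
	next  int
}

// ExecutePartitions returns the endpoints discovered since the last call,
// without waiting for the rest, and reports whether the query has finished.
func (q *incrementalQuery) ExecutePartitions() (endpoints []string, done bool) {
	if q.next >= len(q.waves) {
		return nil, true
	}
	endpoints = q.waves[q.next]
	q.next++
	return endpoints, q.next == len(q.waves)
}

func main() {
	q := &incrementalQuery{waves: [][]string{{"e1", "e2"}, {"e3"}, {"e4"}}}
	for {
		eps, done := q.ExecutePartitions()
		for _, e := range eps {
			fmt.Println("start reading", e) // reading can begin before the query ends
		}
		if done {
			break
		}
	}
}
```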

Lifecycle & Usage

The following diagrams show the lifecycle of a statement in common scenarios.

Basic Usage

[Diagram: basic usage of AdbcStatement]

Preparing the statement and binding parameters are optional.

Consuming Result Sets

[Diagram: consuming an AdbcStatement result set]

This is equivalent to reading from what many Arrow libraries call a RecordBatchReader.

Bulk Data Ingestion

[Diagram: bulk data ingestion with AdbcStatement]

There is no need to prepare the statement.

Update-only Queries (No Result Set)

[Diagram: update-only queries with AdbcStatement]

Preparing the statement and binding parameters are optional.

Partitioned Execution

[Diagram: partitioned execution with AdbcStatement]

This is similar to fetching data in Arrow Flight RPC (by design). See “Downloading Data”.

Error Handling

The error handling strategy varies by language.

In C, most functions take a pointer to an AdbcError as an out-parameter. In Go, most methods return an error that can be cast to an AdbcError. In Java, most methods throw an AdbcException.

In all cases, an error contains:

  • A status code,

  • An error message,

  • An optional vendor code (a vendor-specific status code),

  • An optional 5-character “SQLSTATE” code (a SQL-like vendor-specific code).

Rich Error Metadata

Note

Since API revision 1.1.0

Drivers can expose additional rich error metadata. This can be used to return structured error information. For example, a driver could use something like the Googleapis ErrorDetails.

In C and Go, AdbcError exposes a list of additional metadata; in Java, AdbcException does. For C, see the documentation of AdbcError to learn how the struct was expanded while preserving ABI compatibility.

Changelog

Version 1.1.0

The info key ADBC_INFO_DRIVER_ADBC_VERSION can be used to retrieve the driver’s supported ADBC version.

The canonical options “uri”, “username”, and “password” were added to make configuration consistent between drivers.

Cancellation and the ability to both get and set options of different types were added. (Previously, you could set string options but could not get option values or get/set values of other types.) Together with a pair of new canonical options, this allows getting and setting the currently active catalog and/or schema.

Bulk Ingestion supports two additional modes:

  • “adbc.ingest.mode.replace” will drop existing data, then behave like “create”.

  • “adbc.ingest.mode.create_append” will behave like “create”, except if the table already exists, it will not error.
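The modes differ only in how they treat an existing target table. The toy Go model below encodes those rules over an in-memory map of tables. The strings for “replace” and “create_append” are the ones listed above; `ingest` is hypothetical, and the behavior of the original “create” (fail if the table exists) and “append” (fail if it does not) modes is stated here as an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode strings; "create" and "append" are the modes that predate 1.1.0.
const (
	modeCreate       = "adbc.ingest.mode.create"
	modeAppend       = "adbc.ingest.mode.append"
	modeReplace      = "adbc.ingest.mode.replace"
	modeCreateAppend = "adbc.ingest.mode.create_append"
)

// ingest is a hypothetical bulk ingest into an in-memory catalog of tables.
func ingest(tables map[string][]int, table, mode string, rows []int) error {
	_, exists := tables[table]
	switch mode {
	case modeCreate:
		if exists {
			return errors.New("table already exists")
		}
	case modeAppend:
		if !exists {
			return errors.New("table does not exist")
		}
	case modeReplace:
		delete(tables, table) // drop existing data, then behave like "create"
	case modeCreateAppend:
		// like "create", but an existing table is not an error
	default:
		return fmt.Errorf("unknown mode %q", mode)
	}
	tables[table] = append(tables[table], rows...)
	return nil
}

func main() {
	tables := map[string][]int{}
	fmt.Println(ingest(tables, "t", modeCreate, []int{1, 2}))    // <nil>
	fmt.Println(ingest(tables, "t", modeCreate, []int{3}))       // table already exists
	fmt.Println(ingest(tables, "t", modeCreateAppend, []int{3})) // <nil>
	_ = ingest(tables, "t", modeReplace, []int{9})
	fmt.Println(tables["t"]) // [9]
}
```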

Rich Error Metadata has been added, allowing clients to get additional error metadata.

The ability to retrieve table/column statistics was added, to help ADBC work better in federation scenarios where one query engine wants to read Arrow data from another database.

Incremental execution allows streaming partitions of a result set as they are available instead of blocking and waiting for query execution to finish before reading results.