ADBC: Arrow Database Connectivity

Rationale

The Arrow ecosystem lacks standard database interfaces built around Arrow data, especially for efficiently fetching large datasets (i.e. with minimal or no serialization and copying). Without a common API, the end result is a mix of custom protocols (e.g. BigQuery, Snowflake) and adapters (e.g. Turbodbc) scattered across languages. Consumers must laboriously wrap individual systems (as DBI is contemplating and Trino does with connectors).

ADBC aims to provide a minimal database client API standard, based on Arrow, for C, Go, and Java (with bindings for other languages). Applications code to this API standard (in much the same way as they would with JDBC or ODBC), but fetch result sets in Arrow format (e.g. via the C Data Interface). They then link to an implementation of the standard: either directly to a vendor-supplied driver for a particular database, or to a driver manager that abstracts across multiple drivers. Drivers implement the standard using a database-specific API, such as Flight SQL.

Goals

  • Provide a cross-language, Arrow-based API to standardize how clients submit queries to and fetch Arrow data from databases.

  • Support both SQL dialects and the emergent Substrait standard.

  • Support explicitly partitioned/distributed result sets to work better with contemporary distributed systems.

  • Allow for a variety of implementations to maximize reach.

Non-goals

  • Replacing JDBC/ODBC in all use cases, particularly OLTP use cases.

  • Requiring or enshrining a particular database protocol for the Arrow ecosystem.

Example use cases

A C or C++ application wishes to retrieve bulk data from a Postgres database for further analysis. The application is compiled against the ADBC header, and executes queries via the ADBC APIs. The application is linked against the ADBC libpq driver. At runtime, the driver submits queries to the database via the Postgres client libraries, and retrieves row-oriented results, which it then converts to Arrow format before returning them to the application.
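The conversion step in this scenario can be pictured with a small, self-contained sketch. It uses neither libpq nor Arrow: `rows_to_columns` is a hypothetical helper, and plain Python lists stand in for Arrow arrays. It only illustrates the row-to-columnar reshaping a driver has to perform when the database returns row-oriented results.

```python
from typing import Any, Dict, List, Sequence, Tuple


def rows_to_columns(
    names: Sequence[str], rows: Sequence[Tuple[Any, ...]]
) -> Dict[str, List[Any]]:
    """Transpose row-oriented results into one list per column.

    Hypothetical helper: plain lists stand in for Arrow arrays.
    """
    columns: Dict[str, List[Any]] = {name: [] for name in names}
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row width does not match column count")
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


# Rows as a row-oriented client library might hand them back.
rows = [(1, "alice"), (2, "bob"), (3, None)]
table = rows_to_columns(["id", "name"], rows)
```

Every value is touched once during the transpose, which is the copying cost that a database returning Arrow data natively lets the driver avoid.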

If the application wishes to retrieve data from a database supporting Flight SQL instead, it would link against the ADBC Flight SQL driver. At runtime, the driver would submit queries via Flight SQL and get back Arrow data, which is then passed unchanged and uncopied to the application. (The application may have to edit the SQL queries, as ADBC does not translate between SQL dialects.)

If the application wishes to work with multiple databases, it would link against the ADBC driver manager, and specify the desired driver at runtime. The driver manager would pass on API calls to the correct driver, which handles the request.
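The dispatch role of the driver manager can be sketched as follows. This is a toy model, not the ADBC API: `ToyDriver`, `ToyDriverManager`, and the driver names are hypothetical, and the "result" is a list of strings rather than Arrow data. It only shows the idea that the driver is selected by name at runtime and that calls are forwarded to it.

```python
from typing import Dict, List


class ToyDriver:
    """Hypothetical stand-in for a driver; not the real ADBC interface."""

    def __init__(self, name: str) -> None:
        self.name = name

    def execute(self, query: str) -> List[str]:
        # A real driver would contact a database and return Arrow data.
        return [f"{self.name} handled: {query}"]


class ToyDriverManager:
    """Forwards each call to the driver selected by name at runtime."""

    def __init__(self) -> None:
        self._drivers: Dict[str, ToyDriver] = {}

    def register(self, driver: ToyDriver) -> None:
        self._drivers[driver.name] = driver

    def execute(self, driver_name: str, query: str) -> List[str]:
        driver = self._drivers.get(driver_name)
        if driver is None:
            raise LookupError(f"no driver registered as {driver_name!r}")
        # Pass the call on unchanged; the driver does the actual work.
        return driver.execute(query)


manager = ToyDriverManager()
manager.register(ToyDriver("postgresql"))
manager.register(ToyDriver("flightsql"))

# The application code stays the same; only the driver name changes.
result = manager.execute("flightsql", "SELECT 1")
```

Because the manager exposes the same calls as a driver, an application written against it does not need to be rebuilt to target a different database, although, as noted above, the SQL text itself may still need to change.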

ADBC API Standard 1.0.0

ADBC is a language-specific set of interface definitions that can be implemented directly by a vendor-specific “driver” or a vendor-neutral “driver manager”.

Version 1.0.0 of the standard corresponds to tag adbc-1.0.0 of the repository apache/arrow-adbc, which is commit f044edf5256abfb4c091b0ad2acc73afea2c93c0. Note that this is separate from releases of the actual implementations.

See the language-specific pages for details.

Updating this specification

ADBC is versioned separately from the core Arrow project. The API standard and components (driver manager, drivers) are also versioned separately, but both follow semantic versioning.

For example, components may make backwards-compatible releases as 1.0.0, 1.0.1, 1.1.0, 1.2.0, etc. They may also release backwards-incompatible versions, such as 2.0.0, that still implement API standard version 1.0.0.

Similarly, this documentation describes the ADBC API standard version 1.0.0. If/when an ABI-compatible revision is made (e.g. new standard options are defined), the next version would be 1.1.0. If incompatible changes are made (e.g. new API functions), the next version would be 2.0.0.
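The version-bump rule for the API standard can be written down as a short sketch. `next_standard_version` is a hypothetical helper, not part of any ADBC library; it encodes only the two cases stated above and assumes versions are (major, minor, patch) tuples.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def next_standard_version(current: Version, abi_compatible: bool) -> Version:
    """Return the next API standard version for a given kind of revision.

    Hypothetical helper: an ABI-compatible revision bumps the minor
    version, and an incompatible revision bumps the major version.
    """
    major, minor, _patch = current
    if abi_compatible:
        return (major, minor + 1, 0)
    return (major + 1, 0, 0)


# New standard options are ABI-compatible; new API functions are not.
after_new_options = next_standard_version((1, 0, 0), abi_compatible=True)
after_new_functions = next_standard_version((1, 0, 0), abi_compatible=False)
```

Component versions are tracked independently, so this function says nothing about the release numbers of a driver or the driver manager.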