This configures a (virtual) table in DuckDB that is backed by the given Arrow object. No data is copied or modified until collect() or compute() is called or a query is run against the table.

to_duckdb(
  .data,
  con = arrow_duck_connection(),
  table_name = unique_arrow_tablename(),
  auto_disconnect = FALSE
)

Arguments

.data

the Arrow object (e.g. Dataset, Table) to use for the DuckDB table

con

a DuckDB connection to use (by default, one is created and stored in options("arrow_duck_con"))

table_name

a name to use in DuckDB for this object. The default is a unique string "arrow_" followed by numbers.

auto_disconnect

should the table be automatically cleaned up when the resulting object is removed (and garbage collected)? Default: FALSE

Value

A tbl of the new table in DuckDB

Details

The result is a dbplyr-compatible object that can be used in d(b)plyr pipelines.

If auto_disconnect = TRUE, the DuckDB table that is created will be configured to be unregistered when the tbl object is garbage collected. This is helpful if you don't want extra table objects left in DuckDB after you've finished using them. However, this cleanup can currently lead to hangs if tables are created and deleted in quick succession, hence the default value of FALSE.
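
As a rough sketch of managing the registration by hand instead of relying on auto_disconnect, the following assumes the DBI and duckdb packages are installed. The table name "mtcars_arrow" is chosen purely for illustration, and the call to duckdb::duckdb_unregister_arrow() assumes that to_duckdb() registers the Arrow object under exactly that name on the supplied connection.

```r
library(arrow)

# Open our own DuckDB connection rather than using the default one
con <- DBI::dbConnect(duckdb::duckdb())

# Register the Arrow table under an explicit (illustrative) name
tbl_duck <- to_duckdb(
  Table$create(mtcars),
  con = con,
  table_name = "mtcars_arrow",
  auto_disconnect = FALSE
)

# The virtual table can also be queried with plain SQL on the same connection
n_rows <- DBI::dbGetQuery(con, "SELECT count(*) AS n FROM mtcars_arrow")$n

# Clean up explicitly once the table is no longer needed
duckdb::duckdb_unregister_arrow(con, "mtcars_arrow")
DBI::dbDisconnect(con, shutdown = TRUE)
```

Passing an explicit con and table_name keeps the lifetime of the virtual table under your control, which may be preferable when many tables are created and dropped in a short time.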

Examples

library(dplyr)

ds <- InMemoryDataset$create(mtcars)

ds %>%
  filter(mpg < 30) %>%
  to_duckdb() %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg, na.rm = TRUE))
#> # Source:   lazy query [?? x 2]
#> # Database: duckdb_connection
#>     cyl mean_mpg
#>   <dbl>    <dbl>
#> 1     6     19.7
#> 2     4     23.7
#> 3     8     15.1
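
The output above is still a lazy query. As a minimal sketch, the same pipeline can be ended with collect() to run the query in DuckDB and pull the result into R; the arrange() step and the variable name result are additions made here for illustration only.

```r
library(arrow)
library(dplyr)

ds <- InMemoryDataset$create(mtcars)

# Nothing is computed until collect() is called at the end of the pipeline
result <- ds %>%
  filter(mpg < 30) %>%
  to_duckdb() %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()
```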