Roadmap

A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the project's contributors. This roadmap is not binding.

2022 Q1

DataFusion Core

  • Publish official Arrow2 branch

  • Implementation of memory manager (i.e. to enable spilling to disk as needed)

Benchmarking

  • Inclusion in Db-Benchmark with all queries covered

  • All TPC-H queries covered

Performance Improvements

  • Predicate evaluation

  • Improve multi-column comparisons (which can’t be vectorized at the moment)

  • Null constant support

New Features

  • Read JSON as table

  • Simplify DDL with datafusion-cli

  • Add the Decimal128 data type and attendant features such as Arrow kernel and UDF support

  • Add a new experimental e-graph-based optimizer

Ballista

  • Begin work on design documents and plan / priorities for development

Extensions (datafusion-contrib)

  • Stable S3 support

  • Begin design discussions and prototyping of a stream provider

Beyond 2022 Q1

There is no clear timeline for the items below, but community members have expressed interest in working on these topics.

DataFusion Core

  • Custom SQL support

  • Split DataFusion into multiple crates

  • Push-based query execution and code generation

Ballista

  • Evolve the architecture so that it can be deployed in a multi-tenant, cloud-native environment

  • Ensure Ballista is scalable, elastic, and stable for production usage

  • Develop distributed ML capabilities