Status

General

  • SQL Parser

  • SQL Query Planner

  • Query Optimizer

  • Constant folding

  • Join Reordering

  • Limit Pushdown

  • Projection pushdown

  • Predicate pushdown

  • Type coercion

  • Parallel query execution

SQL Support

  • Projection (SELECT)

  • Filter (WHERE)

  • Filter post-aggregate (HAVING)

  • Sorting (ORDER BY)

  • Limit (LIMIT)

  • Aggregate (GROUP BY)

  • cast / try_cast

  • VALUES lists

  • String Functions

  • Conditional Functions

  • Time and Date Functions

  • Math Functions

  • Aggregate Functions (SUM, MEDIAN, and many more)

  • Schema Queries

  • Support for nested types (ARRAY/LIST and STRUCT; see #2326 for details)

  • Subqueries

  • Common Table Expressions (CTE)

  • Set Operations (UNION [ALL], INTERSECT [ALL], EXCEPT [ALL])

  • Joins (INNER, LEFT, RIGHT, FULL, CROSS)

  • Window Functions

    • Empty (OVER())

    • Partitioning and ordering: (OVER(PARTITION BY <..> ORDER BY <..>))

    • Custom Window (OVER(ORDER BY time ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING))

    • User Defined Window and Aggregate Functions

  • Catalogs

    • Schemas (CREATE / DROP SCHEMA)

    • Tables (CREATE / DROP TABLE, CREATE TABLE AS SELECT)

  • Data Insert

    • INSERT INTO

    • COPY .. INTO ..

    • CSV

    • JSON

    • Parquet

    • Avro
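
The custom window frame listed above (ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING) can be read as "for each row, aggregate over that row and the two rows before it in the ORDER BY sequence". The following is a minimal sketch of that frame semantics in plain Python. It is not DataFusion code; the function name `rolling_sum` and the sample values are hypothetical, and the input is assumed to be already sorted by the ordering key.

```python
def rolling_sum(values, preceding=2, following=0):
    """Sum each row's frame: `preceding` rows before it through `following` rows after it.

    Mirrors the shape of SUM(x) OVER (ORDER BY time ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING).
    Illustrative only; a real engine evaluates frames incrementally.
    """
    out = []
    for i in range(len(values)):
        # Clamp the frame to the partition boundaries, as ROWS frames do.
        start = max(0, i - preceding)
        end = min(len(values), i + following + 1)
        out.append(sum(values[start:end]))
    return out


# Hypothetical readings, assumed sorted by time.
readings = [10, 20, 30, 40, 50]
print(rolling_sum(readings))  # [10, 30, 60, 90, 120]
```

The first two rows have fewer than two predecessors, so their frames are simply shorter; no padding is applied.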

Runtime

  • Streaming Grouping

  • Streaming Window Evaluation

  • Memory limits enforced

  • Spilling (to disk) Sort

  • Spilling (to disk) Grouping

  • Spilling (to disk) Joins

Data Sources

In addition to allowing arbitrary data sources via the TableProvider trait, DataFusion includes built-in support for the following formats:

  • CSV

  • Parquet (for all primitive and nested types)

  • JSON

  • Avro

  • Arrow