Subqueries

DataFusion supports EXISTS, NOT EXISTS, IN, NOT IN and Scalar Subqueries.

The examples below are based on the following table.

select * from x;
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1        | 2        |
+----------+----------+

EXISTS

The EXISTS syntax can be used to find all rows in a relation where a correlated subquery produces one or more matches for that row. Only correlated subqueries are supported.

select * from x y where exists (select * from x where x.column_1 = y.column_1);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1        | 2        |
+----------+----------+
1 row in set.

NOT EXISTS

The NOT EXISTS syntax can be used to find all rows in a relation where a correlated subquery produces zero matches for that row. Only correlated subqueries are supported.

select * from x y where not exists (select * from x where x.column_1 = y.column_1);
0 rows in set.

IN

The IN syntax can be used to find all rows in a relation where a given expression’s value can be found in the results of a correlated subquery.

select * from x where column_1 in (select column_1 from x);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1        | 2        |
+----------+----------+
1 row in set.

NOT IN

The NOT IN syntax can be used to find all rows in a relation where a given expression’s value can not be found in the results of a correlated subquery.

select * from x where column_1 not in (select column_1 from x);
0 rows in set.

Scalar Subquery

A scalar subquery can be used to produce a single value that can be used in many different contexts in a query. Here is an example of a filter using a scalar subquery. Only correlated subqueries are supported.

select * from x y where column_1 < (select sum(column_2) from x where x.column_1 = y.column_1);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1        | 2        |
+----------+----------+
1 row in set.