Expression API¶
DataFrame methods such as select
and filter
accept one or more logical expressions and there are many functions
available for creating logical expressions. These are documented below.
Expressions can be chained together using a fluent-style API:
// create the expression `(a > 6) AND (b < 7)`
col("a").gt(lit(6)).and(col("b").lt(lit(7)))
Identifiers¶
Function |
Notes |
---|---|
col |
Reference a column in a dataframe |
Literal Values¶
Function |
Notes |
---|---|
lit |
Literal value such as |
Boolean Expressions¶
Function |
Notes |
---|---|
and |
|
or |
|
not |
|
Bitwise expressions¶
Function |
Notes |
---|---|
bitwise_and |
|
bitwise_or |
|
bitwise_xor |
|
bitwise_shift_right |
|
bitwise_shift_left |
|
Comparison Expressions¶
Function |
Notes |
---|---|
eq |
|
gt |
|
gt_eq |
|
lt |
|
lt_eq |
|
not_eq |
|
Math Functions¶
In addition to the math functions listed here, some Rust operators are implemented for expressions, allowing
expressions such as col("a") + col("b")
to be used.
Function |
Notes |
---|---|
abs(x) |
absolute value |
acos(x) |
inverse cosine |
acosh(x) |
inverse hyperbolic cosine |
asin(x) |
inverse sine |
asinh(x) |
inverse hyperbolic sine |
atan(x) |
inverse tangent |
atanh(x) |
inverse hyperbolic tangent |
atan2(y, x) |
inverse tangent of y / x |
cbrt(x) |
cube root |
ceil(x) |
nearest integer greater than or equal to argument |
cos(x) |
cosine |
cosh(x) |
hyperbolic cosine |
degrees(x) |
converts radians to degrees |
exp(x) |
exponential |
factorial(x) |
factorial |
floor(x) |
nearest integer less than or equal to argument |
gcd(x, y) |
greatest common divisor |
isnan(x) |
predicate determining whether NaN/-NaN or not |
iszero(x) |
predicate determining whether 0.0/-0.0 or not |
lcm(x, y) |
least common multiple |
ln(x) |
natural logarithm |
log(base, x) |
logarithm of x for a particular base |
log10(x) |
base 10 logarithm |
log2(x) |
base 2 logarithm |
nanvl(x, y) |
returns x if x is not NaN otherwise returns y |
pi() |
approximate value of π |
power(base, exponent) |
base raised to the power of exponent |
radians(x) |
converts degrees to radians |
round(x) |
round to nearest integer |
signum(x) |
sign of the argument (-1, 0, +1) |
sin(x) |
sine |
sinh(x) |
hyperbolic sine |
sqrt(x) |
square root |
tan(x) |
tangent |
tanh(x) |
hyperbolic tangent |
trunc(x) |
truncate toward zero |
Math functions usage notes:¶
Unlike to some databases the math functions in Datafusion works the same way as Rust math functions, avoiding failing on corner cases e.g
❯ select log(-1), log(0), sqrt(-1);
+----------------+---------------+-----------------+
| log(Int64(-1)) | log(Int64(0)) | sqrt(Int64(-1)) |
+----------------+---------------+-----------------+
| NaN | -inf | NaN |
+----------------+---------------+-----------------+
Bitwise Operators¶
Operator |
Notes |
---|---|
& |
Bitwise AND => |
| |
Bitwise OR => |
# |
Bitwise XOR => |
<< |
Bitwise left shift => |
>> |
Bitwise right shift => |
Conditional Expressions¶
Function |
Notes |
---|---|
coalesce |
Returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display. |
case |
CASE expression. Example: |
nullif |
Returns a null value if |
String Expressions¶
Function |
Notes |
---|---|
ascii(character) |
Returns a numeric representation of the character ( |
bit_length(text) |
Returns the length of the string ( |
btrim(text, characters) |
Removes all specified characters ( |
char_length(text) |
Returns number of characters in the string ( |
character_length(text) |
Returns number of characters in the string ( |
concat(value1, [value2 [, …]]) |
Concatenates the text representations ( |
concat_ws(separator, value1, [value2 [, …]]) |
Concatenates the text representations ( |
chr(integer) |
Returns a character by its numeric representation ( |
initcap |
Converts the first letter of each word to upper case and the rest to lower case. Example: |
left(text, number) |
Returns a certain number ( |
length(text) |
Returns number of characters in the string ( |
lower(text) |
Converts all characters in the string ( |
lpad(text, length, [, fill]) |
Extends the string to length ( |
ltrim(text, text) |
Removes all specified characters ( |
md5(text) |
Computes the MD5 hash of the argument ( |
octet_length(text) |
Returns number of bytes in the string ( |
repeat(text, number) |
Repeats the string the specified number of times. Example: |
replace(string, from, to) |
Replaces a specified string ( |
reverse(text) |
Reverses the order of the characters in the string ( |
right(text, number) |
Returns a certain number ( |
rpad(text, length, [, fill]) |
Extends the string to length ( |
rtrim |
Removes all specified characters ( |
digest(input, algorithm) |
Computes the binary hash of |
split_part(string, delimiter, index) |
Splits the string ( |
starts_with(string, prefix) |
Returns |
strpos |
Finds the position from where the |
substr(string, position, [, length]) |
Returns substring from the position ( |
translate(string, from, to) |
Replaces the characters in |
trim(string) |
Removes all characters, space by default from the string ( |
upper |
Converts all characters in the string into upper case. Example: |
Array Expressions¶
Function |
Notes |
---|---|
array_append(array, element) |
Appends an element to the end of an array. |
array_concat(array[, …, array_n]) |
Concatenates arrays. |
array_has(array, element) |
Returns true if the array contains the element |
array_has_all(array, sub-array) |
Returns true if all elements of sub-array exist in array |
array_has_any(array, sub-array) |
Returns true if any elements exist in both arrays |
array_dims(array) |
Returns an array of the array’s dimensions. |
array_element(array, index) |
Extracts the element with the index n from the array |
flatten(array) |
Converts an array of arrays to a flat array |
array_length(array, dimension) |
Returns the length of the array dimension. |
array_ndims(array) |
Returns the number of dimensions of the array. |
array_pop_back(array) |
Returns the array without the last element. |
array_position(array, element) |
Searches for an element in the array, returns first occurrence. |
array_positions(array, element) |
Searches for an element in the array, returns all occurrences. |
array_prepend(array, element) |
Prepends an element to the beginning of an array. |
array_repeat(element, count) |
Returns an array containing element |
array_remove(array, element) |
Removes the first element from the array equal to the given value. |
array_remove_n(array, element, max) |
Removes the first |
array_remove_all(array, element) |
Removes all elements from the array equal to the given value. |
array_replace(array, from, to) |
Replaces the first occurrence of the specified element with another specified element. |
array_replace_n(array, from, to, max) |
Replaces the first |
array_replace_all(array, from, to) |
Replaces all occurrences of the specified element with another specified element. |
array_slice(array, index) |
Returns a slice of the array. |
array_to_string(array, delimiter) |
Converts each element to its text representation. |
cardinality(array) |
Returns the total number of elements in the array. |
make_array(value1, [value2 [, …]]) |
Returns an Arrow array using the specified input expressions. |
trim_array(array, n) |
Deprecated |
Regular Expressions¶
Function |
Notes |
---|---|
regexp_match |
Matches a regular expression against a string and returns matched substrings. |
regexp_replace |
Replaces strings that match a regular expression |
Temporal Expressions¶
Function |
Notes |
---|---|
date_part |
Extracts a subfield from the date. |
date_trunc |
Truncates the date to a specified level of precision. |
from_unixtime |
Returns the unix time in format. |
to_timestamp |
Converts a string to a |
to_timestamp_millis |
Converts a string to a |
to_timestamp_micros |
Converts a string to a |
to_timestamp_seconds |
Converts a string to a |
now() |
Returns current time. |
Other Expressions¶
Function |
Notes |
---|---|
array([value1, …]) |
Returns an array of fixed size with each argument ( |
in_list(expr, list, negated) |
Returns |
random() |
Returns a random value from 0 (inclusive) to 1 (exclusive). |
sha224(text) |
Computes the SHA224 hash of the argument ( |
sha256(text) |
Computes the SHA256 hash of the argument ( |
sha384(text) |
Computes the SHA384 hash of the argument ( |
sha512(text) |
Computes the SHA512 hash of the argument ( |
to_hex(integer) |
Converts the integer ( |
Aggregate Functions¶
Function |
Notes |
---|---|
avg(expr) |
Сalculates the average value for |
approx_distinct(expr) |
Calculates an approximate count of the number of distinct values for |
approx_median(expr) |
Calculates an approximation of the median for |
approx_percentile_cont(expr, percentile) |
Calculates an approximation of the specified |
approx_percentile_cont_with_weight(expr, weight_expr, percentile) |
Calculates an approximation of the specified |
bit_and(expr) |
Computes the bitwise AND of all non-null input values for |
bit_or(expr) |
Computes the bitwise OR of all non-null input values for |
bit_xor(expr) |
Computes the bitwise exclusive OR of all non-null input values for |
bool_and(expr) |
Returns true if all non-null input values ( |
bool_or(expr) |
Returns true if any non-null input value ( |
count(expr) |
Returns the number of rows for |
count_distinct |
Creates an expression to represent the count(distinct) aggregate function |
cube(exprs) |
Creates a grouping set for all combination of |
grouping_set(exprs) |
Create a grouping set. |
max(expr) |
Finds the maximum value of |
median(expr) |
Сalculates the median of |
min(expr) |
Finds the minimum value of |
rollup(exprs) |
Creates a grouping set for rollup sets. |
sum(expr) |
Сalculates the sum of |
Subquery Expressions¶
Function |
Notes |
---|---|
exists |
Creates an |
in_subquery |
|
not_exists |
Creates a |
not_in_subquery |
Creates a |
scalar_subquery |
Creates a scalar subquery expression |
User-Defined Function Expressions¶
Function |
Notes |
---|---|
create_udf |
Creates a new UDF with a specific signature and specific return type. |
create_udaf |
Creates a new UDAF with a specific signature, state type and return type. |