Expressions¶
DataFrame methods such as select
and filter
accept one or more logical expressions and there are many functions
available for creating logical expressions. These are documented below.
Expressions can be chained together using a fluent-style API:
// create the expression `(a > 5) AND (b < 7)`
col("a").gt(lit(5)).and(col("b").lt(lit(7)))
Identifiers¶
Function |
Notes |
---|---|
col |
Reference a column in a dataframe |
Literal Values¶
Function |
Notes |
---|---|
lit |
Literal value such as |
Boolean Expressions¶
Function |
Notes |
---|---|
and |
|
or |
|
not |
|
Comparison Expressions¶
Function |
Notes |
---|---|
eq |
|
gt |
|
gt_eq |
|
lt |
|
lt_eq |
|
not_eq |
|
Math Functions¶
In addition to the math functions listed here, some Rust operators are implemented for expressions, allowing
expressions such as col("a") + col("b")
to be used.
Function |
Notes |
---|---|
abs(x) |
absolute value |
acos(x) |
inverse cosine |
asin(x) |
inverse sine |
atan(x) |
inverse tangent |
atan2(y, x) |
inverse tangent of y / x |
ceil(x) |
nearest integer greater than or equal to argument |
cos(x) |
cosine |
exp(x) |
exponential |
floor(x) |
nearest integer less than or equal to argument |
ln(x) |
natural logarithm |
log10(x) |
base 10 logarithm |
log2(x) |
base 2 logarithm |
power(base, exponent) |
base raised to the power of exponent |
round(x) |
round to nearest integer |
signum(x) |
sign of the argument (-1, 0, +1) |
sin(x) |
sine |
sqrt(x) |
square root |
tan(x) |
tangent |
trunc(x) |
truncate toward zero |
Bitwise Operators¶
Operator |
Notes |
---|---|
& |
Bitwise AND => |
| |
Bitwise OR => |
# |
Bitwise XOR => |
<< |
Bitwise left shift => |
>> |
Bitwise right shift => |
Conditional Expressions¶
Function |
Notes |
---|---|
coalesce |
Returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display. |
case |
CASE expression. Example: |
nullif |
Returns a null value if |
String Expressions¶
Function |
Notes |
---|---|
ascii |
|
bit_length |
|
btrim |
|
char_length |
|
character_length |
|
concat |
|
concat_ws |
|
chr |
|
initcap |
|
left |
|
length |
|
lower |
|
lpad |
|
ltrim |
|
md5 |
|
octet_length |
|
repeat |
|
replace |
|
reverse |
|
right |
|
rpad |
|
rtrim |
|
digest |
|
split_part |
|
starts_with |
|
strpos |
|
substr |
|
translate |
|
trim |
|
upper |
Regular Expressions¶
Function |
Notes |
---|---|
regexp_match |
|
regexp_replace |
Temporal Expressions¶
Function |
Notes |
---|---|
date_part |
|
date_trunc |
|
from_unixtime |
|
to_timestamp |
|
to_timestamp_millis |
|
to_timestamp_micros |
|
to_timestamp_seconds |
|
now() |
current time |
Other Expressions¶
Function |
Notes |
---|---|
array |
|
in_list |
|
random |
|
sha224 |
|
sha256 |
|
sha384 |
|
sha512 |
|
struct |
|
to_hex |
Aggregate Functions¶
Function |
Notes |
---|---|
avg |
|
approx_distinct |
|
approx_median |
|
approx_percentile_cont |
|
approx_percentile_cont_with_weight |
|
count |
|
count_distinct |
|
cube |
|
grouping_set |
|
max |
|
median |
|
min |
|
rollup |
|
sum |
Subquery Expressions¶
Function |
Notes |
---|---|
exists |
|
in_subquery |
|
not_exists |
|
not_in_subquery |
|
scalar_subquery |
User-Defined Function Expressions¶
Function |
Notes |
---|---|
create_udf |
|
create_udaf |