Aggregate Functions¶
Aggregate functions operate on a set of values to compute a single result. Please refer to PostgreSQL for usage of standard SQL functions.
General¶
min
max
count
avg
sum
array_agg
Statistical¶
var / var_samp / var_pop
stddev / stddev_samp / stddev_pop
covar / covar_samp / covar_pop
corr
Approximate¶
approx_distinct¶
approx_distinct(x) -> uint64
returns the approximate number (HyperLogLog) of distinct input values
approx_median¶
approx_median(x) -> x
returns the approximate median of input values. it is an alias of approx_percentile_cont(x, 0.5)
.
approx_percentile_cont¶
approx_percentile_cont(x, p) -> x
return the approximate percentile (TDigest) of input values, where p
is a float64 between 0 and 1 (inclusive).
It supports raw data as input and build Tdigest sketches during query time, and is approximately equal to approx_percentile_cont_with_weight(x, 1, p)
.
approx_percentile_cont(x, p, n) -> x
return the approximate percentile (TDigest) of input values, where p
is a float64 between 0 and 1 (inclusive),
and n
(default 100) is the number of centroids in Tdigest which means that if there are n
or fewer unique values in x
, you can expect an exact result.
A higher value of n
results in a more accurate approximation and the cost of higher memory usage.
approx_percentile_cont_with_weight¶
approx_percentile_cont_with_weight(x, w, p) -> x
returns the approximate percentile (TDigest) of input values with weight, where w
is weight column expression and p
is a float64 between 0 and 1 (inclusive).
It supports raw data as input or pre-aggregated TDigest sketches, then builds or merges Tdigest sketches during query time. TDigest sketches are a list of centroid (x, w)
, where x
stands for mean and w
stands for weight.
It is suitable for low latency OLAP system where a streaming compute engine (e.g. Spark Streaming/Flink) pre-aggregates data to a data store, then queries using Datafusion.