# Aggregate Functions¶

Aggregate functions operate on a set of values to compute a single result.

## General¶

### avg¶

Returns the average of numeric values in the specified column.

avg(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

• mean

### bit_and¶

Computes the bitwise AND of all non-null input values.

bit_and(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### bit_or¶

Computes the bitwise OR of all non-null input values.

bit_or(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### bit_xor¶

Computes the bitwise exclusive OR of all non-null input values.

bit_xor(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### bool_and¶

Returns true if all non-null input values are true, otherwise false.

bool_and(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### bool_or¶

Returns true if any non-null input value is true, otherwise false.

bool_or(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### count¶

Returns the number of rows in the specified column.

Count includes null values in the total count. To exclude null values from the total count, include <column> IS NOT NULL in the WHERE clause.

count(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### max¶

Returns the maximum value in the specified column.

max(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

Alias of avg.

### median¶

Returns the median value in the specified column.

median(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### min¶

Returns the minimum value in the specified column.

min(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### sum¶

Returns the sum of all values in the specified column.

sum(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### array_agg¶

Returns an array created from the expression elements. If ordering requirement is given, elements are inserted in the order of required ordering.

array_agg(expression [ORDER BY expression])

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### first_value¶

Returns the first element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.

first_value(expression [ORDER BY expression])

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### last_value¶

Returns the last element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.

last_value(expression [ORDER BY expression])

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

## Statistical¶

### corr¶

Returns the coefficient of correlation between two numeric values.

corr(expression1, expression2)

#### Arguments¶

• expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### covar¶

Returns the covariance of a set of number pairs.

covar(expression1, expression2)

#### Arguments¶

• expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### covar_pop¶

Returns the population covariance of a set of number pairs.

covar_pop(expression1, expression2)

#### Arguments¶

• expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### covar_samp¶

Returns the sample covariance of a set of number pairs.

covar_samp(expression1, expression2)

#### Arguments¶

• expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### stddev¶

Returns the standard deviation of a set of numbers.

stddev(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### stddev_pop¶

Returns the population standard deviation of a set of numbers.

stddev_pop(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### stddev_samp¶

Returns the sample standard deviation of a set of numbers.

stddev_samp(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### var¶

Returns the statistical variance of a set of numbers.

var(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### var_pop¶

Returns the statistical population variance of a set of numbers.

var_pop(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### var_samp¶

Returns the statistical sample variance of a set of numbers.

var_samp(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### regr_slope¶

Returns the slope of the linear regression line for non-null pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k*X + b) using minimal RSS fitting.

regr_slope(expression1, expression2)

#### Arguments¶

• expression_y: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression_x: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### regr_avgx¶

Computes the average of the independent variable (input) expression_x for the non-null paired data points.

regr_avgx(expression_y, expression_x)

#### Arguments¶

• expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

### regr_avgy¶

Computes the average of the dependent variable (output) expression_y for the non-null paired data points.

regr_avgy(expression_y, expression_x)

#### Arguments¶

• expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

### regr_count¶

Counts the number of non-null paired data points.

regr_count(expression_y, expression_x)

#### Arguments¶

• expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

### regr_intercept¶

Computes the y-intercept of the linear regression line. For the equation (y = kx + b), this function returns b.

regr_intercept(expression_y, expression_x)

#### Arguments¶

• expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

### regr_r2¶

Computes the square of the correlation coefficient between the independent and dependent variables.

regr_r2(expression_y, expression_x)

#### Arguments¶

• expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

### regr_sxx¶

Computes the sum of squares of the independent variable.

regr_sxx(expression_y, expression_x)

#### Arguments¶

• expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

### regr_syy¶

Computes the sum of squares of the dependent variable.

regr_syy(expression_y, expression_x)

#### Arguments¶

• expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

### regr_sxy¶

Computes the sum of products of paired data points.

regr_sxy(expression_y, expression_x)

#### Arguments¶

• expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

• expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

## Approximate¶

### approx_distinct¶

Returns the approximate number of distinct input values calculated using the HyperLogLog algorithm.

approx_distinct(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### approx_median¶

Returns the approximate median (50th percentile) of input values. It is an alias of approx_percentile_cont(x, 0.5).

approx_median(expression)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

### approx_percentile_cont¶

Returns the approximate percentile of input values using the t-digest algorithm.

approx_percentile_cont(expression, percentile, centroids)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

• percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).

• centroids: Number of centroids to use in the t-digest algorithm. Default is 100.

If there are this number or fewer unique values, you can expect an exact result. A higher number of centroids results in a more accurate approximation, but requires more memory to compute.

### approx_percentile_cont_with_weight¶

Returns the weighted approximate percentile of input values using the t-digest algorithm.

approx_percentile_cont_with_weight(expression, weight, percentile)

#### Arguments¶

• expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

• weight: Expression to use as weight. Can be a constant, column, or function, and any combination of arithmetic operators.

• percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).