Compute Functions

Aggregations

all(array, *[, memory_pool, options, …])

Test whether all elements in a boolean array evaluate to true.

any(array, *[, memory_pool, options, …])

Test whether any element in a boolean array evaluates to true.

approximate_median(array, *[, memory_pool, …])

Approximate median of a numeric array with T-Digest algorithm.

count(array, *[, memory_pool, options, mode])

Count the number of null / non-null values.

count_distinct(array, *[, memory_pool, …])

Count the number of unique values.

index(data, value[, start, end, memory_pool])

Find the index of the first occurrence of a given value.

max(array, *[, memory_pool, options, …])

Compute the maximum value of a numeric array.

mean(array, *[, memory_pool, options, …])

Compute the mean of a numeric array.

min(array, *[, memory_pool, options, …])

Compute the minimum value of a numeric array.

min_max(array, *[, memory_pool, options, …])

Compute the minimum and maximum values of a numeric array.

mode(array[, n, skip_nulls, min_count])

Return top-n most common values and number of times they occur in a passed numerical (chunked) array, in descending order of occurrence.

product(array, *[, memory_pool, options, …])

Compute the product of values in a numeric array.

quantile(array, *[, memory_pool, options, …])

Compute an array of quantiles of a numeric array or chunked array.

stddev(array, *[, memory_pool, options, …])

Calculate the standard deviation of a numeric array.

sum(array, *[, memory_pool, options, …])

Compute the sum of a numeric array.

tdigest(array, *[, memory_pool, options, q, …])

Approximate quantiles of a numeric array with T-Digest algorithm.

variance(array, *[, memory_pool, options, …])

Calculate the variance of a numeric array.

Grouped Aggregations

hash_all(array, group_id_array, *[, …])

Test whether all elements evaluate to true.

hash_any(array, group_id_array, *[, …])

Test whether any element evaluates to true.

hash_approximate_median(array, group_id_array, *)

Calculate approximate medians of a numeric array with the T-Digest algorithm.

hash_count(array, group_id_array, *[, …])

Count the number of null / non-null values.

hash_count_distinct(array, group_id_array, *)

Count the distinct values in each group.

hash_distinct(array, group_id_array, *[, …])

Keep the distinct values in each group.

hash_max(array, group_id_array, *[, …])

Compute the maximum of the values in each group.

hash_mean(array, group_id_array, *[, …])

Average values of a numeric array.

hash_min(array, group_id_array, *[, …])

Compute the minimum of the values in each group.

hash_min_max(array, group_id_array, *[, …])

Compute the minimum and maximum values of a numeric array.

hash_product(array, group_id_array, *[, …])

Compute product of values of a numeric array.

hash_stddev(array, group_id_array, *[, …])

Calculate the standard deviation of a numeric array.

hash_sum(array, group_id_array, *[, …])

Sum values of a numeric array.

hash_tdigest(array, group_id_array, *[, …])

Calculate approximate quantiles of a numeric array with the T-Digest algorithm.

hash_variance(array, group_id_array, *[, …])

Calculate the variance of a numeric array.

Arithmetic Functions

By default these functions do not detect overflow. Most functions are also available in an overflow-checking variant, suffixed _checked, which raises an ArrowInvalid exception when overflow is detected.

abs(x, *[, memory_pool])

Calculate the absolute value of the argument element-wise.

abs_checked(x, *[, memory_pool])

Calculate the absolute value of the argument element-wise.

add(x, y, *[, memory_pool])

Add the arguments element-wise.

add_checked(x, y, *[, memory_pool])

Add the arguments element-wise.

divide(dividend, divisor, *[, memory_pool])

Divide the arguments element-wise.

divide_checked(dividend, divisor, *[, …])

Divide the arguments element-wise.

multiply(x, y, *[, memory_pool])

Multiply the arguments element-wise.

multiply_checked(x, y, *[, memory_pool])

Multiply the arguments element-wise.

negate(x, *[, memory_pool])

Negate the argument element-wise.

negate_checked(x, *[, memory_pool])

Negate the arguments element-wise.

power(base, exponent, *[, memory_pool])

Raise arguments to power element-wise.

power_checked(base, exponent, *[, memory_pool])

Raise arguments to power element-wise.

sign(x, *[, memory_pool])

Get the signedness of the arguments element-wise.

subtract(x, y, *[, memory_pool])

Subtract the arguments element-wise.

subtract_checked(x, y, *[, memory_pool])

Subtract the arguments element-wise.

Bit-wise Functions

bit_wise_and(x, y, *[, memory_pool])

Bit-wise AND the arguments element-wise.

bit_wise_not(x, *[, memory_pool])

Bit-wise negate the arguments element-wise.

bit_wise_or(x, y, *[, memory_pool])

Bit-wise OR the arguments element-wise.

bit_wise_xor(x, y, *[, memory_pool])

Bit-wise XOR the arguments element-wise.

shift_left(x, y, *[, memory_pool])

Left shift x by y.

shift_left_checked(x, y, *[, memory_pool])

Left shift x by y with invalid shift check.

shift_right(x, y, *[, memory_pool])

Right shift x by y.

shift_right_checked(x, y, *[, memory_pool])

Right shift x by y with invalid shift check.

Rounding Functions

Rounding functions displace numeric inputs to an approximate value with a simpler representation based on the rounding criterion.

ceil(x, *[, memory_pool])

Round up to the nearest integer.

floor(x, *[, memory_pool])

Round down to the nearest integer.

round(x, *[, memory_pool, options, ndigits, …])

Round to a given precision.

round_to_multiple(x, *[, memory_pool, …])

Round to a given multiple.

trunc(x, *[, memory_pool])

Get the integral part without fractional digits.

Logarithmic Functions

Logarithmic functions are supported and offer _checked variants that detect domain errors.

ln(x, *[, memory_pool])

Compute natural log of arguments element-wise.

ln_checked(x, *[, memory_pool])

Compute natural log of arguments element-wise.

log10(x, *[, memory_pool])

Compute log base 10 of arguments element-wise.

log10_checked(x, *[, memory_pool])

Compute log base 10 of arguments element-wise.

log1p(x, *[, memory_pool])

Compute natural log of (1+x) element-wise.

log1p_checked(x, *[, memory_pool])

Compute natural log of (1+x) element-wise.

log2(x, *[, memory_pool])

Compute log base 2 of arguments element-wise.

log2_checked(x, *[, memory_pool])

Compute log base 2 of arguments element-wise.

logb(x, b, *[, memory_pool])

Compute log of x to base b of arguments element-wise.

logb_checked(x, b, *[, memory_pool])

Compute log of x to base b of arguments element-wise.

Trigonometric Functions

Trigonometric functions are supported and, where appropriate, offer _checked variants that detect domain errors.

acos(x, *[, memory_pool])

Compute the inverse cosine of the elements argument-wise.

acos_checked(x, *[, memory_pool])

Compute the inverse cosine of the elements argument-wise.

asin(x, *[, memory_pool])

Compute the inverse sine of the elements argument-wise.

asin_checked(x, *[, memory_pool])

Compute the inverse sine of the elements argument-wise.

atan(x, *[, memory_pool])

Compute the principal value of the inverse tangent.

atan2(y, x, *[, memory_pool])

Compute the inverse tangent using argument signs to determine the quadrant.

cos(x, *[, memory_pool])

Compute the cosine of the elements argument-wise.

cos_checked(x, *[, memory_pool])

Compute the cosine of the elements argument-wise.

sin(x, *[, memory_pool])

Compute the sine of the elements argument-wise.

sin_checked(x, *[, memory_pool])

Compute the sine of the elements argument-wise.

tan(x, *[, memory_pool])

Compute the tangent of the elements argument-wise.

tan_checked(x, *[, memory_pool])

Compute the tangent of the elements argument-wise.

Comparisons

These functions expect two inputs of the same type. If one of the inputs is null they return null.

equal(x, y, *[, memory_pool])

Compare values for equality (x == y).

greater(x, y, *[, memory_pool])

Compare values for ordered inequality (x > y).

greater_equal(x, y, *[, memory_pool])

Compare values for ordered inequality (x >= y).

less(x, y, *[, memory_pool])

Compare values for ordered inequality (x < y).

less_equal(x, y, *[, memory_pool])

Compare values for ordered inequality (x <= y).

not_equal(x, y, *[, memory_pool])

Compare values for inequality (x != y).

These functions take any number of arguments of a numeric or temporal type.

max_element_wise(*args[, memory_pool, …])

Find the element-wise maximum value.

min_element_wise(*args[, memory_pool, …])

Find the element-wise minimum value.

Logical Functions

These functions normally emit a null when one of the inputs is null. However, Kleene logic variants are provided (suffixed _kleene). See User Guide for details.

and_(x, y, *[, memory_pool])

Logical ‘and’ boolean values.

and_kleene(x, y, *[, memory_pool])

Logical ‘and’ boolean values (Kleene logic).

and_not(x, y, *[, memory_pool])

Logical ‘and not’ boolean values.

and_not_kleene(x, y, *[, memory_pool])

Logical ‘and not’ boolean values (Kleene logic).

invert(values, *[, memory_pool])

Invert boolean values.

or_(x, y, *[, memory_pool])

Logical ‘or’ boolean values.

or_kleene(x, y, *[, memory_pool])

Logical ‘or’ boolean values (Kleene logic).

xor(x, y, *[, memory_pool])

Logical ‘xor’ boolean values.

String Predicates

In these functions an empty string emits false in the output. For ASCII variants (prefixed ascii_) a string element with non-ASCII characters emits false in the output.

The first set of functions emits true if the input contains only characters of a given class.

ascii_is_alnum(strings, *[, memory_pool])

Classify strings as ASCII alphanumeric.

ascii_is_alpha(strings, *[, memory_pool])

Classify strings as ASCII alphabetic.

ascii_is_decimal(strings, *[, memory_pool])

Classify strings as ASCII decimal.

ascii_is_lower(strings, *[, memory_pool])

Classify strings as ASCII lowercase.

ascii_is_printable(strings, *[, memory_pool])

Classify strings as ASCII printable.

ascii_is_space(strings, *[, memory_pool])

Classify strings as ASCII whitespace.

ascii_is_upper(strings, *[, memory_pool])

Classify strings as ASCII uppercase.

utf8_is_alnum(strings, *[, memory_pool])

Classify strings as alphanumeric.

utf8_is_alpha(strings, *[, memory_pool])

Classify strings as alphabetic.

utf8_is_decimal(strings, *[, memory_pool])

Classify strings as decimal.

utf8_is_digit(strings, *[, memory_pool])

Classify strings as digits.

utf8_is_lower(strings, *[, memory_pool])

Classify strings as lowercase.

utf8_is_numeric(strings, *[, memory_pool])

Classify strings as numeric.

utf8_is_printable(strings, *[, memory_pool])

Classify strings as printable.

utf8_is_space(strings, *[, memory_pool])

Classify strings as whitespace.

utf8_is_upper(strings, *[, memory_pool])

Classify strings as uppercase.

The second set of functions also considers the order of characters in the string element.

ascii_is_title(strings, *[, memory_pool])

Classify strings as ASCII titlecase.

utf8_is_title(strings, *[, memory_pool])

Classify strings as titlecase.

The third set of functions examines string elements on a byte-by-byte basis.

string_is_ascii(strings, *[, memory_pool])

Classify strings as ASCII.

String Transforms

ascii_capitalize(strings, *[, memory_pool])

Capitalize the first character of ASCII input.

ascii_lower(strings, *[, memory_pool])

Transform ASCII input to lowercase.

ascii_reverse(strings, *[, memory_pool])

Reverse ASCII input.

ascii_swapcase(strings, *[, memory_pool])

Transform ASCII input lowercase characters to uppercase and uppercase characters to lowercase.

ascii_title(strings, *[, memory_pool])

Titlecase each word of ASCII input.

ascii_upper(strings, *[, memory_pool])

Transform ASCII input to uppercase.

binary_length(strings, *[, memory_pool])

Compute string lengths.

binary_replace_slice(strings, *[, …])

Replace a slice of a binary string with replacement.

replace_substring(strings, *[, memory_pool, …])

Replace non-overlapping substrings that match pattern by replacement.

replace_substring_regex(strings, *[, …])

Replace non-overlapping substrings that match regex pattern by replacement.

utf8_capitalize(strings, *[, memory_pool])

Capitalize the first character of input.

utf8_length(strings, *[, memory_pool])

Compute UTF8 string lengths.

utf8_lower(strings, *[, memory_pool])

Transform input to lowercase.

utf8_replace_slice(strings, *[, …])

Replace a slice of a string with replacement.

utf8_reverse(strings, *[, memory_pool])

Reverse input.

utf8_swapcase(strings, *[, memory_pool])

Transform input lowercase characters to uppercase and uppercase characters to lowercase.

utf8_title(strings, *[, memory_pool])

Titlecase each word of input.

utf8_upper(strings, *[, memory_pool])

Transform input to uppercase.

String Padding

ascii_center(strings, *[, memory_pool, …])

For each string in strings, emit a centered string by padding both sides with the given UTF8 codeunit.

ascii_lpad(strings, *[, memory_pool, …])

For each string in strings, emit a right-aligned string by prepending the given UTF8 codeunit.

ascii_rpad(strings, *[, memory_pool, …])

For each string in strings, emit a left-aligned string by appending the given UTF8 codeunit.

utf8_center(strings, *[, memory_pool, …])

Center strings by padding with a given character.

utf8_lpad(strings, *[, memory_pool, …])

Right-align strings by padding with a given character.

utf8_rpad(strings, *[, memory_pool, …])

Left-align strings by padding with a given character.

String Trimming

ascii_ltrim(strings, *[, memory_pool, options])

Trim leading characters present in the characters arguments.

ascii_ltrim_whitespace(strings, *[, memory_pool])

Trim leading ASCII whitespace characters.

ascii_rtrim(strings, *[, memory_pool, options])

Trim trailing characters present in the characters arguments.

ascii_rtrim_whitespace(strings, *[, memory_pool])

Trim trailing ASCII whitespace characters.

ascii_trim(strings, *[, memory_pool, options])

Trim leading and trailing characters present in the characters arguments.

ascii_trim_whitespace(strings, *[, memory_pool])

Trim leading and trailing ASCII whitespace characters.

utf8_ltrim(strings, *[, memory_pool, options])

Trim leading characters present in the characters arguments.

utf8_ltrim_whitespace(strings, *[, memory_pool])

Trim leading whitespace characters.

utf8_rtrim(strings, *[, memory_pool, options])

Trim trailing characters present in the characters arguments.

utf8_rtrim_whitespace(strings, *[, memory_pool])

Trim trailing whitespace characters.

utf8_trim(strings, *[, memory_pool, options])

Trim leading and trailing characters present in the characters arguments.

utf8_trim_whitespace(strings, *[, memory_pool])

Trim leading and trailing whitespace characters.

String Splitting

ascii_split_whitespace(strings, *[, …])

Split string according to any ASCII whitespace.

split_pattern(strings, *[, memory_pool, …])

Split string according to separator.

split_pattern_regex(strings, *[, …])

Split string according to regex pattern.

utf8_split_whitespace(strings, *[, …])

Split string according to any Unicode whitespace.

String Component Extraction

extract_regex(strings, *[, memory_pool, options])

Extract substrings captured by a regex pattern.

String Joining

binary_join(list, separator, *[, memory_pool])

Join a list of strings together with a separator to form a single string.

binary_join_element_wise(*strings[, …])

Join string arguments into one, using the last argument as the separator.

String Slicing

utf8_slice_codeunits(strings, *[, …])

Slice string.

Containment Tests

count_substring(array, pattern, *[, ignore_case])

Count the occurrences of substring pattern in each value of a string array.

count_substring_regex(array, pattern, *[, …])

Count the non-overlapping matches of regex pattern in each value of a string array.

ends_with(strings, *[, memory_pool, …])

Match strings against literal pattern.

find_substring(array, pattern, *[, ignore_case])

Find the index of the first occurrence of substring pattern in each value of a string array.

find_substring_regex(array, pattern, *[, …])

Find the index of the first match of regex pattern in each value of a string array.

index_in(values, *[, memory_pool, options, …])

Return index of each element in a set of values.

is_in(values, *[, memory_pool, options, …])

Find each element in a set of values.

match_like(array, pattern, *[, ignore_case])

Test if the SQL-style LIKE pattern given by pattern matches a value of a string array.

match_substring(array, pattern, *[, ignore_case])

Test if substring pattern is contained within a value of a string array.

match_substring_regex(array, pattern, *[, …])

Test if regex pattern matches at any position a value of a string array.

starts_with(strings, *[, memory_pool, …])

Match strings against literal pattern.

Categorizations

is_finite(values, *[, memory_pool])

Return true if value is finite.

is_inf(values, *[, memory_pool])

Return true if infinity.

is_nan(values, *[, memory_pool])

Return true if NaN.

is_null(values, *[, memory_pool, options, …])

Return true if null (and optionally NaN).

is_valid(values, *[, memory_pool])

Return true if non-null.

Selecting / Multiplexing

case_when(cond, *cases[, memory_pool])

Choose values based on multiple conditions.

choose(indices, *values[, memory_pool])

Given indices and arrays, choose the value from the corresponding array for each index.

coalesce(*values[, memory_pool])

Select the first non-null value in each slot.

if_else(cond, left, right, *[, memory_pool])

Choose values based on a condition.

Conversions

cast(arr, target_type[, safe])

Cast array values to another data type.

strftime(timestamps, *[, memory_pool, …])

Format temporal values according to a format string.

strptime(strings, *[, memory_pool, options])

Parse timestamps.

Temporal Component Extraction

day(values, *[, memory_pool])

Extract day number.

day_of_week(values, *[, memory_pool, …])

Extract day of the week number.

day_of_year(values, *[, memory_pool])

Extract day of year number.

hour(values, *[, memory_pool])

Extract hour value.

iso_week(values, *[, memory_pool])

Extract ISO week of year number.

iso_year(values, *[, memory_pool])

Extract ISO year number.

iso_calendar(values, *[, memory_pool])

Extract (ISO year, ISO week, ISO day of week) struct.

microsecond(values, *[, memory_pool])

Extract microsecond values.

millisecond(values, *[, memory_pool])

Extract millisecond values.

minute(values, *[, memory_pool])

Extract minute values.

month(values, *[, memory_pool])

Extract month number.

nanosecond(values, *[, memory_pool])

Extract nanosecond values.

quarter(values, *[, memory_pool])

Extract quarter of year number.

second(values, *[, memory_pool])

Extract second values.

subsecond(values, *[, memory_pool])

Extract subsecond values.

us_week(values, *[, memory_pool])

Extract US week of year number.

week(values, *[, memory_pool, options, …])

Extract week of year number.

year(values, *[, memory_pool])

Extract year number.

Temporal Difference

day_time_interval_between(start, end, *[, …])

Compute the number of days and milliseconds between two timestamps.

days_between(start, end, *[, memory_pool])

Compute the number of days between two timestamps.

hours_between(start, end, *[, memory_pool])

Compute the number of hours between two timestamps.

microseconds_between(start, end, *[, …])

Compute the number of microseconds between two timestamps.

milliseconds_between(start, end, *[, …])

Compute the number of millisecond boundaries between two timestamps.

minutes_between(start, end, *[, memory_pool])

Compute the number of minute boundaries between two timestamps.

month_day_nano_interval_between(start, end, *)

Compute the number of months, days and nanoseconds between two timestamps.

month_interval_between(start, end, *[, …])

Compute the number of months between two timestamps.

nanoseconds_between(start, end, *[, memory_pool])

Compute the number of nanoseconds between two timestamps.

quarters_between(start, end, *[, memory_pool])

Compute the number of quarters between two timestamps.

seconds_between(start, end, *[, memory_pool])

Compute the number of seconds between two timestamps.

weeks_between(start, end, *[, memory_pool, …])

Compute the number of weeks between two timestamps.

years_between(start, end, *[, memory_pool])

Compute the number of years between two timestamps.

Timezone Handling

assume_timezone(timestamps, *[, …])

Convert naive timestamp to timezone-aware timestamp.

Associative Transforms

dictionary_encode(array, *[, memory_pool, …])

Dictionary-encode array.

unique(array, *[, memory_pool])

Compute unique elements.

value_counts(array, *[, memory_pool])

Compute counts of unique elements.

Selections

array_filter(array, selection_filter, *[, …])

Filter with a boolean selection filter.

array_take(array, indices, *[, memory_pool, …])

Select values from an array based on indices from another array.

drop_null(input, *[, memory_pool])

Drop nulls from the input.

filter(data, mask[, null_selection_behavior])

Select values (or records) from array- or table-like data given boolean filter, where true values are selected.

take(data, indices, *[, boundscheck, …])

Select values (or records) from array- or table-like data given integer selection indices.

Sorts and Partitions

array_sort_indices(array, *[, memory_pool, …])

Return the indices that would sort an array.

partition_nth_indices(array, *[, …])

Return the indices that would partition an array around a pivot.

select_k_unstable(input, *[, memory_pool, …])

Selects the indices of the first k ordered elements from the input.

sort_indices(input, *[, memory_pool, …])

Return the indices that would sort an array, record batch or table.

Structural Transforms

list_element(lists, index, *[, memory_pool])

Compute elements of nested list values using an index.

list_flatten(lists, *[, memory_pool])

Flatten list values.

list_parent_indices(lists, *[, memory_pool])

Compute parent indices of nested list values.

list_value_length(lists, *[, memory_pool])

Compute list lengths.

make_struct(*args[, memory_pool, options, …])

Wrap Arrays into a StructArray.

replace_with_mask(values, mask, replacements, *)

Replace items using a mask and replacement values.