Compute Functions#

Aggregations#

all(array, /, *[, skip_nulls, min_count, ...])

Test whether all elements in a boolean array evaluate to true.

any(array, /, *[, skip_nulls, min_count, ...])

Test whether any element in a boolean array evaluates to true.

approximate_median(array, /, *[, ...])

Approximate median of a numeric array with T-Digest algorithm.

count(array, /[, mode, options, memory_pool])

Count the number of null / non-null values.

count_distinct(array, /[, mode, options, ...])

Count the number of unique values.

index(data, value[, start, end, memory_pool])

Find the index of the first occurrence of a given value.

max(array, /, *[, skip_nulls, min_count, ...])

Compute the maximum value of a numeric array.

mean(array, /, *[, skip_nulls, min_count, ...])

Compute the mean of a numeric array.

min(array, /, *[, skip_nulls, min_count, ...])

Compute the minimum value of a numeric array.

min_max(array, /, *[, skip_nulls, ...])

Compute the minimum and maximum values of a numeric array.

mode(array, /[, n, skip_nulls, min_count, ...])

Compute the modal (most common) values of a numeric array.

product(array, /, *[, skip_nulls, ...])

Compute the product of values in a numeric array.

quantile(array, /[, q, interpolation, ...])

Compute an array of quantiles of a numeric array or chunked array.

stddev(array, /, *[, ddof, skip_nulls, ...])

Calculate the standard deviation of a numeric array.

sum(array, /, *[, skip_nulls, min_count, ...])

Compute the sum of a numeric array.

tdigest(array, /[, q, delta, buffer_size, ...])

Approximate quantiles of a numeric array with T-Digest algorithm.

variance(array, /, *[, ddof, skip_nulls, ...])

Calculate the variance of a numeric array.

Arithmetic Functions#

By default, these functions do not detect overflow. Most of them are also available in an overflow-checking variant, suffixed _checked, which raises an ArrowInvalid exception when overflow is detected.

abs(x, /, *[, memory_pool])

Calculate the absolute value of the argument element-wise.

abs_checked(x, /, *[, memory_pool])

Calculate the absolute value of the argument element-wise.

add(x, y, /, *[, memory_pool])

Add the arguments element-wise.

add_checked(x, y, /, *[, memory_pool])

Add the arguments element-wise.

divide(dividend, divisor, /, *[, memory_pool])

Divide the arguments element-wise.

divide_checked(dividend, divisor, /, *[, ...])

Divide the arguments element-wise.

multiply(x, y, /, *[, memory_pool])

Multiply the arguments element-wise.

multiply_checked(x, y, /, *[, memory_pool])

Multiply the arguments element-wise.

negate(x, /, *[, memory_pool])

Negate the argument element-wise.

negate_checked(x, /, *[, memory_pool])

Negate the argument element-wise.

power(base, exponent, /, *[, memory_pool])

Raise arguments to power element-wise.

power_checked(base, exponent, /, *[, ...])

Raise arguments to power element-wise.

sign(x, /, *[, memory_pool])

Get the signedness of the arguments element-wise.

sqrt(x, /, *[, memory_pool])

Takes the square root of arguments element-wise.

sqrt_checked(x, /, *[, memory_pool])

Takes the square root of arguments element-wise.

subtract(x, y, /, *[, memory_pool])

Subtract the arguments element-wise.

subtract_checked(x, y, /, *[, memory_pool])

Subtract the arguments element-wise.

Bit-wise Functions#

bit_wise_and(x, y, /, *[, memory_pool])

Bit-wise AND the arguments element-wise.

bit_wise_not(x, /, *[, memory_pool])

Bit-wise negate the arguments element-wise.

bit_wise_or(x, y, /, *[, memory_pool])

Bit-wise OR the arguments element-wise.

bit_wise_xor(x, y, /, *[, memory_pool])

Bit-wise XOR the arguments element-wise.

shift_left(x, y, /, *[, memory_pool])

Left shift x by y.

shift_left_checked(x, y, /, *[, memory_pool])

Left shift x by y.

shift_right(x, y, /, *[, memory_pool])

Right shift x by y.

shift_right_checked(x, y, /, *[, memory_pool])

Right shift x by y.

Rounding Functions#

Rounding functions displace numeric inputs to an approximate value with a simpler representation based on the rounding criterion.

ceil(x, /, *[, memory_pool])

Round up to the nearest integer.

floor(x, /, *[, memory_pool])

Round down to the nearest integer.

round(x, /[, ndigits, round_mode, options, ...])

Round to a given precision.

round_to_multiple(x, /[, multiple, ...])

Round to a given multiple.

trunc(x, /, *[, memory_pool])

Compute the integral part.

Logarithmic Functions#

Logarithmic functions are supported as well. Their _checked variants detect domain errors.

ln(x, /, *[, memory_pool])

Compute natural logarithm.

ln_checked(x, /, *[, memory_pool])

Compute natural logarithm.

log10(x, /, *[, memory_pool])

Compute base 10 logarithm.

log10_checked(x, /, *[, memory_pool])

Compute base 10 logarithm.

log1p(x, /, *[, memory_pool])

Compute natural log of (1+x).

log1p_checked(x, /, *[, memory_pool])

Compute natural log of (1+x).

log2(x, /, *[, memory_pool])

Compute base 2 logarithm.

log2_checked(x, /, *[, memory_pool])

Compute base 2 logarithm.

logb(x, b, /, *[, memory_pool])

Compute base b logarithm.

logb_checked(x, b, /, *[, memory_pool])

Compute base b logarithm.

Trigonometric Functions#

Trigonometric functions are supported as well. Where appropriate, their _checked variants detect domain errors.

acos(x, /, *[, memory_pool])

Compute the inverse cosine.

acos_checked(x, /, *[, memory_pool])

Compute the inverse cosine.

asin(x, /, *[, memory_pool])

Compute the inverse sine.

asin_checked(x, /, *[, memory_pool])

Compute the inverse sine.

atan(x, /, *[, memory_pool])

Compute the inverse tangent of x.

atan2(y, x, /, *[, memory_pool])

Compute the inverse tangent of y/x.

cos(x, /, *[, memory_pool])

Compute the cosine.

cos_checked(x, /, *[, memory_pool])

Compute the cosine.

sin(x, /, *[, memory_pool])

Compute the sine.

sin_checked(x, /, *[, memory_pool])

Compute the sine.

tan(x, /, *[, memory_pool])

Compute the tangent.

tan_checked(x, /, *[, memory_pool])

Compute the tangent.

Comparisons#

These functions expect two inputs of the same type. If one of the inputs is null they return null.

equal(x, y, /, *[, memory_pool])

Compare values for equality (x == y).

greater(x, y, /, *[, memory_pool])

Compare values for ordered inequality (x > y).

greater_equal(x, y, /, *[, memory_pool])

Compare values for ordered inequality (x >= y).

less(x, y, /, *[, memory_pool])

Compare values for ordered inequality (x < y).

less_equal(x, y, /, *[, memory_pool])

Compare values for ordered inequality (x <= y).

not_equal(x, y, /, *[, memory_pool])

Compare values for inequality (x != y).

These functions take any number of arguments of a numeric or temporal type.

max_element_wise(*args[, skip_nulls, ...])

Find the element-wise maximum value.

min_element_wise(*args[, skip_nulls, ...])

Find the element-wise minimum value.

Logical Functions#

These functions normally emit a null when one of the inputs is null. However, Kleene logic variants are provided (suffixed _kleene). See User Guide for details.

and_(x, y, /, *[, memory_pool])

Logical 'and' boolean values.

and_kleene(x, y, /, *[, memory_pool])

Logical 'and' boolean values (Kleene logic).

and_not(x, y, /, *[, memory_pool])

Logical 'and not' boolean values.

and_not_kleene(x, y, /, *[, memory_pool])

Logical 'and not' boolean values (Kleene logic).

invert(values, /, *[, memory_pool])

Invert boolean values.

or_(x, y, /, *[, memory_pool])

Logical 'or' boolean values.

or_kleene(x, y, /, *[, memory_pool])

Logical 'or' boolean values (Kleene logic).

xor(x, y, /, *[, memory_pool])

Logical 'xor' boolean values.

String Predicates#

In these functions an empty string emits false in the output. For ASCII variants (prefixed ascii_) a string element with non-ASCII characters emits false in the output.

The first set of functions emits true if the input contains only characters of a given class.

ascii_is_alnum(strings, /, *[, memory_pool])

Classify strings as ASCII alphanumeric.

ascii_is_alpha(strings, /, *[, memory_pool])

Classify strings as ASCII alphabetic.

ascii_is_decimal(strings, /, *[, memory_pool])

Classify strings as ASCII decimal.

ascii_is_lower(strings, /, *[, memory_pool])

Classify strings as ASCII lowercase.

ascii_is_printable(strings, /, *[, memory_pool])

Classify strings as ASCII printable.

ascii_is_space(strings, /, *[, memory_pool])

Classify strings as ASCII whitespace.

ascii_is_upper(strings, /, *[, memory_pool])

Classify strings as ASCII uppercase.

utf8_is_alnum(strings, /, *[, memory_pool])

Classify strings as alphanumeric.

utf8_is_alpha(strings, /, *[, memory_pool])

Classify strings as alphabetic.

utf8_is_decimal(strings, /, *[, memory_pool])

Classify strings as decimal.

utf8_is_digit(strings, /, *[, memory_pool])

Classify strings as digits.

utf8_is_lower(strings, /, *[, memory_pool])

Classify strings as lowercase.

utf8_is_numeric(strings, /, *[, memory_pool])

Classify strings as numeric.

utf8_is_printable(strings, /, *[, memory_pool])

Classify strings as printable.

utf8_is_space(strings, /, *[, memory_pool])

Classify strings as whitespace.

utf8_is_upper(strings, /, *[, memory_pool])

Classify strings as uppercase.

The second set of functions also considers the order of characters in the string element.

ascii_is_title(strings, /, *[, memory_pool])

Classify strings as ASCII titlecase.

utf8_is_title(strings, /, *[, memory_pool])

Classify strings as titlecase.

The third set of functions examines string elements on a byte-by-byte basis.

string_is_ascii(strings, /, *[, memory_pool])

Classify strings as ASCII.

String Transforms#

ascii_capitalize(strings, /, *[, memory_pool])

Capitalize the first character of ASCII input.

ascii_lower(strings, /, *[, memory_pool])

Transform ASCII input to lowercase.

ascii_reverse(strings, /, *[, memory_pool])

Reverse ASCII input.

ascii_swapcase(strings, /, *[, memory_pool])

Transform ASCII input by inverting casing.

ascii_title(strings, /, *[, memory_pool])

Titlecase each word of ASCII input.

ascii_upper(strings, /, *[, memory_pool])

Transform ASCII input to uppercase.

binary_length(strings, /, *[, memory_pool])

Compute string lengths.

binary_repeat(strings, num_repeats, /, *[, ...])

Repeat a binary string.

binary_replace_slice(strings, /, start, ...)

Replace a slice of a binary string.

binary_reverse(strings, /, *[, memory_pool])

Reverse binary input.

replace_substring(strings, /, pattern, ...)

Replace matching non-overlapping substrings with replacement.

replace_substring_regex(strings, /, pattern, ...)

Replace matching non-overlapping substrings with replacement.

utf8_capitalize(strings, /, *[, memory_pool])

Capitalize the first character of input.

utf8_length(strings, /, *[, memory_pool])

Compute UTF8 string lengths.

utf8_lower(strings, /, *[, memory_pool])

Transform input to lowercase.

utf8_replace_slice(strings, /, start, stop, ...)

Replace a slice of a string.

utf8_reverse(strings, /, *[, memory_pool])

Reverse input.

utf8_swapcase(strings, /, *[, memory_pool])

Transform input lowercase characters to uppercase and uppercase characters to lowercase.

utf8_title(strings, /, *[, memory_pool])

Titlecase each word of input.

utf8_upper(strings, /, *[, memory_pool])

Transform input to uppercase.

String Padding#

ascii_center(strings, /, width[, padding, ...])

Center strings by padding with a given character.

ascii_lpad(strings, /, width[, padding, ...])

Right-align strings by padding with a given character.

ascii_rpad(strings, /, width[, padding, ...])

Left-align strings by padding with a given character.

utf8_center(strings, /, width[, padding, ...])

Center strings by padding with a given character.

utf8_lpad(strings, /, width[, padding, ...])

Right-align strings by padding with a given character.

utf8_rpad(strings, /, width[, padding, ...])

Left-align strings by padding with a given character.

String Trimming#

ascii_ltrim(strings, /, characters, *[, ...])

Trim leading characters.

ascii_ltrim_whitespace(strings, /, *[, ...])

Trim leading ASCII whitespace characters.

ascii_rtrim(strings, /, characters, *[, ...])

Trim trailing characters.

ascii_rtrim_whitespace(strings, /, *[, ...])

Trim trailing ASCII whitespace characters.

ascii_trim(strings, /, characters, *[, ...])

Trim leading and trailing characters.

ascii_trim_whitespace(strings, /, *[, ...])

Trim leading and trailing ASCII whitespace characters.

utf8_ltrim(strings, /, characters, *[, ...])

Trim leading characters.

utf8_ltrim_whitespace(strings, /, *[, ...])

Trim leading whitespace characters.

utf8_rtrim(strings, /, characters, *[, ...])

Trim trailing characters.

utf8_rtrim_whitespace(strings, /, *[, ...])

Trim trailing whitespace characters.

utf8_trim(strings, /, characters, *[, ...])

Trim leading and trailing characters.

utf8_trim_whitespace(strings, /, *[, ...])

Trim leading and trailing whitespace characters.

String Splitting#

ascii_split_whitespace(strings, /, *[, ...])

Split string according to any ASCII whitespace.

split_pattern(strings, /, pattern, *[, ...])

Split string according to separator.

split_pattern_regex(strings, /, pattern, *)

Split string according to regex pattern.

utf8_split_whitespace(strings, /, *[, ...])

Split string according to any Unicode whitespace.

String Component Extraction#

extract_regex(strings, /, pattern, *[, ...])

Extract substrings captured by a regex pattern.

String Joining#

binary_join(strings, separator, /, *[, ...])

Join a list of strings together with a separator.

binary_join_element_wise(*strings[, ...])

Join string arguments together, with the last argument as separator.

String Slicing#

utf8_slice_codeunits(strings, /, start[, ...])

Slice string.

Containment Tests#

count_substring(strings, /, pattern, *[, ...])

Count occurrences of substring.

count_substring_regex(strings, /, pattern, *)

Count occurrences of regex pattern matches.

ends_with(strings, /, pattern, *[, ...])

Check if strings end with a literal pattern.

find_substring(strings, /, pattern, *[, ...])

Find first occurrence of substring.

find_substring_regex(strings, /, pattern, *)

Find location of first match of regex pattern.

index_in(values, /, value_set, *[, ...])

Return index of each element in a set of values.

is_in(values, /, value_set, *[, skip_nulls, ...])

Find each element in a set of values.

match_like(strings, /, pattern, *[, ...])

Match strings against SQL-style LIKE pattern.

match_substring(strings, /, pattern, *[, ...])

Match strings against literal pattern.

match_substring_regex(strings, /, pattern, *)

Match strings against regex pattern.

starts_with(strings, /, pattern, *[, ...])

Check if strings start with a literal pattern.

indices_nonzero(values, /, *[, memory_pool])

Return the indices of the values in the array that are non-zero.

Categorizations#

is_finite(values, /, *[, memory_pool])

Return true if value is finite.

is_inf(values, /, *[, memory_pool])

Return true if infinity.

is_nan(values, /, *[, memory_pool])

Return true if NaN.

is_null(values, /, *[, nan_is_null, ...])

Return true if null (and optionally NaN).

is_valid(values, /, *[, memory_pool])

Return true if non-null.

true_unless_null(values, /, *[, memory_pool])

Return true if non-null, else return null.

Selecting / Multiplexing#

case_when(cond, /, *cases[, memory_pool])

Choose values based on multiple conditions.

choose(indices, /, *values[, memory_pool])

Choose values from several arrays.

coalesce(*values[, memory_pool])

Select the first non-null value.

if_else(cond, left, right, /, *[, memory_pool])

Choose values based on a condition.

Conversions#

cast(arr, target_type[, safe])

Cast array values to another data type.

ceil_temporal(timestamps, /[, multiple, ...])

Round temporal values up to nearest multiple of specified time unit.

floor_temporal(timestamps, /[, multiple, ...])

Round temporal values down to nearest multiple of specified time unit.

round_temporal(timestamps, /[, multiple, ...])

Round temporal values to the nearest multiple of specified time unit.

strftime(timestamps, /[, format, locale, ...])

Format temporal values according to a format string.

strptime(strings, /, format, unit[, ...])

Parse timestamps.

Temporal Component Extraction#

day(values, /, *[, memory_pool])

Extract day number.

day_of_week(values, /, *[, count_from_zero, ...])

Extract day of the week number.

day_of_year(values, /, *[, memory_pool])

Extract day of year number.

hour(values, /, *[, memory_pool])

Extract hour value.

iso_week(values, /, *[, memory_pool])

Extract ISO week of year number.

iso_year(values, /, *[, memory_pool])

Extract ISO year number.

iso_calendar(values, /, *[, memory_pool])

Extract (ISO year, ISO week, ISO day of week) struct.

is_leap_year(values, /, *[, memory_pool])

Extract if year is a leap year.

microsecond(values, /, *[, memory_pool])

Extract microsecond values.

millisecond(values, /, *[, memory_pool])

Extract millisecond values.

minute(values, /, *[, memory_pool])

Extract minute values.

month(values, /, *[, memory_pool])

Extract month number.

nanosecond(values, /, *[, memory_pool])

Extract nanosecond values.

quarter(values, /, *[, memory_pool])

Extract quarter of year number.

second(values, /, *[, memory_pool])

Extract second values.

subsecond(values, /, *[, memory_pool])

Extract subsecond values.

us_week(values, /, *[, memory_pool])

Extract US week of year number.

us_year(values, /, *[, memory_pool])

Extract US epidemiological year number.

week(values, /, *[, week_starts_monday, ...])

Extract week of year number.

year(values, /, *[, memory_pool])

Extract year number.

year_month_day(values, /, *[, memory_pool])

Extract (year, month, day) struct.

Temporal Difference#

day_time_interval_between(start, end, /, *)

Compute the number of days and milliseconds between two timestamps.

days_between(start, end, /, *[, memory_pool])

Compute the number of days between two timestamps.

hours_between(start, end, /, *[, memory_pool])

Compute the number of hours between two timestamps.

microseconds_between(start, end, /, *[, ...])

Compute the number of microseconds between two timestamps.

milliseconds_between(start, end, /, *[, ...])

Compute the number of millisecond boundaries between two timestamps.

minutes_between(start, end, /, *[, memory_pool])

Compute the number of minute boundaries between two timestamps.

month_day_nano_interval_between(start, end, /, *)

Compute the number of months, days and nanoseconds between two timestamps.

month_interval_between(start, end, /, *[, ...])

Compute the number of months between two timestamps.

nanoseconds_between(start, end, /, *[, ...])

Compute the number of nanoseconds between two timestamps.

quarters_between(start, end, /, *[, memory_pool])

Compute the number of quarters between two timestamps.

seconds_between(start, end, /, *[, memory_pool])

Compute the number of seconds between two timestamps.

weeks_between(start, end, /, *[, ...])

Compute the number of weeks between two timestamps.

years_between(start, end, /, *[, memory_pool])

Compute the number of years between two timestamps.

Timezone Handling#

assume_timezone(timestamps, /, timezone, *)

Convert naive timestamp to timezone-aware timestamp.

Associative Transforms#

dictionary_encode(array, /[, null_encoding, ...])

Dictionary-encode array.

unique(array, /, *[, memory_pool])

Compute unique elements.

value_counts(array, /, *[, memory_pool])

Compute counts of unique elements.

Selections#

array_filter(array, selection_filter, /[, ...])

Filter with a boolean selection filter.

array_take(array, indices, /, *[, ...])

Select values from an array based on indices from another array.

drop_null(input, /, *[, memory_pool])

Drop nulls from the input.

filter(input, selection_filter, /[, ...])

Filter with a boolean selection filter.

take(data, indices, *[, boundscheck, ...])

Select values (or records) from array- or table-like data given integer selection indices.

Sorts and Partitions#

array_sort_indices(array, /[, order, ...])

Return the indices that would sort an array.

partition_nth_indices(array, /, pivot, *[, ...])

Return the indices that would partition an array around a pivot.

select_k_unstable(input, /, k, sort_keys, *)

Select the indices of the first k ordered elements from the input.

sort_indices(input, /[, sort_keys, ...])

Return the indices that would sort an array, record batch or table.

Structural Transforms#

fill_null_backward(values, /, *[, memory_pool])

Carry non-null values backward to fill null slots.

fill_null_forward(values, /, *[, memory_pool])

Carry non-null values forward to fill null slots.

list_element(lists, index, /, *[, memory_pool])

Compute elements of nested list values using an index.

list_flatten(lists, /, *[, memory_pool])

Flatten list values.

list_parent_indices(lists, /, *[, memory_pool])

Compute parent indices of nested list values.

list_value_length(lists, /, *[, memory_pool])

Compute list lengths.

make_struct(*args[, field_names, ...])

Wrap Arrays into a StructArray.

map_lookup(container, /, query_key, ...[, ...])

Find the items corresponding to a given key in a Map.

replace_with_mask(values, mask, ...[, ...])

Replace items selected with a mask.

struct_field(values, /, indices, *[, ...])

Extract children of a struct or union by index.

Compute Options#

ArraySortOptions([order, null_placement])

Options for the array_sort_indices function.

AssumeTimezoneOptions(timezone, *[, ...])

Options for the assume_timezone function.

CastOptions([target_type, ...])

Options for the cast function.

CountOptions([mode])

Options for the count function.

DayOfWeekOptions(*[, count_from_zero, ...])

Options for the day_of_week function.

DictionaryEncodeOptions([null_encoding])

Options for dictionary encoding.

ElementWiseAggregateOptions(*[, skip_nulls])

Options for element-wise aggregate functions.

ExtractRegexOptions(pattern)

Options for the extract_regex function.

FilterOptions([null_selection_behavior])

Options for selecting with a boolean filter.

IndexOptions(value)

Options for the index function.

JoinOptions([null_handling, null_replacement])

Options for the binary_join_element_wise function.

MakeStructOptions([field_names, ...])

Options for the make_struct function.

MapLookupOptions(query_key, occurrence)

Options for the map_lookup function.

MatchSubstringOptions(pattern, *[, ignore_case])

Options for looking for a substring.

ModeOptions([n, skip_nulls, min_count])

Options for the mode function.

NullOptions(*[, nan_is_null])

Options for the is_null function.

PadOptions(width[, padding])

Options for padding strings.

PartitionNthOptions(pivot, *[, null_placement])

Options for the partition_nth_indices function.

QuantileOptions([q, interpolation, ...])

Options for the quantile function.

ReplaceSliceOptions(start, stop, replacement)

Options for replacing slices.

ReplaceSubstringOptions(pattern, replacement, *)

Options for replacing matched substrings.

RoundOptions([ndigits, round_mode])

Options for rounding numbers.

RoundTemporalOptions([multiple, unit, ...])

Options for rounding temporal values.

RoundToMultipleOptions([multiple, round_mode])

Options for rounding numbers to a multiple.

ScalarAggregateOptions(*[, skip_nulls, ...])

Options for scalar aggregations.

SelectKOptions(k, sort_keys)

Options for top/bottom k-selection.

SetLookupOptions(value_set, *[, skip_nulls])

Options for the is_in and index_in functions.

SliceOptions(start[, stop, step])

Options for slicing.

SortOptions([sort_keys, null_placement])

Options for the sort_indices function.

SplitOptions(*[, max_splits, reverse])

Options for splitting on whitespace.

SplitPatternOptions(pattern, *[, ...])

Options for splitting on a string pattern.

StrftimeOptions([format, locale])

Options for the strftime function.

StrptimeOptions(format, unit[, error_is_null])

Options for the strptime function.

StructFieldOptions(indices)

Options for the struct_field function.

TakeOptions(*[, boundscheck])

Options for the take and array_take functions.

TDigestOptions([q, delta, buffer_size, ...])

Options for the tdigest function.

TrimOptions(characters)

Options for trimming characters from strings.

VarianceOptions(*[, ddof, skip_nulls, min_count])

Options for the variance and stddev functions.

WeekOptions(*[, week_starts_monday, ...])

Options for the week function.