Compute Functions¶
Datum class¶
-
class arrow::Datum¶
Variant type for various Arrow C++ data structures.
Public Functions
-
ValueDescr descr() const¶
Return the shape (array or scalar) and type for supported kinds (ARRAY, CHUNKED_ARRAY, and SCALAR).
Triggers a debug assertion for any other kind.
-
ValueDescr::Shape shape() const¶
Return the shape (array or scalar) for supported kinds (ARRAY, CHUNKED_ARRAY, and SCALAR).
Triggers a debug assertion for any other kind.
-
std::shared_ptr<DataType> type() const¶
The value type of the variant, if any.
- Returns
nullptr if no type
-
std::shared_ptr<Schema> schema() const¶
The schema of the variant, if any.
- Returns
nullptr if no schema
-
int64_t length() const¶
The value length of the variant, if any.
- Returns
kUnknownLength if no type
-
ArrayVector chunks() const¶
The array chunks of the variant, if any.
- Returns
empty if not arraylike
-
struct Empty¶
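The accessor behavior described above (a sentinel length when nothing is held, a debug assertion on unsupported kinds) can be illustrated with a small standalone sketch built on std::variant. All names here (ToyDatum, ToyScalar, ToyArray, ToyEmpty, kToyUnknownLength) are hypothetical stand-ins and not Arrow types; the sentinel value and the choice to report length 1 for a scalar are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are Arrow types.
struct ToyEmpty {};
struct ToyScalar { double value; };
struct ToyArray { std::vector<double> values; };

// Assumed sentinel for "no length"; the value is chosen for this sketch.
constexpr int64_t kToyUnknownLength = -1;

class ToyDatum {
 public:
  ToyDatum() : value_(ToyEmpty{}) {}
  explicit ToyDatum(ToyScalar scalar) : value_(scalar) {}
  explicit ToyDatum(ToyArray array) : value_(std::move(array)) {}

  // True only when the array alternative is held.
  bool is_arraylike() const { return std::holds_alternative<ToyArray>(value_); }

  // Number of values held, or the sentinel when the variant is empty.
  int64_t length() const {
    if (const auto* array = std::get_if<ToyArray>(&value_)) {
      return static_cast<int64_t>(array->values.size());
    }
    if (std::holds_alternative<ToyScalar>(value_)) return 1;
    return kToyUnknownLength;
  }

  // Accessor that is only valid for the array kind; asserts in debug builds.
  const ToyArray& array() const {
    assert(is_arraylike());
    return std::get<ToyArray>(value_);
  }

 private:
  std::variant<ToyEmpty, ToyScalar, ToyArray> value_;
};
```

A default-constructed ToyDatum reports kToyUnknownLength, while one holding a three-element ToyArray reports 3.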
Abstract Function classes¶
-
struct FunctionOptions¶
- #include <arrow/compute/function.h>
Base class for specifying options configuring a function’s behavior, such as error handling.
Subclassed by arrow::compute::ArithmeticOptions, arrow::compute::ArraySortOptions, arrow::compute::CastOptions, arrow::compute::CompareOptions, arrow::compute::CountOptions, arrow::compute::DictionaryEncodeOptions, arrow::compute::ExtractRegexOptions, arrow::compute::FilterOptions, arrow::compute::MatchSubstringOptions, arrow::compute::MinMaxOptions, arrow::compute::ModeOptions, arrow::compute::PartitionNthOptions, arrow::compute::ProjectOptions, arrow::compute::QuantileOptions, arrow::compute::ReplaceSubstringOptions, arrow::compute::SetLookupOptions, arrow::compute::SortOptions, arrow::compute::SplitOptions, arrow::compute::StrptimeOptions, arrow::compute::TakeOptions, arrow::compute::TDigestOptions, arrow::compute::TrimOptions, arrow::compute::VarianceOptions
-
struct arrow::compute::Arity¶
- #include <arrow/compute/function.h>
Contains the number of required arguments for the function.
Naming conventions taken from https://en.wikipedia.org/wiki/Arity.
Public Members
-
int num_args¶
The number of required arguments (or the minimum number for varargs functions).
-
bool is_varargs = false¶
If true, then num_args is the minimum number of required arguments.
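How the two members combine when validating a call can be sketched as follows. ToyArity and AcceptsArgCount are hypothetical names for this illustration; the struct only mirrors the two documented members and is not the Arrow struct.

```cpp
#include <cassert>

// Hypothetical mirror of the two documented members.
struct ToyArity {
  int num_args;
  bool is_varargs = false;
};

// Returns true when `passed` arguments satisfy the arity:
// an exact match for fixed arity, at least num_args for varargs.
inline bool AcceptsArgCount(const ToyArity& arity, int passed) {
  assert(arity.num_args >= 0);
  if (arity.is_varargs) return passed >= arity.num_args;
  return passed == arity.num_args;
}
```

For example, a binary function accepts exactly two arguments, while a varargs function with num_args of 1 accepts one or more.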
-
struct arrow::compute::FunctionDoc¶
- #include <arrow/compute/function.h>
Public Members
-
std::string summary¶
A one-line summary of the function, using a verb.
For example, “Add two numeric arrays or scalars”.
-
std::string description¶
A detailed description of the function, meant to follow the summary.
-
std::vector<std::string> arg_names¶
Symbolic names (identifiers) for the function arguments.
Some bindings may use this to generate nicer function signatures.
-
std::string options_class¶
Name of the options class, if any.
-
class arrow::compute::Function¶
- #include <arrow/compute/function.h>
Base class for compute functions.
Function implementations contain a collection of “kernels” which are implementations of the function for specific argument types. Selecting a viable kernel for executing a function is referred to as “dispatching”.
Subclassed by arrow::compute::detail::FunctionImpl< KernelType >, arrow::compute::MetaFunction, arrow::compute::detail::FunctionImpl< HashAggregateKernel >, arrow::compute::detail::FunctionImpl< ScalarAggregateKernel >, arrow::compute::detail::FunctionImpl< ScalarKernel >, arrow::compute::detail::FunctionImpl< VectorKernel >
Public Types
-
enum Kind¶
The kind of function, which indicates in what contexts it is valid for use.
Values:
-
enumerator SCALAR¶
A function that performs scalar data operations on whole arrays of data.
Can generally process Array or Scalar values. The size of the output will be the same as the size (or broadcasted size, in the case of mixing Array and Scalar inputs) of the input.
-
enumerator VECTOR¶
A function with array input and output whose behavior depends on the values of the entire arrays passed, rather than the value of each scalar value.
-
enumerator SCALAR_AGGREGATE¶
A function that computes scalar summary statistics from array input.
-
enumerator HASH_AGGREGATE¶
A function that computes grouped summary statistics from array input and an array of group identifiers.
-
enumerator META¶
A function that dispatches to other functions and does not contain its own kernels.
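The output-size rule stated for the SCALAR kind (same size as the input, or the broadcasted size when Array and Scalar inputs are mixed) can be sketched with a toy elementwise addition. Operand and AddElementwise are hypothetical names; this is a standalone illustration and not how Arrow implements its kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy operand: an array, or a scalar stored as a single element.
struct Operand {
  std::vector<long long> values;  // exactly one element when is_scalar is true
  bool is_scalar = false;
};

// Elementwise addition with scalar broadcasting.
// Output length is 1 for scalar + scalar, otherwise the array length.
inline std::vector<long long> AddElementwise(const Operand& lhs, const Operand& rhs) {
  assert(!lhs.is_scalar || lhs.values.size() == 1);
  assert(!rhs.is_scalar || rhs.values.size() == 1);
  std::size_t out_len = 1;
  if (!lhs.is_scalar) out_len = lhs.values.size();
  if (!rhs.is_scalar) {
    // Two array inputs must agree on length.
    assert(lhs.is_scalar || rhs.values.size() == out_len);
    out_len = rhs.values.size();
  }
  std::vector<long long> out(out_len);
  for (std::size_t i = 0; i < out_len; ++i) {
    out[i] = lhs.values[lhs.is_scalar ? 0 : i] + rhs.values[rhs.is_scalar ? 0 : i];
  }
  return out;
}
```

Adding a three-element array to a scalar yields a three-element result, each element offset by the scalar.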
Public Functions
-
inline const std::string &name() const¶
The name of the function. The registry enforces uniqueness of names.
-
inline Function::Kind kind() const¶
The kind of function, which indicates in what contexts it is valid for use.
-
inline const Arity &arity() const¶
Describes how many arguments the function requires and whether it accepts a variable number of arguments.
-
inline const FunctionDoc &doc() const¶
Return the function documentation.
-
virtual int num_kernels() const = 0¶
Returns the number of registered kernels for this function.
-
virtual Result<const Kernel*> DispatchExact(const std::vector<ValueDescr> &values) const¶
Return a kernel that can execute the function given the exact argument types (without implicit type casts or scalar->array promotions).
NB: This function is overridden in CastFunction.
-
virtual Result<const Kernel*> DispatchBest(std::vector<ValueDescr> *values) const¶
Return a best-match kernel that can execute the function given the argument types, after implicit casts are applied.
- Parameters
values – [inout] Argument types. An element may be modified to indicate that the returned kernel only approximately matches the input value descriptors; callers are responsible for casting inputs to the type and shape required by the kernel.
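The difference between exact and best-match dispatch, including the in-place update of the argument types, can be sketched with a toy type system. Everything here (ToyType, ToyKernel, and the "promote every argument to the widest input type" rule) is a hypothetical simplification for illustration; Arrow's actual implicit-cast rules are richer and are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy types, declared in widening order so that operator< means "narrower".
enum class ToyType { INT32, INT64, FLOAT64 };

// A toy kernel is identified by the argument types it accepts.
struct ToyKernel {
  std::vector<ToyType> in_types;
};

// Exact dispatch: the argument types must match a kernel signature verbatim.
inline const ToyKernel* DispatchExact(const std::vector<ToyKernel>& kernels,
                                      const std::vector<ToyType>& values) {
  for (const ToyKernel& kernel : kernels) {
    if (kernel.in_types == values) return &kernel;
  }
  return nullptr;
}

// Best-match dispatch: try exact first, then retry with every argument
// promoted to the widest input type (an assumed rule for this sketch).
// On success via promotion, *values is rewritten so the caller knows
// which casts to apply before invoking the kernel.
inline const ToyKernel* DispatchBest(const std::vector<ToyKernel>& kernels,
                                     std::vector<ToyType>* values) {
  assert(values != nullptr);
  if (const ToyKernel* exact = DispatchExact(kernels, *values)) return exact;
  if (values->empty()) return nullptr;
  const ToyType widest = *std::max_element(values->begin(), values->end());
  const std::vector<ToyType> promoted(values->size(), widest);
  if (const ToyKernel* match = DispatchExact(kernels, promoted)) {
    *values = promoted;
    return match;
  }
  return nullptr;
}
```

With kernels for (INT64, INT64) and (FLOAT64, FLOAT64), the arguments (INT32, INT64) have no exact match, but best-match dispatch selects the first kernel and rewrites the argument types to (INT64, INT64).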
-
virtual Result<Datum> Execute(const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx) const¶
Execute the function eagerly with the passed input arguments with kernel dispatch, batch iteration, and memory allocation details taken care of.
If the options pointer is null, then default_options() will be used.
This function can be overridden in subclasses.
-
inline const FunctionOptions *default_options() const¶
Returns the default options for this function.
Whatever option semantics a Function has, implementations must guarantee that default_options() is valid to pass to Execute as options.
-
class arrow::compute::ScalarFunction : public arrow::compute::detail::FunctionImpl<ScalarKernel>¶
- #include <arrow/compute/function.h>
A function that executes elementwise operations on arrays or scalars, and therefore whose results generally do not depend on the order of the values in the arguments.
Accepts and returns arrays that are all of the same size. These functions roughly correspond to the functions used in SQL expressions.
Subclassed by arrow::compute::CastFunction
Public Functions
-
Status AddKernel(std::vector<InputType> in_types, OutputType out_type, ArrayKernelExec exec, KernelInit init = NULLPTR)¶
Add a kernel with given input/output types, no required state initialization, preallocation for fixed-width types, and default null handling (intersect validity bitmaps of inputs).
-
class arrow::compute::VectorFunction : public arrow::compute::detail::FunctionImpl<VectorKernel>¶
- #include <arrow/compute/function.h>
A function that executes general array operations that may yield outputs of different sizes or have results that depend on the whole array contents.
These functions roughly correspond to the functions found in non-SQL array languages like APL and its derivatives.
-
class arrow::compute::ScalarAggregateFunction : public arrow::compute::detail::FunctionImpl<ScalarAggregateKernel>¶
- #include <arrow/compute/function.h>
-
class arrow::compute::HashAggregateFunction : public arrow::compute::detail::FunctionImpl<HashAggregateKernel>¶
- #include <arrow/compute/function.h>
-
class arrow::compute::MetaFunction : public arrow::compute::Function¶
- #include <arrow/compute/function.h>
A function that dispatches to other functions.
Must implement MetaFunction::ExecuteImpl.
For Array, ChunkedArray, and Scalar Datum kinds, may rely on the execution of concrete Function types, but must handle other Datum kinds on its own.
Public Functions
-
inline virtual int num_kernels() const override¶
Returns the number of registered kernels for this function.
-
virtual Result<Datum> Execute(const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx) const override¶
Execute the function eagerly with the passed input arguments with kernel dispatch, batch iteration, and memory allocation details taken care of.
If the options pointer is null, then default_options() will be used.
This function can be overridden in subclasses.
Function registry¶
-
class arrow::compute::FunctionRegistry¶
A mutable central function registry for built-in functions as well as user-defined functions.
Functions are implementations of arrow::compute::Function.
Generally, each function contains kernels which are implementations of a function for a specific argument signature. After looking up a function in the registry, one can either execute it eagerly with Function::Execute or use one of the function’s dispatch methods to pick a suitable kernel for lower-level function execution.
Public Functions
Add a new function to the registry.
Returns Status::KeyError if a function with the same name is already registered
-
Status AddAlias(const std::string &target_name, const std::string &source_name)¶
Add aliases for the given function name.
Returns Status::KeyError if the function with the given name is not registered
-
Result<std::shared_ptr<Function>> GetFunction(const std::string &name) const¶
Retrieve a function by name from the registry.
-
std::vector<std::string> GetFunctionNames() const¶
Return vector of all entry names in the registry.
Helpful for displaying a manifest of available functions
-
int num_functions() const¶
The number of currently registered functions.
Public Static Functions
-
static std::unique_ptr<FunctionRegistry> Make()¶
Construct a new registry.
Most users only need to use the global registry
-
FunctionRegistry *arrow::compute::GetFunctionRegistry()¶
Return the process-global function registry.
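The registry semantics described above (unique names, a key error on duplicate registration, aliases that require an existing function) can be sketched with a map-backed toy. ToyRegistry, ToyStatus, and ToyFunction are hypothetical names; this standalone sketch does not use or reproduce the Arrow classes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class ToyStatus { OK, KeyError };

// A toy "function": reduces a vector of doubles to one double.
using ToyFunction = std::function<double(const std::vector<double>&)>;

class ToyRegistry {
 public:
  // Fails with KeyError if the name is already taken.
  ToyStatus AddFunction(const std::string& name, ToyFunction fn) {
    assert(fn != nullptr);
    if (functions_.count(name) > 0) return ToyStatus::KeyError;
    functions_[name] = std::move(fn);
    return ToyStatus::OK;
  }

  // Registers target_name as another name for the existing source_name.
  // Fails with KeyError if source_name is not registered.
  ToyStatus AddAlias(const std::string& target_name, const std::string& source_name) {
    auto it = functions_.find(source_name);
    if (it == functions_.end()) return ToyStatus::KeyError;
    functions_[target_name] = it->second;
    return ToyStatus::OK;
  }

  // Returns nullptr when the name is unknown.
  const ToyFunction* GetFunction(const std::string& name) const {
    auto it = functions_.find(name);
    return it == functions_.end() ? nullptr : &it->second;
  }

  // All entry names (including aliases), in sorted order.
  std::vector<std::string> GetFunctionNames() const {
    std::vector<std::string> names;
    for (const auto& entry : functions_) names.push_back(entry.first);
    return names;
  }

  int num_functions() const { return static_cast<int>(functions_.size()); }

 private:
  std::map<std::string, ToyFunction> functions_;
};
```

After a lookup, the caller can invoke the returned function directly, which loosely mirrors the "look up, then execute" flow described for the real registry.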
Convenience functions¶
-
Result<Datum> CallFunction(const std::string &func_name, const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx = NULLPTR)¶
One-shot invoker for all types of functions.
Does kernel dispatch, argument checking, iteration of ChunkedArray inputs, and wrapping of outputs.
-
Result<Datum> CallFunction(const std::string &func_name, const std::vector<Datum> &args, ExecContext *ctx = NULLPTR)¶
Variant of CallFunction which uses a function’s default options.
NB: Some functions require FunctionOptions be provided.
Concrete options classes¶
-
enum CompareOperator¶
Values:
-
enumerator EQUAL¶
-
enumerator NOT_EQUAL¶
-
enumerator GREATER¶
-
enumerator GREATER_EQUAL¶
-
enumerator LESS¶
-
enumerator LESS_EQUAL¶
-
struct arrow::compute::CountOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control Count kernel behavior.
By default, all non-null values are counted.
Public Types
Public Static Functions
-
static inline CountOptions Defaults()¶
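The default behavior stated above (count only non-null values) can be sketched with std::optional standing in for nullable entries. CountValues and its count_nulls flag are hypothetical names for this illustration; the alternative "count nulls instead" mode is an assumption of the sketch, since the option's enum values are not listed on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Counts non-null entries by default; counts null entries when
// count_nulls is true (an assumed alternative mode for this sketch).
inline int64_t CountValues(const std::vector<std::optional<int>>& values,
                           bool count_nulls = false) {
  int64_t count = 0;
  for (const std::optional<int>& value : values) {
    if (value.has_value() != count_nulls) ++count;
  }
  return count;
}
```

For the input [1, null, 3], the default call returns 2 and the null-counting call returns 1.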
-
struct arrow::compute::MinMaxOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control MinMax kernel behavior.
By default, null values are ignored
Public Types
Public Static Functions
-
static inline MinMaxOptions Defaults()¶
-
struct arrow::compute::ModeOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control Mode kernel behavior.
Returns top-n common values and counts. By default, returns the most common value and count.
Public Functions
-
inline explicit ModeOptions(int64_t n = 1)¶
Public Members
-
int64_t n = 1¶
Public Static Functions
-
static inline ModeOptions Defaults()¶
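The "top-n common values and counts" behavior can be sketched as a frequency count followed by a sort on count. TopNModes is a hypothetical name, and breaking ties in favor of the smaller value is an assumption of this sketch rather than something stated on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Returns up to n (value, count) pairs, most frequent first.
// Ties keep ascending value order (assumed tie-break for this sketch).
inline std::vector<std::pair<int, int64_t>> TopNModes(const std::vector<int>& values,
                                                      int64_t n = 1) {
  assert(n >= 0);
  std::map<int, int64_t> counts;
  for (int value : values) ++counts[value];

  // The map iterates in ascending value order; a stable sort on count
  // therefore preserves that order among equal counts.
  std::vector<std::pair<int, int64_t>> result(counts.begin(), counts.end());
  std::stable_sort(result.begin(), result.end(),
                   [](const std::pair<int, int64_t>& a, const std::pair<int, int64_t>& b) {
                     return a.second > b.second;
                   });
  if (static_cast<std::size_t>(n) < result.size()) {
    result.resize(static_cast<std::size_t>(n));
  }
  return result;
}
```

For the input [1, 2, 2, 3, 3, 3], the default n = 1 yields the single pair (3, 3), and n = 2 adds (2, 2).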
-
struct arrow::compute::VarianceOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control Delta Degrees of Freedom (ddof) of Variance and Stddev kernel.
The divisor used in calculations is N - ddof, where N is the number of elements. By default, ddof is zero, and population variance or stddev is returned.
Public Functions
-
inline explicit VarianceOptions(int ddof = 0)¶
Public Members
-
int ddof = 0¶
Public Static Functions
-
static inline VarianceOptions Defaults()¶
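The role of ddof as described above (the divisor is N - ddof) can be made concrete with a small two-pass computation. Variance and Stddev here are hypothetical free functions written for this illustration; they are not the Arrow kernels and make no claim about the numerical method Arrow uses.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Two-pass variance with divisor N - ddof.
// ddof = 0 gives the population variance, ddof = 1 the sample variance.
inline double Variance(const std::vector<double>& values, int ddof = 0) {
  const double n = static_cast<double>(values.size());
  assert(n - ddof > 0);  // the divisor must be positive

  double mean = 0.0;
  for (double value : values) mean += value;
  mean /= n;

  double sum_sq = 0.0;
  for (double value : values) sum_sq += (value - mean) * (value - mean);
  return sum_sq / (n - ddof);
}

inline double Stddev(const std::vector<double>& values, int ddof = 0) {
  return std::sqrt(Variance(values, ddof));
}
```

For the values [2, 4, 4, 4, 5, 5, 7, 9], the sum of squared deviations is 32, so the variance is 32 / 8 = 4 with ddof = 0 and 32 / 7 with ddof = 1.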
-
struct arrow::compute::QuantileOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control Quantile kernel behavior.
By default, returns the median value.
Public Types
Public Functions
-
inline explicit QuantileOptions(double q = 0.5, enum Interpolation interpolation = LINEAR)¶
-
inline explicit QuantileOptions(std::vector<double> q, enum Interpolation interpolation = LINEAR)¶
Public Members
-
std::vector<double> q¶
quantile must be between 0 and 1 inclusive
-
enum Interpolation interpolation¶
Public Static Functions
-
static inline QuantileOptions Defaults()¶
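What a quantile with LINEAR interpolation computes can be sketched as follows, assuming the common definition where the target position is q * (N - 1) in the sorted data and the result is interpolated between the two neighboring values. QuantileLinear is a hypothetical name, and this definition is an assumption of the sketch rather than something stated on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quantile with linear interpolation between the two nearest ranks.
// Takes the input by value because it sorts its own copy.
inline double QuantileLinear(std::vector<double> values, double q = 0.5) {
  assert(!values.empty());
  assert(q >= 0.0 && q <= 1.0);  // quantile must be between 0 and 1 inclusive
  std::sort(values.begin(), values.end());

  const double position = q * static_cast<double>(values.size() - 1);
  const std::size_t lower = static_cast<std::size_t>(std::floor(position));
  const std::size_t upper = std::min(lower + 1, values.size() - 1);
  const double fraction = position - static_cast<double>(lower);
  return values[lower] + fraction * (values[upper] - values[lower]);
}
```

With the default q = 0.5 this returns the median; for [1, 2, 3, 4] the position is 1.5, giving 2.5.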
-
struct arrow::compute::TDigestOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control TDigest approximate quantile kernel behavior.
By default, returns the median value.
Public Functions
-
inline explicit TDigestOptions(double q = 0.5, uint32_t delta = 100, uint32_t buffer_size = 500)¶
-
inline explicit TDigestOptions(std::vector<double> q, uint32_t delta = 100, uint32_t buffer_size = 500)¶
Public Members
-
std::vector<double> q¶
quantile must be between 0 and 1 inclusive
-
uint32_t delta¶
compression parameter, default 100
-
uint32_t buffer_size¶
input buffer size, default 500
Public Static Functions
-
static inline TDigestOptions Defaults()¶
-
struct arrow::compute::ArithmeticOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
inline ArithmeticOptions()¶
Public Members
-
bool check_overflow¶
-
struct arrow::compute::MatchSubstringOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
inline explicit MatchSubstringOptions(std::string pattern)¶
Public Members
-
std::string pattern¶
The exact substring (or regex, depending on kernel) to look for inside input values.
-
struct arrow::compute::SplitOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Subclassed by arrow::compute::SplitPatternOptions
Public Functions
-
inline explicit SplitOptions(int64_t max_splits = -1, bool reverse = false)¶
-
struct arrow::compute::SplitPatternOptions : public arrow::compute::SplitOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
inline explicit SplitPatternOptions(std::string pattern, int64_t max_splits = -1, bool reverse = false)¶
Public Members
-
std::string pattern¶
The exact substring to look for inside input values.
-
struct arrow::compute::ReplaceSubstringOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
inline explicit ReplaceSubstringOptions(std::string pattern, std::string replacement, int64_t max_replacements = -1)¶
-
struct arrow::compute::ExtractRegexOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
inline explicit ExtractRegexOptions(std::string pattern)¶
Public Members
-
std::string pattern¶
Regular expression with named capture fields.
-
struct arrow::compute::SetLookupOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Options for IsIn and IndexIn functions.
Public Members
-
bool skip_nulls¶
Whether nulls in value_set count for lookup.
If true, any null in value_set is ignored and nulls in the input produce null (IndexIn) or false (IsIn) values in the output. If false, any null in value_set is successfully matched in the input.
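The effect of skip_nulls on an IsIn-style lookup can be sketched with std::optional standing in for nullable values. ToyIsIn and MaybeInt are hypothetical names; the sketch follows only the behavior described above and uses a linear scan for clarity, which says nothing about how Arrow performs the lookup.

```cpp
#include <cassert>
#include <optional>
#include <vector>

using MaybeInt = std::optional<int>;

// For each input value, reports whether it occurs in value_set.
// skip_nulls == true:  nulls in value_set are ignored, so a null input is never found.
// skip_nulls == false: a null in value_set matches a null input.
inline std::vector<bool> ToyIsIn(const std::vector<MaybeInt>& input,
                                 const std::vector<MaybeInt>& value_set,
                                 bool skip_nulls) {
  std::vector<bool> out;
  out.reserve(input.size());
  for (const MaybeInt& value : input) {
    bool found = false;
    for (const MaybeInt& candidate : value_set) {
      if (!candidate.has_value() && skip_nulls) continue;
      // std::optional equality treats two empty optionals as equal.
      if (candidate == value) {
        found = true;
        break;
      }
    }
    out.push_back(found);
  }
  return out;
}
```

For the input [1, null, 5] and the value set [1, null], the result is [true, false, false] with skip_nulls set and [true, true, false] without it.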
-
struct arrow::compute::StrptimeOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
-
struct arrow::compute::TrimOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
inline explicit TrimOptions(std::string characters)¶
Public Members
-
std::string characters¶
The individual characters that can be trimmed from the string.
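The meaning of "individual characters" (a set of characters, not a substring) can be sketched with the standard string search functions. TrimChars is a hypothetical name; this sketch works on bytes and trims both ends, which is an assumption and does not account for multi-byte UTF-8 characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Removes any leading and trailing bytes that appear in `characters`.
// `characters` is treated as a set, so "xy" trims both 'x' and 'y' in any order.
inline std::string TrimChars(const std::string& input, const std::string& characters) {
  const std::size_t first = input.find_first_not_of(characters);
  if (first == std::string::npos) return "";  // everything was trimmed
  const std::size_t last = input.find_last_not_of(characters);
  return input.substr(first, last - first + 1);
}
```

Interior occurrences are left alone: trimming "-" from "--a-b--" yields "a-b".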
-
struct arrow::compute::CompareOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
inline explicit CompareOptions(CompareOperator op)¶
Public Members
-
enum CompareOperator op¶
-
struct arrow::compute::ProjectOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
-
struct arrow::compute::FilterOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Public Types
Public Functions
-
inline explicit FilterOptions(NullSelectionBehavior null_selection = DROP)¶
Public Members
-
NullSelectionBehavior null_selection_behavior = DROP¶
Public Static Functions
-
static inline FilterOptions Defaults()¶
-
struct arrow::compute::TakeOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Public Functions
-
inline explicit TakeOptions(bool boundscheck = true)¶
Public Members
-
bool boundscheck = true¶
Public Static Functions
-
static inline TakeOptions BoundsCheck()¶
-
static inline TakeOptions NoBoundsCheck()¶
-
static inline TakeOptions Defaults()¶
-
struct arrow::compute::DictionaryEncodeOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Options for the dictionary encode function.
Public Types
Public Functions
-
inline explicit DictionaryEncodeOptions(NullEncodingBehavior null_encoding = MASK)¶
Public Members
-
NullEncodingBehavior null_encoding_behavior = MASK¶
Public Static Functions
-
static inline DictionaryEncodeOptions Defaults()¶
-
struct arrow::compute::SortKey¶
- #include <arrow/compute/api_vector.h>
One sort key for PartitionNthIndices (TODO) and SortIndices.
-
struct arrow::compute::ArraySortOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Public Static Functions
-
static inline ArraySortOptions Defaults()¶
-
struct arrow::compute::SortOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Public Static Functions
-
static inline SortOptions Defaults()¶
-
struct arrow::compute::PartitionNthOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Partitioning options for NthToIndices.
Public Functions
-
inline explicit PartitionNthOptions(int64_t pivot)¶
Public Members
-
int64_t pivot¶
The index into the equivalent sorted array of the partition pivot element.
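The pivot semantics can be illustrated with std::nth_element applied to an index vector: after the call, the index at position pivot refers to the value that would sit there in a full sort, with no larger values before it and no smaller values after it. PartitionNthIndices is a hypothetical standalone function for this sketch, not the Arrow kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Returns indices into `values`, partially ordered around `pivot`.
inline std::vector<int64_t> PartitionNthIndices(const std::vector<int>& values,
                                                int64_t pivot) {
  const int64_t size = static_cast<int64_t>(values.size());
  assert(pivot >= 0 && pivot <= size);

  std::vector<int64_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), int64_t{0});
  if (pivot < size) {
    // Compare indices by the values they refer to.
    std::nth_element(indices.begin(), indices.begin() + pivot, indices.end(),
                     [&values](int64_t a, int64_t b) {
                       return values[static_cast<std::size_t>(a)] <
                              values[static_cast<std::size_t>(b)];
                     });
  }
  return indices;
}
```

For the values [5, 1, 4, 2, 3] and pivot 2, the index at position 2 refers to the value 3; the order within each side of the pivot is unspecified.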
-
struct arrow::compute::CastOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/cast.h>
Public Functions
-
inline explicit CastOptions(bool safe = true)¶
Public Members
-
bool allow_int_overflow¶
-
bool allow_time_truncate¶
-
bool allow_time_overflow¶
-
bool allow_decimal_truncate¶
-
bool allow_float_truncate¶
-
bool allow_invalid_utf8¶