Compute Functions

Datum class

class arrow::Datum

Variant type for various Arrow C++ data structures.

Public Functions

Datum() = default

Empty datum, to be populated elsewhere.

int64_t TotalBufferSize() const

The sum of bytes in each buffer referenced by the datum. Note: Scalars report a size of 0.

See

arrow::util::TotalBufferSize for caveats

inline bool is_value() const

True if Datum contains a scalar or array-like data.

ValueDescr descr() const

Return the shape (array or scalar) and type for supported kinds (ARRAY, CHUNKED_ARRAY, and SCALAR).

For any other kind, a debug assertion fails.

ValueDescr::Shape shape() const

Return the shape (array or scalar) for supported kinds (ARRAY, CHUNKED_ARRAY, and SCALAR).

For any other kind, a debug assertion fails.

const std::shared_ptr<DataType> &type() const

The value type of the variant, if any.

Returns

nullptr if no type

const std::shared_ptr<Schema> &schema() const

The schema of the variant, if any.

Returns

nullptr if no schema

int64_t length() const

The value length of the variant, if any.

Returns

kUnknownLength if the datum has no length

ArrayVector chunks() const

The array chunks of the variant, if any.

Returns

An empty vector if the datum is not array-like

struct Empty

Abstract Function classes

void PrintTo(const FunctionOptions&, std::ostream*)

class FunctionOptionsType
#include <arrow/compute/function.h>

Extension point for defining options outside libarrow (but still within this project).

class arrow::compute::FunctionOptions : public arrow::util::EqualityComparable<FunctionOptions>
#include <arrow/compute/function.h>

Base class for specifying options configuring a function’s behavior, such as error handling.

Subclassed by arrow::compute::ArithmeticOptions, arrow::compute::ArraySortOptions, arrow::compute::AssumeTimezoneOptions, arrow::compute::CastOptions, arrow::compute::CountOptions, arrow::compute::DayOfWeekOptions, arrow::compute::DictionaryEncodeOptions, arrow::compute::ElementWiseAggregateOptions, arrow::compute::ExtractRegexOptions, arrow::compute::FilterOptions, arrow::compute::IndexOptions, arrow::compute::JoinOptions, arrow::compute::MakeStructOptions, arrow::compute::MatchSubstringOptions, arrow::compute::ModeOptions, arrow::compute::NullOptions, arrow::compute::PadOptions, arrow::compute::PartitionNthOptions, arrow::compute::QuantileOptions, arrow::compute::RandomOptions, arrow::compute::ReplaceSliceOptions, arrow::compute::ReplaceSubstringOptions, arrow::compute::RoundOptions, arrow::compute::RoundTemporalOptions, arrow::compute::RoundToMultipleOptions, arrow::compute::ScalarAggregateOptions, arrow::compute::SelectKOptions, arrow::compute::SetLookupOptions, arrow::compute::SliceOptions, arrow::compute::SortOptions, arrow::compute::SplitOptions, arrow::compute::SplitPatternOptions, arrow::compute::StrftimeOptions, arrow::compute::StrptimeOptions, arrow::compute::StructFieldOptions, arrow::compute::TakeOptions, arrow::compute::TDigestOptions, arrow::compute::TrimOptions, arrow::compute::Utf8NormalizeOptions, arrow::compute::VarianceOptions, arrow::compute::WeekOptions

Public Functions

Result<std::shared_ptr<Buffer>> Serialize() const

Serialize an options struct to a buffer.

Public Static Functions

static Result<std::unique_ptr<FunctionOptions>> Deserialize(const std::string &type_name, const Buffer &buffer)

Deserialize an options struct from a buffer.

Note: this will only look for type_name in the default FunctionRegistry; to use a custom FunctionRegistry, look up the FunctionOptionsType, then call FunctionOptionsType::Deserialize().

struct arrow::compute::Arity
#include <arrow/compute/function.h>

Contains the number of required arguments for the function.

Naming conventions taken from https://en.wikipedia.org/wiki/Arity.

Public Members

int num_args

The number of required arguments (or the minimum number for varargs functions).

bool is_varargs = false

If true, then num_args is the minimum number of required arguments.

Public Static Functions

static inline Arity Nullary()

A function taking no arguments.

static inline Arity Unary()

A function taking 1 argument.

static inline Arity Binary()

A function taking 2 arguments.

static inline Arity Ternary()

A function taking 3 arguments.

static inline Arity VarArgs(int min_args = 0)

A function taking a variable number of arguments.

Parameters

min_args[in] the minimum number of arguments required when invoking the function
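
The way num_args and is_varargs combine can be sketched in a few lines. The following is a standalone illustration, not the Arrow class: AritySketch and AcceptsArgCount are hypothetical names, and the check shown is only an assumption about how a caller might validate an argument count against an arity.

```cpp
#include <cassert>

// Hypothetical stand-in that mirrors the documented members of Arity.
// It is not arrow::compute::Arity itself.
struct AritySketch {
  int num_args;
  bool is_varargs;

  AritySketch(int n, bool varargs) : num_args(n), is_varargs(varargs) {}

  static AritySketch Nullary() { return AritySketch(0, false); }
  static AritySketch Unary() { return AritySketch(1, false); }
  static AritySketch Binary() { return AritySketch(2, false); }
  static AritySketch Ternary() { return AritySketch(3, false); }
  static AritySketch VarArgs(int min_args = 0) { return AritySketch(min_args, true); }
};

// Returns true when a call passing `passed` arguments satisfies the arity:
// an exact match for fixed arity, at least num_args for varargs.
bool AcceptsArgCount(const AritySketch& arity, int passed) {
  assert(passed >= 0);
  return arity.is_varargs ? passed >= arity.num_args : passed == arity.num_args;
}
```

Under this reading, a Binary() arity accepts exactly two arguments, while VarArgs(1) accepts one or more.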

struct arrow::compute::FunctionDoc
#include <arrow/compute/function.h>

Public Members

std::string summary

A one-line summary of the function, using a verb.

For example, “Add two numeric arrays or scalars”.

std::string description

A detailed description of the function, meant to follow the summary.

std::vector<std::string> arg_names

Symbolic names (identifiers) for the function arguments.

Some bindings may use this to generate nicer function signatures.

std::string options_class

Name of the options class, if any.

bool options_required

Whether options are required for function execution.

If false, then either the function does not have an options class or there is a usable default options value.

class arrow::compute::Function
#include <arrow/compute/function.h>

Base class for compute functions.

Function implementations contain a collection of “kernels” which are implementations of the function for specific argument types. Selecting a viable kernel for executing a function is referred to as “dispatching”.

Subclassed by arrow::compute::detail::FunctionImpl< KernelType >, arrow::compute::MetaFunction, arrow::compute::detail::FunctionImpl< HashAggregateKernel >, arrow::compute::detail::FunctionImpl< ScalarAggregateKernel >, arrow::compute::detail::FunctionImpl< ScalarKernel >, arrow::compute::detail::FunctionImpl< VectorKernel >

Public Types

enum Kind

The kind of function, which indicates in what contexts it is valid for use.

Values:

enumerator SCALAR

A function that performs scalar data operations on whole arrays of data.

Can generally process Array or Scalar values. The size of the output will be the same as the size (or broadcasted size, in the case of mixing Array and Scalar inputs) of the input.

enumerator VECTOR

A function with array input and output whose behavior depends on the contents of the entire arrays passed, rather than on each value independently.

enumerator SCALAR_AGGREGATE

A function that computes scalar summary statistics from array input.

enumerator HASH_AGGREGATE

A function that computes grouped summary statistics from array input and an array of group identifiers.

enumerator META

A function that dispatches to other functions and does not contain its own kernels.
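
To make the distinction between the kinds concrete, here is a standalone sketch with one plain C++ function per kind. None of these helpers are Arrow functions; the names (AddBroadcast, UniqueValues, SumAll, GroupedSum) are hypothetical and only illustrate the shape of inputs and outputs described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// SCALAR-like: element-wise, output length equals input length.
// The scalar operand is broadcast against every array element.
std::vector<int> AddBroadcast(const std::vector<int>& values, int scalar) {
  std::vector<int> out;
  out.reserve(values.size());
  for (int v : values) out.push_back(v + scalar);
  return out;
}

// VECTOR-like: the result depends on the whole array and may have a
// different length (here: distinct values in order of first appearance).
std::vector<int> UniqueValues(const std::vector<int>& values) {
  std::vector<int> out;
  for (int v : values) {
    if (std::find(out.begin(), out.end(), v) == out.end()) out.push_back(v);
  }
  return out;
}

// SCALAR_AGGREGATE-like: reduces an array to a single summary value.
long long SumAll(const std::vector<int>& values) {
  long long total = 0;
  for (int v : values) total += v;
  return total;
}

// HASH_AGGREGATE-like: one summary value per group identifier.
std::map<int, long long> GroupedSum(const std::vector<int>& values,
                                    const std::vector<int>& group_ids) {
  assert(values.size() == group_ids.size());
  std::map<int, long long> sums;
  for (std::size_t i = 0; i < values.size(); ++i) sums[group_ids[i]] += values[i];
  return sums;
}
```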

Public Functions

inline const std::string &name() const

The name of the function. The registry enforces uniqueness of names.

inline Function::Kind kind() const

The kind of function, which indicates in what contexts it is valid for use.

inline const Arity &arity() const

The number of arguments the function requires, and whether it accepts a variable number of arguments.

inline const FunctionDoc &doc() const

Return the function documentation.

virtual int num_kernels() const = 0

Returns the number of registered kernels for this function.

virtual Result<const Kernel*> DispatchExact(const std::vector<ValueDescr> &values) const

Return a kernel that can execute the function given the exact argument types (without implicit type casts or scalar->array promotions).

NB: This function is overridden in CastFunction.

virtual Result<const Kernel*> DispatchBest(std::vector<ValueDescr> *values) const

Return a best-match kernel that can execute the function given the argument types, after implicit casts are applied.

Parameters

values[inout] Argument types. An element may be modified to indicate that the returned kernel only approximately matches the input value descriptors; callers are responsible for casting inputs to the type and shape required by the kernel.

virtual Result<Datum> Execute(const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx) const

Execute the function eagerly with the passed input arguments with kernel dispatch, batch iteration, and memory allocation details taken care of.

If the options pointer is null, then default_options() will be used.

This function can be overridden in subclasses.

inline const FunctionOptions *default_options() const

Returns the default options for this function.

Whatever option semantics a Function has, implementations must guarantee that default_options() is valid to pass to Execute as options.

class arrow::compute::ScalarFunction : public arrow::compute::detail::FunctionImpl<ScalarKernel>
#include <arrow/compute/function.h>

A function that executes elementwise operations on arrays or scalars, and therefore whose results generally do not depend on the order of the values in the arguments.

Accepts and returns arrays that are all of the same size. These functions roughly correspond to the functions used in SQL expressions.

Subclassed by arrow::compute::CastFunction

Public Functions

Status AddKernel(std::vector<InputType> in_types, OutputType out_type, ArrayKernelExec exec, KernelInit init = NULLPTR)

Add a kernel with given input/output types, no required state initialization, preallocation for fixed-width types, and default null handling (intersect validity bitmaps of inputs).

Status AddKernel(ScalarKernel kernel)

Add a kernel (function implementation).

Returns an error if the kernel’s signature does not match the function’s arity.

class arrow::compute::VectorFunction : public arrow::compute::detail::FunctionImpl<VectorKernel>
#include <arrow/compute/function.h>

A function that executes general array operations that may yield outputs of different sizes or have results that depend on the whole array contents.

These functions roughly correspond to the functions found in non-SQL array languages like APL and its derivatives.

Public Functions

Status AddKernel(std::vector<InputType> in_types, OutputType out_type, ArrayKernelExec exec, KernelInit init = NULLPTR)

Add a simple kernel with given input/output types, no required state initialization, no data preallocation, and no preallocation of the validity bitmap.

Status AddKernel(VectorKernel kernel)

Add a kernel (function implementation).

Returns an error if the kernel’s signature does not match the function’s arity.

class arrow::compute::ScalarAggregateFunction : public arrow::compute::detail::FunctionImpl<ScalarAggregateKernel>
#include <arrow/compute/function.h>

Public Functions

Status AddKernel(ScalarAggregateKernel kernel)

Add a kernel (function implementation).

Returns an error if the kernel’s signature does not match the function’s arity.

class arrow::compute::HashAggregateFunction : public arrow::compute::detail::FunctionImpl<HashAggregateKernel>
#include <arrow/compute/function.h>

Public Functions

Status AddKernel(HashAggregateKernel kernel)

Add a kernel (function implementation).

Returns an error if the kernel’s signature does not match the function’s arity.

class arrow::compute::MetaFunction : public arrow::compute::Function
#include <arrow/compute/function.h>

A function that dispatches to other functions.

Must implement MetaFunction::ExecuteImpl.

For Array, ChunkedArray, and Scalar Datum kinds, may rely on the execution of concrete Function types, but must handle other Datum kinds on its own.

Public Functions

inline virtual int num_kernels() const override

Returns the number of registered kernels for this function.

virtual Result<Datum> Execute(const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx) const override

Execute the function eagerly with the passed input arguments with kernel dispatch, batch iteration, and memory allocation details taken care of.

If the options pointer is null, then default_options() will be used.

This function can be overridden in subclasses.

Function registry

class arrow::compute::FunctionRegistry

A mutable central function registry for built-in functions as well as user-defined functions.

Functions are implementations of arrow::compute::Function.

Generally, each function contains kernels which are implementations of a function for a specific argument signature. After looking up a function in the registry, one can either execute it eagerly with Function::Execute or use one of the function’s dispatch methods to pick a suitable kernel for lower-level function execution.

Public Functions

Status AddFunction(std::shared_ptr<Function> function, bool allow_overwrite = false)

Add a new function to the registry.

Returns Status::KeyError if a function with the same name is already registered and allow_overwrite is false.

Status AddAlias(const std::string &target_name, const std::string &source_name)

Add target_name as an alias for the function registered as source_name.

Returns Status::KeyError if no function named source_name is registered.

Status AddFunctionOptionsType(const FunctionOptionsType *options_type, bool allow_overwrite = false)

Add a new function options type to the registry.

Returns Status::KeyError if a function options type with the same name is already registered

Result<std::shared_ptr<Function>> GetFunction(const std::string &name) const

Retrieve a function by name from the registry.

std::vector<std::string> GetFunctionNames() const

Return vector of all entry names in the registry.

Helpful for displaying a manifest of available functions

Result<const FunctionOptionsType*> GetFunctionOptionsType(const std::string &name) const

Retrieve a function options type by name from the registry.

int num_functions() const

The number of currently registered functions.

Public Static Functions

static std::unique_ptr<FunctionRegistry> Make()

Construct a new registry.

Most users only need to use the global registry

FunctionRegistry *arrow::compute::GetFunctionRegistry()

Return the process-global function registry.
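
The name-based bookkeeping described for the registry can be sketched with a plain map. This is a simplified, hypothetical model (RegistrySketch and the Fn alias are not Arrow types). It returns bool where the real API returns a Status or Result, and it assumes that an alias is simply a second name pointing at the same function object.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical function type used only for this sketch.
typedef std::function<int(int)> Fn;

class RegistrySketch {
 public:
  // Refuses to replace an existing name unless allow_overwrite is set.
  // A false return stands in for Status::KeyError.
  bool AddFunction(const std::string& name, Fn fn, bool allow_overwrite = false) {
    if (!allow_overwrite && functions_.count(name) > 0) return false;
    functions_[name] = std::make_shared<Fn>(std::move(fn));
    return true;
  }

  // Makes target_name an additional name for the entry stored under
  // source_name. Fails when source_name is unknown.
  bool AddAlias(const std::string& target_name, const std::string& source_name) {
    std::map<std::string, std::shared_ptr<Fn> >::const_iterator it =
        functions_.find(source_name);
    if (it == functions_.end()) return false;
    std::shared_ptr<Fn> shared = it->second;
    functions_[target_name] = shared;
    return true;
  }

  // Returns a null pointer when the name is not registered.
  std::shared_ptr<Fn> GetFunction(const std::string& name) const {
    std::map<std::string, std::shared_ptr<Fn> >::const_iterator it =
        functions_.find(name);
    if (it == functions_.end()) return std::shared_ptr<Fn>();
    return it->second;
  }

 private:
  std::map<std::string, std::shared_ptr<Fn> > functions_;
};
```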

Convenience functions

Result<Datum> CallFunction(const std::string &func_name, const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx = NULLPTR)

One-shot invoker for all types of functions.

Does kernel dispatch, argument checking, iteration of ChunkedArray inputs, and wrapping of outputs.

Result<Datum> CallFunction(const std::string &func_name, const std::vector<Datum> &args, ExecContext *ctx = NULLPTR)

Variant of CallFunction which uses a function’s default options.

NB: Some functions require FunctionOptions be provided.

Concrete options classes

enum RoundMode

Rounding and tie-breaking modes for round compute functions.

Additional details and examples are provided in compute.rst.

Values:

enumerator DOWN

Round to the nearest integer less than or equal to the value (aka “floor”)

enumerator UP

Round to the nearest integer greater than or equal to the value (aka “ceil”)

enumerator TOWARDS_ZERO

Get the integral part without fractional digits (aka “trunc”)

enumerator TOWARDS_INFINITY

Round negative values with DOWN rule and positive values with UP rule (aka “away from zero”)

enumerator HALF_DOWN

Round ties with DOWN rule (also called “round half towards negative infinity”)

enumerator HALF_UP

Round ties with UP rule (also called “round half towards positive infinity”)

enumerator HALF_TOWARDS_ZERO

Round ties with TOWARDS_ZERO rule (also called “round half away from infinity”)

enumerator HALF_TOWARDS_INFINITY

Round ties with TOWARDS_INFINITY rule (also called “round half away from zero”)

enumerator HALF_TO_EVEN

Round ties to nearest even integer.

enumerator HALF_TO_ODD

Round ties to nearest odd integer.
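
The ten modes can be compared side by side with a small standalone function that rounds a double to an integer. This is an illustrative sketch built on the floor, ceil, and trunc aliases given above, not Arrow's implementation: RoundModeSketch and RoundToInteger are hypothetical names, and the exact tie test assumes inputs whose fractional part is exactly representable.

```cpp
#include <cassert>
#include <cmath>

enum class RoundModeSketch {
  DOWN, UP, TOWARDS_ZERO, TOWARDS_INFINITY,
  HALF_DOWN, HALF_UP, HALF_TOWARDS_ZERO, HALF_TOWARDS_INFINITY,
  HALF_TO_EVEN, HALF_TO_ODD
};

double RoundToInteger(double x, RoundModeSketch mode) {
  const double lower = std::floor(x);           // DOWN rule
  const double upper = std::ceil(x);            // UP rule
  const double toward = std::trunc(x);          // TOWARDS_ZERO rule
  const double away = (x < 0) ? lower : upper;  // TOWARDS_INFINITY rule

  switch (mode) {
    case RoundModeSketch::DOWN: return lower;
    case RoundModeSketch::UP: return upper;
    case RoundModeSketch::TOWARDS_ZERO: return toward;
    case RoundModeSketch::TOWARDS_INFINITY: return away;
    default: break;
  }

  // HALF_* modes: anything that is not an exact tie goes to the nearest integer.
  if (x - lower != 0.5) return std::round(x);

  // Exact tie: the mode decides between lower and upper (upper == lower + 1).
  const bool lower_is_even = std::fmod(lower, 2.0) == 0.0;
  switch (mode) {
    case RoundModeSketch::HALF_DOWN: return lower;
    case RoundModeSketch::HALF_UP: return upper;
    case RoundModeSketch::HALF_TOWARDS_ZERO: return toward;
    case RoundModeSketch::HALF_TOWARDS_INFINITY: return away;
    case RoundModeSketch::HALF_TO_EVEN: return lower_is_even ? lower : upper;
    default: return lower_is_even ? upper : lower;  // HALF_TO_ODD
  }
}
```

For a tie such as 2.5, this sketch yields 2 under HALF_TO_EVEN and 3 under HALF_TO_ODD. The modes only disagree on ties and on the sign-dependent directed modes.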

enum CalendarUnit

Values:

enumerator NANOSECOND
enumerator MICROSECOND
enumerator MILLISECOND
enumerator SECOND
enumerator MINUTE
enumerator HOUR
enumerator DAY
enumerator WEEK
enumerator MONTH
enumerator QUARTER
enumerator YEAR

enum CompareOperator

Values:

enumerator EQUAL
enumerator NOT_EQUAL
enumerator GREATER
enumerator GREATER_EQUAL
enumerator LESS
enumerator LESS_EQUAL

enum SortOrder

Values:

enumerator Ascending

Arrange values in increasing order.

enumerator Descending

Arrange values in decreasing order.

enum NullPlacement

Values:

enumerator AtStart

Place nulls and NaNs before any non-null values.

NaNs will come after nulls.

enumerator AtEnd

Place nulls and NaNs after any non-null values.

NaNs will come before nulls.
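
The relative order of nulls, NaNs, and ordinary values can be sketched as a three-group ranking applied before the value comparison. The code below is a standalone illustration for an ascending sort only. NullableDouble, GroupRank, and SortAscending are hypothetical names, and nulls are modelled with an explicit flag rather than Arrow's validity bitmap.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

enum class NullPlacementSketch { AtStart, AtEnd };

// Hypothetical nullable value: `value` is ignored when is_null is true.
struct NullableDouble {
  bool is_null;
  double value;
};

// Ranks the three groups so that AtStart yields null, NaN, values
// and AtEnd yields values, NaN, null.
int GroupRank(const NullableDouble& v, NullPlacementSketch placement) {
  const bool is_nan = !v.is_null && std::isnan(v.value);
  if (placement == NullPlacementSketch::AtStart) {
    return v.is_null ? 0 : (is_nan ? 1 : 2);
  }
  return v.is_null ? 2 : (is_nan ? 1 : 0);
}

std::vector<NullableDouble> SortAscending(std::vector<NullableDouble> values,
                                          NullPlacementSketch placement) {
  std::stable_sort(values.begin(), values.end(),
                   [placement](const NullableDouble& a, const NullableDouble& b) {
                     const int rank_a = GroupRank(a, placement);
                     const int rank_b = GroupRank(b, placement);
                     if (rank_a != rank_b) return rank_a < rank_b;
                     // Same group: only ordinary values are ordered by value.
                     const bool ordinary = !a.is_null && !std::isnan(a.value);
                     return ordinary && a.value < b.value;
                   });
  return values;
}
```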

class arrow::compute::ScalarAggregateOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>

Control general scalar aggregate kernel behavior.

By default, null values are ignored (skip_nulls = true).

Public Functions

explicit ScalarAggregateOptions(bool skip_nulls = true, uint32_t min_count = 1)

Public Members

bool skip_nulls

If true (the default), null values are ignored.

Otherwise, if any value is null, emit null.

uint32_t min_count

If fewer than this many non-null values are observed, emit null.
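
The interaction of skip_nulls and min_count can be sketched with a simple sum. This is a standalone illustration of the two rules stated above, not the Arrow kernel. NullableInt64, SumResult, and SumSketch are hypothetical names, and the order in which the two rules are checked is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical nullable input value and nullable result.
struct NullableInt64 {
  bool is_null;
  int64_t value;
};

struct SumResult {
  bool is_null;
  int64_t value;
};

SumResult SumSketch(const std::vector<NullableInt64>& cells,
                    bool skip_nulls = true, uint32_t min_count = 1) {
  int64_t total = 0;
  uint32_t valid = 0;
  for (const NullableInt64& cell : cells) {
    if (cell.is_null) {
      // Without skip_nulls, a single null makes the whole result null.
      if (!skip_nulls) return SumResult{true, 0};
      continue;
    }
    total += cell.value;
    ++valid;
  }
  // Too few non-null observations: emit null.
  if (valid < min_count) return SumResult{true, 0};
  return SumResult{false, total};
}
```

With the input {1, null, 3}, the defaults give 4, skip_nulls = false gives null, and min_count = 3 gives null because only two non-null values were seen.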

Public Static Functions

static inline ScalarAggregateOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "ScalarAggregateOptions"
class arrow::compute::CountOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>

Control count aggregate kernel behavior.

By default, only non-null values are counted.

Public Types

enum CountMode

Values:

enumerator ONLY_VALID

Count only non-null values.

enumerator ONLY_NULL

Count only null values.

enumerator ALL

Count both non-null and null values.

Public Functions

explicit CountOptions(CountMode mode = CountMode::ONLY_VALID)

Public Members

CountMode mode

Public Static Functions

static inline CountOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "CountOptions"
class arrow::compute::ModeOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>

Control Mode kernel behavior.

Returns the top-n most common values and their counts. By default, returns the most common value and its count.

Public Functions

explicit ModeOptions(int64_t n = 1, bool skip_nulls = true, uint32_t min_count = 0)

Public Members

int64_t n = 1
bool skip_nulls

If true (the default), null values are ignored.

Otherwise, if any value is null, emit null.

uint32_t min_count

If fewer than this many non-null values are observed, emit null.

Public Static Functions

static inline ModeOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "ModeOptions"
class arrow::compute::VarianceOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>

Control the Delta Degrees of Freedom (ddof) of the Variance and Stddev kernels.

The divisor used in calculations is N - ddof, where N is the number of elements. By default, ddof is zero, and population variance or stddev is returned.
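
As a worked illustration of the N - ddof divisor, the following standalone sketch computes the variance of a small vector. VarianceSketch is a hypothetical helper, not the Arrow kernel. It uses a plain two-pass formula and simply asserts that the divisor is positive.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Variance with divisor N - ddof: ddof = 0 gives the population
// variance, ddof = 1 the usual sample variance.
double VarianceSketch(const std::vector<double>& values, int ddof = 0) {
  const double n = static_cast<double>(values.size());
  assert(n - ddof > 0);  // precondition of this sketch

  double mean = 0.0;
  for (double v : values) mean += v;
  mean /= n;

  double squared_deviations = 0.0;
  for (double v : values) squared_deviations += (v - mean) * (v - mean);

  return squared_deviations / (n - ddof);
}
```

For the values {1, 2, 3, 4} the mean is 2.5 and the squared deviations sum to 5, so ddof = 0 yields 5 / 4 = 1.25 and ddof = 1 yields 5 / 3.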

Public Functions

explicit VarianceOptions(int ddof = 0, bool skip_nulls = true, uint32_t min_count = 0)

Public Members

int ddof = 0
bool skip_nulls

If true (the default), null values are ignored.

Otherwise, if any value is null, emit null.

uint32_t min_count

If fewer than this many non-null values are observed, emit null.

Public Static Functions

static inline VarianceOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "VarianceOptions"
class arrow::compute::QuantileOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>

Control Quantile kernel behavior.

By default, returns the median value.

Public Types

enum Interpolation

Interpolation method to use when quantile lies between two data points.

Values:

enumerator LINEAR
enumerator LOWER
enumerator HIGHER
enumerator NEAREST
enumerator MIDPOINT
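
The enumerators are not described individually here, so the sketch below assumes the convention common in numerical libraries: the quantile position is q * (N - 1) in the sorted data, and the method chooses between the two neighbouring points. QuantileSketch and InterpolationSketch are hypothetical names, and the tie rule for NEAREST is an arbitrary choice of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum class InterpolationSketch { LINEAR, LOWER, HIGHER, NEAREST, MIDPOINT };

double QuantileSketch(std::vector<double> values, double q,
                      InterpolationSketch interpolation) {
  assert(!values.empty());
  assert(q >= 0.0 && q <= 1.0);
  std::sort(values.begin(), values.end());

  // Fractional position of the quantile between two sorted data points.
  const double position = q * static_cast<double>(values.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(std::floor(position));
  const std::size_t hi = static_cast<std::size_t>(std::ceil(position));
  const double fraction = position - static_cast<double>(lo);

  switch (interpolation) {
    case InterpolationSketch::LOWER: return values[lo];
    case InterpolationSketch::HIGHER: return values[hi];
    // Assumption: an exact tie (fraction == 0.5) goes to the higher point.
    case InterpolationSketch::NEAREST: return fraction < 0.5 ? values[lo] : values[hi];
    case InterpolationSketch::MIDPOINT: return (values[lo] + values[hi]) / 2.0;
    default: return values[lo] + (values[hi] - values[lo]) * fraction;  // LINEAR
  }
}
```

For the data {1, 2, 3, 4} and q = 0.25, the position is 0.75, between the values 1 and 2. Under these assumptions LOWER gives 1, HIGHER and NEAREST give 2, MIDPOINT gives 1.5, and LINEAR gives 1.75.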

Public Functions

explicit QuantileOptions(double q = 0.5, enum Interpolation interpolation = LINEAR, bool skip_nulls = true, uint32_t min_count = 0)
explicit QuantileOptions(std::vector<double> q, enum Interpolation interpolation = LINEAR, bool skip_nulls = true, uint32_t min_count = 0)

Public Members

std::vector<double> q

The quantiles to compute. Each must be between 0 and 1 inclusive.

enum Interpolation interpolation
bool skip_nulls

If true (the default), null values are ignored.

Otherwise, if any value is null, emit null.

uint32_t min_count

If fewer than this many non-null values are observed, emit null.

Public Static Functions

static inline QuantileOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "QuantileOptions"
class arrow::compute::TDigestOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>

Control TDigest approximate quantile kernel behavior.

By default, returns the median value.

Public Functions

explicit TDigestOptions(double q = 0.5, uint32_t delta = 100, uint32_t buffer_size = 500, bool skip_nulls = true, uint32_t min_count = 0)
explicit TDigestOptions(std::vector<double> q, uint32_t delta = 100, uint32_t buffer_size = 500, bool skip_nulls = true, uint32_t min_count = 0)

Public Members

std::vector<double> q

The quantiles to compute. Each must be between 0 and 1 inclusive.

uint32_t delta

Compression parameter (default 100).

uint32_t buffer_size

Input buffer size (default 500).

bool skip_nulls

If true (the default), null values are ignored.

Otherwise, if any value is null, emit null.

uint32_t min_count

If fewer than this many non-null values are observed, emit null.

Public Static Functions

static inline TDigestOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "TDigestOptions"
class arrow::compute::IndexOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>

Control Index kernel behavior.

Public Functions

explicit IndexOptions(std::shared_ptr<Scalar> value)
IndexOptions()

Public Members

std::shared_ptr<Scalar> value

Public Static Attributes

static constexpr const char kTypeName[] = "IndexOptions"
class arrow::compute::ArithmeticOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit ArithmeticOptions(bool check_overflow = false)

Public Members

bool check_overflow

Public Static Attributes

static constexpr const char kTypeName[] = "ArithmeticOptions"
class arrow::compute::ElementWiseAggregateOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit ElementWiseAggregateOptions(bool skip_nulls = true)

Public Members

bool skip_nulls

Public Static Functions

static inline ElementWiseAggregateOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "ElementWiseAggregateOptions"
class arrow::compute::RoundOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit RoundOptions(int64_t ndigits = 0, RoundMode round_mode = RoundMode::HALF_TO_EVEN)

Public Members

int64_t ndigits

Rounding precision (number of digits to round to)

RoundMode round_mode

Rounding and tie-breaking mode.

Public Static Functions

static inline RoundOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "RoundOptions"
class arrow::compute::RoundTemporalOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit RoundTemporalOptions(int multiple = 1, CalendarUnit unit = CalendarUnit::DAY)

Public Members

int multiple

Number of units to round to.

CalendarUnit unit

The unit used for rounding of time.

Public Static Functions

static inline RoundTemporalOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "RoundTemporalOptions"
class arrow::compute::RoundToMultipleOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit RoundToMultipleOptions(double multiple = 1.0, RoundMode round_mode = RoundMode::HALF_TO_EVEN)
explicit RoundToMultipleOptions(std::shared_ptr<Scalar> multiple, RoundMode round_mode = RoundMode::HALF_TO_EVEN)

Public Members

std::shared_ptr<Scalar> multiple

Rounding scale (multiple to round to).

Should be a scalar of a type compatible with the argument to be rounded. For example, rounding a decimal value means a decimal multiple is required. Rounding a floating point or integer value means a floating point scalar is required.

RoundMode round_mode

Rounding and tie-breaking mode.

Public Static Functions

static inline RoundToMultipleOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "RoundToMultipleOptions"
class arrow::compute::JoinOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Options for var_args_join.

Public Types

enum NullHandlingBehavior

How to handle null values. (A null separator always results in a null output.)

Values:

enumerator EMIT_NULL

A null in any input results in a null in the output.

enumerator SKIP

Nulls in inputs are skipped.

enumerator REPLACE

Nulls in inputs are replaced with the replacement string.

Public Functions

explicit JoinOptions(NullHandlingBehavior null_handling = EMIT_NULL, std::string null_replacement = "")

Public Members

NullHandlingBehavior null_handling
std::string null_replacement

Public Static Functions

static inline JoinOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "JoinOptions"
class arrow::compute::MatchSubstringOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit MatchSubstringOptions(std::string pattern, bool ignore_case = false)
MatchSubstringOptions()

Public Members

std::string pattern

The exact substring (or regex, depending on kernel) to look for inside input values.

bool ignore_case

Whether to perform a case-insensitive match.

Public Static Attributes

static constexpr const char kTypeName[] = "MatchSubstringOptions"
class arrow::compute::SplitOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit SplitOptions(int64_t max_splits = -1, bool reverse = false)

Public Members

int64_t max_splits

Maximum number of splits allowed, or unlimited when -1.

bool reverse

Start splitting from the end of the string (only relevant when max_splits != -1)

Public Static Attributes

static constexpr const char kTypeName[] = "SplitOptions"
class arrow::compute::SplitPatternOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit SplitPatternOptions(std::string pattern, int64_t max_splits = -1, bool reverse = false)
SplitPatternOptions()

Public Members

std::string pattern

The exact substring to split on.

int64_t max_splits

Maximum number of splits allowed, or unlimited when -1.

bool reverse

Start splitting from the end of the string (only relevant when max_splits != -1)
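
The effect of max_splits and reverse is easiest to see on a small string. The helper below is a standalone sketch on std::string, not the Arrow kernel. SplitOnPattern is a hypothetical name, and it assumes a non-empty literal pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::vector<std::string> SplitOnPattern(const std::string& input,
                                        const std::string& pattern,
                                        int64_t max_splits = -1,
                                        bool reverse = false) {
  assert(!pattern.empty());
  std::vector<std::string> parts;
  int64_t splits = 0;

  if (!reverse) {
    // Scan left to right; the remainder after the last split is kept whole.
    std::size_t start = 0;
    while (max_splits < 0 || splits < max_splits) {
      const std::size_t pos = input.find(pattern, start);
      if (pos == std::string::npos) break;
      parts.push_back(input.substr(start, pos - start));
      start = pos + pattern.size();
      ++splits;
    }
    parts.push_back(input.substr(start));
    return parts;
  }

  // Scan right to left, collecting pieces back to front.
  std::size_t end = input.size();
  while (max_splits < 0 || splits < max_splits) {
    if (end < pattern.size()) break;
    const std::size_t pos = input.rfind(pattern, end - pattern.size());
    if (pos == std::string::npos) break;
    parts.push_back(input.substr(pos + pattern.size(), end - pos - pattern.size()));
    end = pos;
    ++splits;
  }
  parts.push_back(input.substr(0, end));
  std::reverse(parts.begin(), parts.end());
  return parts;
}
```

With the input "a,b,c,d", the pattern ",", and max_splits = 1, the forward scan yields {"a", "b,c,d"} and the reverse scan yields {"a,b,c", "d"}. With max_splits = -1 both directions give the same four pieces, which is why reverse only matters when a limit is set.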

Public Static Attributes

static constexpr const char kTypeName[] = "SplitPatternOptions"
class arrow::compute::ReplaceSliceOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit ReplaceSliceOptions(int64_t start, int64_t stop, std::string replacement)
ReplaceSliceOptions()

Public Members

int64_t start

Index to start slicing at.

int64_t stop

Index to stop slicing at.

std::string replacement

String to replace the slice with.

Public Static Attributes

static constexpr const char kTypeName[] = "ReplaceSliceOptions"
class arrow::compute::ReplaceSubstringOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit ReplaceSubstringOptions(std::string pattern, std::string replacement, int64_t max_replacements = -1)
ReplaceSubstringOptions()

Public Members

std::string pattern

Pattern to match: a literal or a regular expression, depending on which kernel is used.

std::string replacement

String to replace the pattern with.

int64_t max_replacements

Max number of substrings to replace (-1 means unbounded)
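
A literal-pattern version of this behavior can be sketched on std::string. ReplaceLiteral is a hypothetical helper, not the Arrow kernel. It covers only the literal case (no regular expressions) and assumes a non-empty pattern and a left-to-right scan over non-overlapping matches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

std::string ReplaceLiteral(const std::string& input, const std::string& pattern,
                           const std::string& replacement,
                           int64_t max_replacements = -1) {
  assert(!pattern.empty());
  std::string out;
  std::size_t start = 0;
  int64_t done = 0;
  // -1 means unbounded; otherwise stop after max_replacements matches.
  while (max_replacements < 0 || done < max_replacements) {
    const std::size_t pos = input.find(pattern, start);
    if (pos == std::string::npos) break;
    out.append(input, start, pos - start);  // text before the match
    out += replacement;
    start = pos + pattern.size();
    ++done;
  }
  out.append(input, start, std::string::npos);  // untouched remainder
  return out;
}
```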

Public Static Attributes

static constexpr const char kTypeName[] = "ReplaceSubstringOptions"
class arrow::compute::ExtractRegexOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit ExtractRegexOptions(std::string pattern)
ExtractRegexOptions()

Public Members

std::string pattern

Regular expression with named capture fields.

Public Static Attributes

static constexpr const char kTypeName[] = "ExtractRegexOptions"
class arrow::compute::SetLookupOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Options for IsIn and IndexIn functions.

Public Functions

explicit SetLookupOptions(Datum value_set, bool skip_nulls = false)
SetLookupOptions()

Public Members

Datum value_set

The set of values in which to look up input values.

bool skip_nulls

Whether nulls in value_set count for lookup.

If true, any null in value_set is ignored and nulls in the input produce null (IndexIn) or false (IsIn) values in the output. If false, any null in value_set is successfully matched in the input.
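
The two skip_nulls cases for IsIn can be sketched with a linear search over a small nullable set. This is a standalone illustration of the rule stated above, not the Arrow kernel. NullableInt and IsInSketch are hypothetical names, and nulls are modelled with an explicit flag.

```cpp
#include <cassert>
#include <vector>

// Hypothetical nullable value: `value` is ignored when is_null is true.
struct NullableInt {
  bool is_null;
  int value;
};

std::vector<bool> IsInSketch(const std::vector<NullableInt>& input,
                             const std::vector<NullableInt>& value_set,
                             bool skip_nulls = false) {
  std::vector<bool> out;
  out.reserve(input.size());
  for (const NullableInt& item : input) {
    bool found = false;
    for (const NullableInt& candidate : value_set) {
      if (candidate.is_null) {
        // A null in value_set only matches a null input when nulls count.
        if (!skip_nulls && item.is_null) found = true;
      } else if (!item.is_null && item.value == candidate.value) {
        found = true;
      }
    }
    out.push_back(found);
  }
  return out;
}
```

For value_set {1, null} and input {1, 2, null}, skip_nulls = false gives {true, false, true}, while skip_nulls = true gives {true, false, false}.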

Public Static Attributes

static constexpr const char kTypeName[] = "SetLookupOptions"
class arrow::compute::StructFieldOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Options for struct_field function.

Public Functions

explicit StructFieldOptions(std::vector<int> indices)
StructFieldOptions()

Public Members

std::vector<int> indices

The child indices to extract.

For instance, to get the 2nd child of the 1st child of a struct or union, this would be {0, 1}.

Public Static Attributes

static constexpr const char kTypeName[] = "StructFieldOptions"
class arrow::compute::StrptimeOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit StrptimeOptions(std::string format, TimeUnit::type unit)
StrptimeOptions()

Public Members

std::string format
TimeUnit::type unit

Public Static Attributes

static constexpr const char kTypeName[] = "StrptimeOptions"
class arrow::compute::StrftimeOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit StrftimeOptions(std::string format, std::string locale = "C")
StrftimeOptions()

Public Members

std::string format

The desired format string.

std::string locale

The desired output locale string.

Public Static Attributes

static constexpr const char kTypeName[] = "StrftimeOptions"
static constexpr const char *kDefaultFormat = "%Y-%m-%dT%H:%M:%S"
class arrow::compute::PadOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit PadOptions(int64_t width, std::string padding = " ")
PadOptions()

Public Members

int64_t width

The desired string length.

std::string padding

What to pad the string with. Should be one codepoint (Unicode)/byte (ASCII).

Public Static Attributes

static constexpr const char kTypeName[] = "PadOptions"
class arrow::compute::TrimOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit TrimOptions(std::string characters)
TrimOptions()

Public Members

std::string characters

The individual characters to be trimmed from the string.

Public Static Attributes

static constexpr const char kTypeName[] = "TrimOptions"
class arrow::compute::SliceOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit SliceOptions(int64_t start, int64_t stop = std::numeric_limits<int64_t>::max(), int64_t step = 1)
SliceOptions()

Public Members

int64_t start
int64_t stop
int64_t step

Public Static Attributes

static constexpr const char kTypeName[] = "SliceOptions"
class arrow::compute::NullOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit NullOptions(bool nan_is_null = false)

Public Members

bool nan_is_null

Public Static Functions

static inline NullOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "NullOptions"
struct arrow::compute::CompareOptions
#include <arrow/compute/api_scalar.h>

Public Functions

inline explicit CompareOptions(CompareOperator op)
inline CompareOptions()

Public Members

enum CompareOperator op
class arrow::compute::MakeStructOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

MakeStructOptions(std::vector<std::string> n, std::vector<bool> r, std::vector<std::shared_ptr<const KeyValueMetadata>> m)
explicit MakeStructOptions(std::vector<std::string> n)
MakeStructOptions()

Public Members

std::vector<std::string> field_names

Names for wrapped columns.

std::vector<bool> field_nullability

Nullability bits for wrapped columns.

std::vector<std::shared_ptr<const KeyValueMetadata>> field_metadata

Metadata attached to wrapped columns.

Public Static Attributes

static constexpr const char kTypeName[] = "MakeStructOptions"
struct arrow::compute::DayOfWeekOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit DayOfWeekOptions(bool count_from_zero = true, uint32_t week_start = 1)

Public Members

bool count_from_zero

Number the days starting from 0 if true, or from 1 if false.

uint32_t week_start

What day does the week start with (Monday=1, Sunday=7).

The numbering is unaffected by the count_from_zero parameter.

Public Static Functions

static inline DayOfWeekOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "DayOfWeekOptions"
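The interaction of count_from_zero and week_start can be sketched as plain arithmetic. DayNumberSketch is a hypothetical helper, not an Arrow function. It assumes the input weekday is given in the same Monday=1 to Sunday=7 numbering that week_start uses.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Arrow): maps a weekday given as
// Monday=1 ... Sunday=7 to the day number implied by the two options.
inline int64_t DayNumberSketch(uint32_t weekday, bool count_from_zero,
                               uint32_t week_start) {
  // Distance in days from the configured first day of the week, in [0, 6].
  const uint32_t offset = (weekday + 7 - week_start) % 7;
  // count_from_zero only shifts the result; it does not affect week_start.
  return static_cast<int64_t>(offset) + (count_from_zero ? 0 : 1);
}
```

With the defaults (count_from_zero = true, week_start = 1) Monday maps to 0 and Sunday to 6. With week_start = 7 and count_from_zero = false, Sunday maps to 1.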
struct arrow::compute::AssumeTimezoneOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Used to control timestamp timezone conversion and handling ambiguous/nonexistent times.

Public Types

enum Ambiguous

How to interpret ambiguous local times that can be interpreted as multiple instants (normally two) due to DST shifts.

AMBIGUOUS_EARLIEST emits the earliest instant amongst possible interpretations. AMBIGUOUS_LATEST emits the latest instant amongst possible interpretations.

Values:

enumerator AMBIGUOUS_RAISE
enumerator AMBIGUOUS_EARLIEST
enumerator AMBIGUOUS_LATEST
enum Nonexistent

How to handle local times that do not exist due to DST shifts.

NONEXISTENT_EARLIEST emits the instant “just before” the DST shift instant in the given timestamp precision (for example, for a nanoseconds precision timestamp, this is one nanosecond before the DST shift instant). NONEXISTENT_LATEST emits the DST shift instant.

Values:

enumerator NONEXISTENT_RAISE
enumerator NONEXISTENT_EARLIEST
enumerator NONEXISTENT_LATEST

Public Functions

explicit AssumeTimezoneOptions(std::string timezone, Ambiguous ambiguous = AMBIGUOUS_RAISE, Nonexistent nonexistent = NONEXISTENT_RAISE)
AssumeTimezoneOptions()

Public Members

std::string timezone

Timezone to convert timestamps from.

Ambiguous ambiguous

How to interpret ambiguous local times (due to DST shifts)

Nonexistent nonexistent

How to interpret non-existent local times (due to DST shifts)

Public Static Attributes

static constexpr const char kTypeName[] = "AssumeTimezoneOptions"
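The two enums can be read as small decision rules. The sketch below models them on plain integers counted in units of the timestamp precision. All names in it are hypothetical stand-ins, not Arrow types, and std::nullopt stands in for the error that the RAISE variants would report.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for the two enums (not Arrow types).
enum class Ambiguous { Raise, Earliest, Latest };
enum class Nonexistent { Raise, Earliest, Latest };

// Picks one of the two candidate instants of an ambiguous local time.
// Returns std::nullopt where the RAISE policy would report an error.
inline std::optional<int64_t> ResolveAmbiguous(int64_t first, int64_t second,
                                               Ambiguous policy) {
  switch (policy) {
    case Ambiguous::Earliest:
      return first < second ? first : second;
    case Ambiguous::Latest:
      return first < second ? second : first;
    case Ambiguous::Raise:
      break;
  }
  return std::nullopt;
}

// `shift_instant` is the instant of the DST shift, counted in units of the
// timestamp precision, so "just before" is exactly one unit earlier.
inline std::optional<int64_t> ResolveNonexistent(int64_t shift_instant,
                                                 Nonexistent policy) {
  switch (policy) {
    case Nonexistent::Earliest:
      return shift_instant - 1;
    case Nonexistent::Latest:
      return shift_instant;
    case Nonexistent::Raise:
      break;
  }
  return std::nullopt;
}
```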
struct arrow::compute::WeekOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Functions

explicit WeekOptions(bool week_starts_monday = true, bool count_from_zero = false, bool first_week_is_fully_in_year = false)

Public Members

bool week_starts_monday

What day does the week start with (Monday=true, Sunday=false)

bool count_from_zero

Dates from current year that fall into last ISO week of the previous year return 0 if true and 52 or 53 if false.

bool first_week_is_fully_in_year

Must the first week be fully in January (true), or is a week that begins on December 29, 30, or 31 considered to be the first week of the new year (false)?

Public Static Functions

static inline WeekOptions Defaults()
static inline WeekOptions ISODefaults()
static inline WeekOptions USDefaults()

Public Static Attributes

static constexpr const char kTypeName[] = "WeekOptions"
struct arrow::compute::Utf8NormalizeOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Types

enum Form

Values:

enumerator NFC
enumerator NFKC
enumerator NFD
enumerator NFKD

Public Functions

explicit Utf8NormalizeOptions(Form form = NFC)

Public Members

Form form

The Unicode normalization form to apply.

Public Static Functions

static inline Utf8NormalizeOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "Utf8NormalizeOptions"
class arrow::compute::RandomOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>

Public Types

enum Initializer

Values:

enumerator SystemRandom
enumerator Seed

Public Functions

RandomOptions(int64_t length, Initializer initializer, uint64_t seed)
RandomOptions()

Public Members

int64_t length

The length of the array returned. Negative is invalid.

Initializer initializer

The type of initialization for random number generation - system or provided seed.

uint64_t seed

The seed value used to initialize the random number generation.

Public Static Functions

static inline RandomOptions FromSystemRandom(int64_t length)
static inline RandomOptions FromSeed(int64_t length, uint64_t seed)
static inline RandomOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "RandomOptions"
class arrow::compute::FilterOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>

Public Types

enum NullSelectionBehavior

Configure the action taken when a slot of the selection mask is null.

Values:

enumerator DROP

The corresponding filtered value will be removed in the output.

enumerator EMIT_NULL

The corresponding filtered value will be null in the output.

Public Functions

explicit FilterOptions(NullSelectionBehavior null_selection = DROP)

Public Members

NullSelectionBehavior null_selection_behavior = DROP

Public Static Functions

static inline FilterOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "FilterOptions"
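The difference between DROP and EMIT_NULL only shows up where the selection mask itself is null. The following sketch models arrays as vectors of std::optional. FilterSketch and NullSelection are hypothetical names used for illustration, not the Arrow implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for FilterOptions::NullSelectionBehavior.
enum class NullSelection { Drop, EmitNull };

// Hypothetical helper (not part of Arrow): keeps values[i] where mask[i] is
// true, skips it where mask[i] is false, and applies `behavior` where
// mask[i] is null.
inline std::vector<std::optional<int>> FilterSketch(
    const std::vector<std::optional<int>>& values,
    const std::vector<std::optional<bool>>& mask, NullSelection behavior) {
  std::vector<std::optional<int>> out;
  const std::size_t n = std::min(values.size(), mask.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (!mask[i].has_value()) {
      // Null mask slot: either drop the value or emit a null in its place.
      if (behavior == NullSelection::EmitNull) out.push_back(std::nullopt);
      continue;
    }
    if (*mask[i]) out.push_back(values[i]);
  }
  return out;
}
```

Filtering four values with the mask (true, null, false, true) keeps two values under Drop and three slots, the middle one null, under EmitNull.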
class arrow::compute::TakeOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>

Public Functions

explicit TakeOptions(bool boundscheck = true)

Public Members

bool boundscheck = true

Public Static Functions

static inline TakeOptions BoundsCheck()
static inline TakeOptions NoBoundsCheck()
static inline TakeOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "TakeOptions"
class arrow::compute::DictionaryEncodeOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>

Options for the dictionary encode function.

Public Types

enum NullEncodingBehavior

Configure how null values will be encoded.

Values:

enumerator ENCODE

The null value will be added to the dictionary with a proper index.

enumerator MASK

The null value will be masked in the indices array.

Public Functions

explicit DictionaryEncodeOptions(NullEncodingBehavior null_encoding = MASK)

Public Members

NullEncodingBehavior null_encoding_behavior = MASK

Public Static Functions

static inline DictionaryEncodeOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "DictionaryEncodeOptions"
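As a rough illustration of the two null encodings, the sketch below dictionary-encodes a vector of optional strings. The types and the function are hypothetical and use a simple linear search; they are not how Arrow implements dictionary encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for DictionaryEncodeOptions::NullEncodingBehavior.
enum class NullEncoding { Encode, Mask };

// Hypothetical result type: a dictionary plus one index per input slot.
struct EncodedSketch {
  std::vector<std::optional<std::string>> dictionary;
  std::vector<std::optional<int32_t>> indices;
};

inline EncodedSketch DictionaryEncodeSketch(
    const std::vector<std::optional<std::string>>& input,
    NullEncoding behavior) {
  EncodedSketch out;
  for (const auto& value : input) {
    if (!value.has_value() && behavior == NullEncoding::Mask) {
      // MASK: the null stays out of the dictionary; the index slot is null.
      out.indices.push_back(std::nullopt);
      continue;
    }
    // ENCODE (or a non-null value): look the value up, adding it if new.
    int32_t index = -1;
    for (std::size_t i = 0; i < out.dictionary.size(); ++i) {
      if (out.dictionary[i] == value) {
        index = static_cast<int32_t>(i);
        break;
      }
    }
    if (index < 0) {
      index = static_cast<int32_t>(out.dictionary.size());
      out.dictionary.push_back(value);
    }
    out.indices.push_back(index);
  }
  return out;
}
```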
class arrow::compute::SortKey : public arrow::util::EqualityComparable<SortKey>
#include <arrow/compute/api_vector.h>

One sort key for PartitionNthIndices (TODO) and SortIndices.

Public Functions

inline explicit SortKey(FieldRef target, SortOrder order = SortOrder::Ascending)
bool Equals(const SortKey &other) const
std::string ToString() const

Public Members

FieldRef target

A FieldRef targeting the sort column.

SortOrder order

How to order by this sort key.

class arrow::compute::ArraySortOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>

Public Functions

explicit ArraySortOptions(SortOrder order = SortOrder::Ascending, NullPlacement null_placement = NullPlacement::AtEnd)

Public Members

SortOrder order

Sorting order.

NullPlacement null_placement

Whether nulls and NaNs are placed at the start or at the end.

Public Static Functions

static inline ArraySortOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "ArraySortOptions"
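The effect of order and null_placement on the resulting indices can be sketched with the standard library. The enums and SortIndicesSketch below are hypothetical stand-ins, not Arrow code. The sketch keeps nulls and NaNs in their input order relative to each other, which is an assumption and not documented behavior.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical stand-ins for SortOrder and NullPlacement.
enum class Order { Ascending, Descending };
enum class Placement { AtStart, AtEnd };

// Hypothetical helper (not part of Arrow): returns the indices that would
// sort `values`, with nulls and NaNs grouped at the requested end.
inline std::vector<std::size_t> SortIndicesSketch(
    const std::vector<std::optional<double>>& values, Order order,
    Placement placement) {
  std::vector<std::size_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  auto null_like = [&](std::size_t i) {
    return !values[i].has_value() || std::isnan(*values[i]);
  };
  // Move null-like entries to the requested side, keeping their input order.
  auto mid = std::stable_partition(
      indices.begin(), indices.end(), [&](std::size_t i) {
        return placement == Placement::AtStart ? null_like(i) : !null_like(i);
      });
  auto first = placement == Placement::AtStart ? mid : indices.begin();
  auto last = placement == Placement::AtStart ? indices.end() : mid;
  // Sort only the ordinary values; every index in this range is non-null.
  std::stable_sort(first, last, [&](std::size_t a, std::size_t b) {
    return order == Order::Ascending ? *values[a] < *values[b]
                                     : *values[a] > *values[b];
  });
  return indices;
}
```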
class arrow::compute::SortOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>

Public Functions

explicit SortOptions(std::vector<SortKey> sort_keys = {}, NullPlacement null_placement = NullPlacement::AtEnd)

Public Members

std::vector<SortKey> sort_keys

Column key(s) to order by and how to order by these sort keys.

NullPlacement null_placement

Whether nulls and NaNs are placed at the start or at the end.

Public Static Functions

static inline SortOptions Defaults()

Public Static Attributes

static constexpr const char kTypeName[] = "SortOptions"
class arrow::compute::SelectKOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>

SelectK options.

Public Functions

explicit SelectKOptions(int64_t k = -1, std::vector<SortKey> sort_keys = {})

Public Members

int64_t k

The number of elements to keep.

std::vector<SortKey> sort_keys

Column key(s) to order by and how to order by these sort keys.

Public Static Functions

static inline SelectKOptions Defaults()
static inline SelectKOptions TopKDefault(int64_t k, std::vector<std::string> key_names = {})
static inline SelectKOptions BottomKDefault(int64_t k, std::vector<std::string> key_names = {})

Public Static Attributes

static constexpr const char kTypeName[] = "SelectKOptions"
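Selecting k elements is cheaper than a full sort because only the first k positions need to be ordered. The sketch below shows this with std::partial_sort on a single column of integers. TopKSketch and BottomKSketch are hypothetical helpers, not Arrow functions, and treating a negative k as zero is an assumption of the sketch only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper (not part of Arrow): keeps the first k elements of
// `values` under the ordering `before`, in that order.
template <typename Compare>
std::vector<int> SelectKSketch(std::vector<int> values, int64_t k,
                               Compare before) {
  if (k < 0) k = 0;  // assumption of this sketch only
  const std::size_t keep =
      std::min(static_cast<std::size_t>(k), values.size());
  const auto middle = values.begin() + static_cast<std::ptrdiff_t>(keep);
  // Only the first `keep` positions end up sorted; the rest is discarded.
  std::partial_sort(values.begin(), middle, values.end(), before);
  values.resize(keep);
  return values;
}

// Largest k values, in descending order.
inline std::vector<int> TopKSketch(std::vector<int> values, int64_t k) {
  return SelectKSketch(std::move(values), k, std::greater<int>());
}

// Smallest k values, in ascending order.
inline std::vector<int> BottomKSketch(std::vector<int> values, int64_t k) {
  return SelectKSketch(std::move(values), k, std::less<int>());
}
```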
class arrow::compute::PartitionNthOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>

Partitioning options for NthToIndices.

Public Functions

explicit PartitionNthOptions(int64_t pivot, NullPlacement null_placement = NullPlacement::AtEnd)
inline PartitionNthOptions()

Public Members

int64_t pivot

The index into the equivalent sorted array of the partition pivot element.

NullPlacement null_placement

Whether nulls and NaNs are partitioned at the start or at the end.

Public Static Attributes

static constexpr const char kTypeName[] = "PartitionNthOptions"
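The pivot is defined relative to the equivalent sorted array, which is the same contract as std::nth_element. The sketch below applies it to an index vector. PartitionNthIndicesSketch is a hypothetical helper, not an Arrow function; it ignores nulls, and returning the identity permutation for an out-of-range pivot is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Arrow): returns indices into `values`
// such that the element at position `pivot` is the one a full sort would
// put there, everything before it is not greater, and everything after it
// is not smaller.
inline std::vector<std::size_t> PartitionNthIndicesSketch(
    const std::vector<int>& values, std::size_t pivot) {
  std::vector<std::size_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  if (pivot >= indices.size()) return indices;  // assumption of this sketch
  std::nth_element(
      indices.begin(), indices.begin() + static_cast<std::ptrdiff_t>(pivot),
      indices.end(),
      [&](std::size_t a, std::size_t b) { return values[a] < values[b]; });
  return indices;
}
```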
class arrow::compute::CastOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/cast.h>

Public Functions

explicit CastOptions(bool safe = true)

Public Members

std::shared_ptr<DataType> to_type
bool allow_int_overflow
bool allow_time_truncate
bool allow_time_overflow
bool allow_decimal_truncate
bool allow_float_truncate
bool allow_invalid_utf8

Public Static Functions

static inline CastOptions Safe(std::shared_ptr<DataType> to_type = NULLPTR)
static inline CastOptions Unsafe(std::shared_ptr<DataType> to_type = NULLPTR)

Public Static Attributes

static constexpr const char kTypeName[] = "CastOptions"

Streaming Execution

Streaming Execution Operators

enum arrow::compute::JoinType

Values:

enumerator LEFT_SEMI
enumerator RIGHT_SEMI
enumerator LEFT_ANTI
enumerator RIGHT_ANTI
enumerator INNER
enumerator LEFT_OUTER
enumerator RIGHT_OUTER
enumerator FULL_OUTER
enum arrow::compute::JoinKeyCmp

Values:

enumerator EQ
enumerator IS
class arrow::compute::ExecNodeOptions
#include <arrow/compute/exec/options.h>

Subclassed by arrow::compute::AggregateNodeOptions, arrow::compute::CatalogSourceNodeOptions, arrow::compute::ConsumingSinkNodeOptions, arrow::compute::FilterNodeOptions, arrow::compute::HashJoinNodeOptions, arrow::compute::ProjectNodeOptions, arrow::compute::SinkNodeOptions, arrow::compute::SourceNodeOptions, arrow::dataset::ScanNodeOptions, arrow::dataset::WriteNodeOptions

Public Functions

virtual ~ExecNodeOptions() = default
class arrow::compute::SourceNodeOptions : public arrow::compute::ExecNodeOptions
#include <arrow/compute/exec/options.h>

Adapt an AsyncGenerator<ExecBatch> as a source node.

plan->exec_context()->executor() will be used to parallelize pushing to outputs, if provided.

Public Functions

inline SourceNodeOptions(std::shared_ptr<Schema> output_schema, std::function<Future<util::optional<ExecBatch>>()> generator)

Public Members

std::shared_ptr<Schema> output_schema
std::function<Future<util::optional<ExecBatch>>()> generator
class arrow::compute::FilterNodeOptions : public arrow::compute::ExecNodeOptions
#include <arrow/compute/exec/options.h>

Make a node which excludes some rows from batches passed through it.

filter_expression will be evaluated against each batch which is pushed to this node. Any rows for which filter_expression does not evaluate to true will be excluded in the batch emitted by this node.

Public Functions

inline explicit FilterNodeOptions(Expression filter_expression, bool async_mode = true)

Public Members

Expression filter_expression
bool async_mode
class arrow::compute::ProjectNodeOptions : public arrow::compute::ExecNodeOptions
#include <arrow/compute/exec/options.h>

Make a node which executes expressions on input batches, producing new batches.

Each expression will be evaluated against each batch which is pushed to this node to produce a corresponding output column.

If names are not provided, the string representations of exprs will be used.

Public Functions

inline explicit ProjectNodeOptions(std::vector<Expression> expressions, std::vector<std::string> names = {}, bool async_mode = true)

Public Members

std::vector<Expression> expressions
std::vector<std::string> names
bool async_mode
class arrow::compute::AggregateNodeOptions : public arrow::compute::ExecNodeOptions
#include <arrow/compute/exec/options.h>

Make a node which aggregates input batches, optionally grouped by keys.

Public Functions

inline AggregateNodeOptions(std::vector<internal::Aggregate> aggregates, std::vector<FieldRef> targets, std::vector<std::string> names, std::vector<FieldRef> keys = {})

Public Members

std::vector<internal::Aggregate> aggregates
std::vector<FieldRef> targets
std::vector<std::string> names
std::vector<FieldRef> keys
class arrow::compute::SinkNodeOptions : public arrow::compute::ExecNodeOptions
#include <arrow/compute/exec/options.h>

Add a sink node which forwards to an AsyncGenerator<ExecBatch>

Emitted batches will not be ordered.

Subclassed by arrow::compute::OrderBySinkNodeOptions, arrow::compute::SelectKSinkNodeOptions

Public Functions

inline explicit SinkNodeOptions(std::function<Future<util::optional<ExecBatch>>()> *generator, util::BackpressureOptions backpressure = {})

Public Members

std::function<Future<util::optional<ExecBatch>>()> *generator
util::BackpressureOptions backpressure
class arrow::compute::SinkNodeConsumer
#include <arrow/compute/exec/options.h>

Public Functions

virtual ~SinkNodeConsumer() = default
virtual Status Consume(ExecBatch batch) = 0

Consume a batch of data.

virtual Future<> Finish() = 0

Signal to the consumer that the last batch has been delivered.

The returned future should only finish when all outstanding tasks have completed

class arrow::compute::ConsumingSinkNodeOptions : public arrow::compute::ExecNodeOptions
#include <arrow/compute/exec/options.h>

Add a sink node which consumes data within the exec plan run.

Public Functions

inline explicit ConsumingSinkNodeOptions(std::shared_ptr<SinkNodeConsumer> consumer)

Public Members

std::shared_ptr<SinkNodeConsumer> consumer
class arrow::compute::OrderBySinkNodeOptions : public arrow::compute::SinkNodeOptions
#include <arrow/compute/exec/options.h>

Make a node which sorts rows passed through it.

All batches pushed to this node will be accumulated, then sorted, by the given fields. Then sorted batches will be forwarded to the generator in sorted order.

Public Functions

inline explicit OrderBySinkNodeOptions(SortOptions sort_options, std::function<Future<util::optional<ExecBatch>>()> *generator)

Public Members

SortOptions sort_options
class arrow::compute::HashJoinNodeOptions : public arrow::compute::ExecNodeOptions
#include <arrow/compute/exec/options.h>

Make a node which implements a join operation using a hash join strategy.

Public Functions

inline HashJoinNodeOptions(JoinType in_join_type, std::vector<FieldRef> in_left_keys, std::vector<FieldRef> in_right_keys, Expression filter = literal(true), std::string output_prefix_for_left = default_output_prefix_for_left, std::string output_prefix_for_right = default_output_prefix_for_right)
inline HashJoinNodeOptions(JoinType join_type, std::vector<FieldRef> left_keys, std::vector<FieldRef> right_keys, std::vector<FieldRef> left_output, std::vector<FieldRef> right_output, Expression filter = literal(true), std::string output_prefix_for_left = default_output_prefix_for_left, std::string output_prefix_for_right = default_output_prefix_for_right)
inline HashJoinNodeOptions(JoinType join_type, std::vector<FieldRef> left_keys, std::vector<FieldRef> right_keys, std::vector<FieldRef> left_output, std::vector<FieldRef> right_output, std::vector<JoinKeyCmp> key_cmp, Expression filter = literal(true), std::string output_prefix_for_left = default_output_prefix_for_left, std::string output_prefix_for_right = default_output_prefix_for_right)

Public Members

JoinType join_type
std::vector<FieldRef> left_keys
std::vector<FieldRef> right_keys
bool output_all
std::vector<FieldRef> left_output
std::vector<FieldRef> right_output
std::vector<JoinKeyCmp> key_cmp
std::string output_prefix_for_left
std::string output_prefix_for_right
Expression filter

Public Static Attributes

static constexpr const char *default_output_prefix_for_left = ""
static constexpr const char *default_output_prefix_for_right = ""
class arrow::compute::SelectKSinkNodeOptions : public arrow::compute::SinkNodeOptions
#include <arrow/compute/exec/options.h>

Make a node which selects the top_k/bottom_k rows passed through it.

All batches pushed to this node will be accumulated, then selected, by the given fields. Then sorted batches will be forwarded to the generator in sorted order.

Public Functions

inline explicit SelectKSinkNodeOptions(SelectKOptions select_k_options, std::function<Future<util::optional<ExecBatch>>()> *generator)

Public Members

SelectKOptions select_k_options

SelectK options.

Execution Plan Expressions

class arrow::FieldRef

Descriptor of a (potentially nested) field within a schema.

Unlike FieldPath (which exclusively uses indices of child fields), FieldRef may reference a field by name. It is intended to replace parameters like int field_index and const std::string& field_name; it can be implicitly constructed from either a field index or a name.

Nested fields can be referenced as well. Given

    schema({field("a", struct_({field("n", null())})), field("b", int32())})

the following all indicate the nested field named "n":

    FieldRef ref1(0, 0);
    FieldRef ref2("a", 0);
    FieldRef ref3("a", "n");
    FieldRef ref4(0, "n");
    ARROW_ASSIGN_OR_RAISE(FieldRef ref5, FieldRef::FromDotPath(".a[0]"));

FieldPaths matching a FieldRef are retrieved using the member function FindAll. Multiple matches are possible because field names may be duplicated within a schema. For example:

    Schema a_is_ambiguous({field("a", int32()), field("a", float32())});
    auto matches = FieldRef("a").FindAll(a_is_ambiguous);
    assert(matches.size() == 2);
    assert(matches[0].Get(a_is_ambiguous)->Equals(a_is_ambiguous.field(0)));
    assert(matches[1].Get(a_is_ambiguous)->Equals(a_is_ambiguous.field(1)));

Convenience accessors are available which raise a helpful error if the field is not found or ambiguous, and for immediately calling FieldPath::Get to retrieve any matching children:

    auto maybe_match = FieldRef("struct", "field_i32").FindOneOrNone(schema);
    auto maybe_column = FieldRef("struct", "field_i32").GetOne(some_table);

Public Functions

FieldRef(FieldPath indices)

Construct a FieldRef using a string of indices.

The reference will be retrieved as: schema.fields[self.indices[0]].type.fields[self.indices[1]] …

Empty indices are not valid.
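The index-based lookup rule above can be sketched on a simplified field tree. FieldSketch and GetByIndices are hypothetical stand-ins for illustration, not Arrow types. Returning nullptr for an empty or out-of-range path is a choice made for the sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a field with optional child fields.
struct FieldSketch {
  std::string name;
  std::vector<FieldSketch> children;
};

// Hypothetical helper (not part of Arrow): follows `indices` one level at a
// time, i.e. fields[indices[0]].children[indices[1]]...
// Returns nullptr for an empty path or an out-of-range index.
inline const FieldSketch* GetByIndices(const std::vector<FieldSketch>& fields,
                                       const std::vector<std::size_t>& indices) {
  const std::vector<FieldSketch>* current = &fields;
  const FieldSketch* found = nullptr;
  for (std::size_t index : indices) {
    if (index >= current->size()) return nullptr;
    found = &(*current)[index];
    current = &found->children;  // descend into the matched field
  }
  return found;
}
```

On a tree shaped like the schema in the class description (a struct "a" with child "n", followed by "b"), the path {0, 0} reaches "n" and {1} reaches "b".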

inline FieldRef(std::string name)

Construct a by-name FieldRef.

Multiple fields may match a by-name FieldRef: [f for f in schema.fields where f.name == self.name]

inline FieldRef(int index)

Equivalent to a FieldPath containing a single index.

template<typename A0, typename A1, typename ...A>
inline FieldRef(A0 &&a0, A1 &&a1, A&&... a)

Convenience constructor for nested FieldRefs: each argument will be used to construct a FieldRef.

std::vector<FieldPath> FindAll(const Schema &schema) const

Retrieve FieldPath of every child field which matches this FieldRef.

std::vector<FieldPath> FindAll(const ArrayData &array) const

Convenience function which applies FindAll to arg’s type or schema.

template<typename T>
inline Status CheckNonEmpty(const std::vector<FieldPath> &matches, const T &root) const

Convenience function: raise an error if matches is empty.

template<typename T>
inline Status CheckNonMultiple(const std::vector<FieldPath> &matches, const T &root) const

Convenience function: raise an error if matches contains multiple FieldPaths.

template<typename T>
inline Result<FieldPath> FindOne(const T &root) const

Retrieve FieldPath of a single child field which matches this FieldRef.

Emit an error if none or multiple match.

template<typename T>
inline Result<FieldPath> FindOneOrNone(const T &root) const

Retrieve FieldPath of a single child field which matches this FieldRef.

Emit an error if multiple match. An empty (invalid) FieldPath will be returned if none match.
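The difference between FindAll, FindOne, and FindOneOrNone can be sketched on a flat list of field names. Everything below is a hypothetical model: the real functions return FieldPath and Result values, which are replaced here by an index and a small status enum.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical status replacing Arrow's Status/Result in this sketch.
enum class MatchStatus { Ok, NotFound, Ambiguous };

struct MatchSketch {
  MatchStatus status;
  std::optional<std::size_t> index;  // set only when exactly one field matched
};

// All positions whose name equals `name`; duplicates give several matches.
inline std::vector<std::size_t> FindAllSketch(
    const std::vector<std::string>& names, const std::string& name) {
  std::vector<std::size_t> matches;
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (names[i] == name) matches.push_back(i);
  }
  return matches;
}

// Error only when several fields match; no match yields an empty index.
inline MatchSketch FindOneOrNoneSketch(const std::vector<std::string>& names,
                                       const std::string& name) {
  const auto matches = FindAllSketch(names, name);
  if (matches.size() > 1) return {MatchStatus::Ambiguous, std::nullopt};
  if (matches.empty()) return {MatchStatus::Ok, std::nullopt};
  return {MatchStatus::Ok, matches[0]};
}

// Error when no field or several fields match.
inline MatchSketch FindOneSketch(const std::vector<std::string>& names,
                                 const std::string& name) {
  MatchSketch result = FindOneOrNoneSketch(names, name);
  if (result.status == MatchStatus::Ok && !result.index.has_value()) {
    result.status = MatchStatus::NotFound;
  }
  return result;
}
```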

template<typename T>
inline std::vector<GetType<T>> GetAll(const T &root) const

Get all children matching this FieldRef.

template<typename T>
inline Result<GetType<T>> GetOne(const T &root) const

Get the single child matching this FieldRef.

Emit an error if none or multiple match.

template<typename T>
inline Result<GetType<T>> GetOneOrNone(const T &root) const

Get the single child matching this FieldRef.

Return nullptr if none match, emit an error if multiple match.

Public Static Functions

static Result<FieldRef> FromDotPath(const std::string &dot_path)

Parse a dot path into a FieldRef.

dot_path = '.' name | '[' digit+ ']' | dot_path+

Examples:

    ".alpha" => FieldRef("alpha")
    "[2]" => FieldRef(2)
    ".beta[3]" => FieldRef("beta", 3)
    "[5].gamma.delta[7]" => FieldRef(5, "gamma", "delta", 7)
    ".hello world" => FieldRef("hello world")
    R"(.\[y\]\\tho\.\)" => FieldRef(R"([y]\tho.\)")

Note: When parsing a name, a '\' preceding any other character will be dropped from the resulting name. Therefore, if a name must contain the characters '.', '\', or '[', those must be escaped with a preceding '\'.
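A small parser makes the grammar and the escaping rule easier to follow. The sketch below is a hypothetical re-implementation, not Arrow's parser: it returns a list of name or index segments instead of a FieldRef, reports errors as std::nullopt, and keeps a trailing backslash literally, which is an assumption of the sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical segment type: a field name or a child index.
using SegmentSketch = std::variant<std::string, int>;

// Hypothetical parser (not Arrow's implementation) for
//   dot_path = '.' name | '[' digit+ ']' | dot_path+
inline std::optional<std::vector<SegmentSketch>> ParseDotPathSketch(
    const std::string& path) {
  std::vector<SegmentSketch> segments;
  std::size_t i = 0;
  while (i < path.size()) {
    if (path[i] == '.') {
      ++i;
      std::string name;
      // A name runs until the next unescaped '.' or '['.
      while (i < path.size() && path[i] != '.' && path[i] != '[') {
        // A backslash is dropped and the next character is taken literally.
        // Assumption of this sketch: a trailing backslash is kept as-is.
        if (path[i] == '\\' && i + 1 < path.size()) ++i;
        name += path[i++];
      }
      segments.emplace_back(std::move(name));
    } else if (path[i] == '[') {
      ++i;
      const std::size_t start = i;
      int index = 0;
      while (i < path.size() &&
             std::isdigit(static_cast<unsigned char>(path[i]))) {
        index = index * 10 + (path[i] - '0');
        ++i;
      }
      // Require at least one digit and a closing bracket.
      if (i == start || i == path.size() || path[i] != ']') return std::nullopt;
      ++i;
      segments.emplace_back(index);
    } else {
      return std::nullopt;  // a segment must start with '.' or '['
    }
  }
  if (segments.empty()) return std::nullopt;
  return segments;
}
```

Parsing ".beta[3]" with this sketch yields the name "beta" followed by the index 3, mirroring FieldRef("beta", 3).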

struct Hash
inline bool operator==(const Expression &l, const Expression &r)
inline bool operator!=(const Expression &l, const Expression &r)
Expression literal(Datum lit)
template<typename Arg>
Expression literal(Arg &&arg)
Expression field_ref(FieldRef ref)
Expression call(std::string function, std::vector<Expression> arguments, std::shared_ptr<FunctionOptions> options = NULLPTR)
template<typename Options, typename = typename std::enable_if<std::is_base_of<FunctionOptions, Options>::value>::type>
Expression call(std::string function, std::vector<Expression> arguments, Options options)
std::vector<FieldRef> FieldsInExpression(const Expression&)

Assemble a list of all fields referenced by an Expression at any depth.

bool ExpressionHasFieldRefs(const Expression&)

Check if the expression references any fields.

Result<KnownFieldValues> ExtractKnownFieldValues(const Expression &guaranteed_true_predicate)
class arrow::compute::Expression
#include <arrow/compute/exec/expression.h>

An unbound expression which maps a single Datum to another Datum.

An expression is one of

  • A literal Datum.

  • A reference to a single (potentially nested) field of the input Datum.

  • A call to a compute function, with arguments specified by other Expressions.

Public Functions

Result<Expression> Bind(const ValueDescr &in, ExecContext* = NULLPTR) const

Bind this expression to the given input type, looking up Kernels and field types.

Some expression simplification may be performed and implicit casts will be inserted. Any state necessary for execution will be initialized and returned.

bool IsBound() const

Return true if all of an expression’s field references have explicit ValueDescr and all of its functions’ kernels are looked up.

bool IsScalarExpression() const

Return true if this expression is composed only of Scalar literals, field references, and calls to ScalarFunctions.

bool IsNullLiteral() const

Return true if this expression is literal and entirely null.

bool IsSatisfiable() const

Return true if this expression could evaluate to true.

const Call *call() const

Access a Call or return nullptr if this expression is not a call.

const Datum *literal() const

Access a Datum or return nullptr if this expression is not a literal.

const FieldRef *field_ref() const

Access a FieldRef or return nullptr if this expression is not a field_ref.

ValueDescr descr() const

The type and shape to which this expression will evaluate.

struct Call
#include <arrow/compute/exec/expression.h>
struct Hash
#include <arrow/compute/exec/expression.h>
struct Parameter
#include <arrow/compute/exec/expression.h>
Expression project(std::vector<Expression> values, std::vector<std::string> names)
Expression equal(Expression lhs, Expression rhs)
Expression not_equal(Expression lhs, Expression rhs)
Expression less(Expression lhs, Expression rhs)
Expression less_equal(Expression lhs, Expression rhs)
Expression greater(Expression lhs, Expression rhs)
Expression greater_equal(Expression lhs, Expression rhs)
Expression is_null(Expression lhs, bool nan_is_null = false)
Expression is_valid(Expression lhs)
Expression and_(Expression lhs, Expression rhs)
Expression and_(const std::vector<Expression>&)
Expression or_(Expression lhs, Expression rhs)
Expression or_(const std::vector<Expression>&)
Expression not_(Expression operand)
Result<Expression> Canonicalize(Expression, ExecContext* = NULLPTR)

Weak canonicalization which establishes guarantees for subsequent passes.

Even equivalent Expressions may result in different canonicalized expressions. TODO: this could be a strong canonicalization.

Result<Expression> FoldConstants(Expression)

Simplify Expressions based on literal arguments (for example, add(null, x) will always be null so replace the call with a null literal).

Includes early evaluation of all calls whose arguments are entirely literal.
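The two rules described above (a null literal argument makes add null, and calls with only literal arguments are evaluated early) can be sketched on a tiny expression tree. ExprSketch and its helpers are hypothetical stand-ins for illustration, not Arrow's Expression, and only an integer "add" is modeled.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature expression: a literal, a field reference, or a call.
struct ExprSketch {
  enum Kind { Literal, Field, Call } kind;
  std::optional<int> value;      // Literal payload; std::nullopt models null
  std::string name;              // field name or function name
  std::vector<ExprSketch> args;  // call arguments
};

inline ExprSketch Lit(std::optional<int> v) {
  return {ExprSketch::Literal, v, "", {}};
}
inline ExprSketch FieldOf(std::string n) {
  return {ExprSketch::Field, std::nullopt, std::move(n), {}};
}
inline ExprSketch Add(ExprSketch l, ExprSketch r) {
  return {ExprSketch::Call, std::nullopt, "add", {std::move(l), std::move(r)}};
}

// Hypothetical folding pass (not Arrow's implementation).
inline ExprSketch FoldConstantsSketch(ExprSketch expr) {
  if (expr.kind != ExprSketch::Call) return expr;
  bool all_literal = true;
  bool any_null_literal = false;
  for (auto& arg : expr.args) {
    arg = FoldConstantsSketch(std::move(arg));  // fold bottom-up
    if (arg.kind != ExprSketch::Literal) {
      all_literal = false;
    } else if (!arg.value.has_value()) {
      any_null_literal = true;
    }
  }
  if (expr.name != "add") return expr;  // only "add" is modeled here
  // add(null, x) is always null, whatever x is.
  if (any_null_literal) return Lit(std::nullopt);
  // All arguments are non-null literals: evaluate the call now.
  if (all_literal) {
    int sum = 0;
    for (const auto& arg : expr.args) sum += *arg.value;
    return Lit(sum);
  }
  return expr;
}
```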

Result<Expression> ReplaceFieldsWithKnownValues(const KnownFieldValues &known_values, Expression)

Simplify an Expression by replacing the fields it references with their known values.
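Substituting known field values amounts to a recursive walk that swaps matching field references for literals. The sketch below shows the idea on a miniature tree. NodeSketch, KnownValuesSketch, and ReplaceKnownSketch are hypothetical names, not Arrow types, and values are plain integers for brevity.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature expression node.
struct NodeSketch {
  enum Kind { Literal, Field, Call } kind;
  int value;                     // payload when kind == Literal
  std::string name;              // field or function name otherwise
  std::vector<NodeSketch> args;  // arguments when kind == Call
};

// Hypothetical stand-in for KnownFieldValues: field name -> known value.
using KnownValuesSketch = std::map<std::string, int>;

// Replaces every field reference that has a known value with a literal;
// unknown fields and the call structure are left untouched.
inline NodeSketch ReplaceKnownSketch(const KnownValuesSketch& known,
                                     NodeSketch node) {
  if (node.kind == NodeSketch::Field) {
    const auto it = known.find(node.name);
    if (it != known.end()) {
      return NodeSketch{NodeSketch::Literal, it->second, "", {}};
    }
    return node;
  }
  for (auto& arg : node.args) {
    arg = ReplaceKnownSketch(known, std::move(arg));
  }
  return node;
}
```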

Result<Expression> SimplifyWithGuarantee(Expression, const Expression &guaranteed_true_predicate)

Simplify an expression by replacing subexpressions based on a guarantee: a boolean expression which is guaranteed to evaluate to true.

For example, this is used to remove redundant function calls from a filter expression or to replace a reference to a constant-value field with a literal.