Compute Functions¶
Datum class¶
-
class Datum¶
Variant type for various Arrow C++ data structures.
Public Functions
-
int64_t TotalBufferSize() const¶
The sum of bytes in each buffer referenced by the datum.
Note: Scalars report a size of 0.
See also
arrow::util::TotalBufferSize for caveats
-
const std::shared_ptr<DataType> &type() const¶
The value type of the variant, if any.
- Returns:
nullptr if no type
-
const std::shared_ptr<Schema> &schema() const¶
The schema of the variant, if any.
- Returns:
nullptr if no schema
-
int64_t length() const¶
The value length of the variant, if any.
- Returns:
kUnknownLength if no type
-
ArrayVector chunks() const¶
The array chunks of the variant, if any.
- Returns:
empty if not arraylike
-
struct Empty¶
Abstract Function classes¶
-
void PrintTo(const FunctionOptions&, std::ostream*)¶
-
class FunctionOptionsType¶
- #include <arrow/compute/function.h>
Extension point for defining options outside libarrow (but still within this project).
-
class FunctionOptions : public arrow::util::EqualityComparable<FunctionOptions>¶
- #include <arrow/compute/function.h>
Base class for specifying options configuring a function’s behavior, such as error handling.
Subclassed by arrow::compute::ArithmeticOptions, arrow::compute::ArraySortOptions, arrow::compute::AssumeTimezoneOptions, arrow::compute::CastOptions, arrow::compute::CountOptions, arrow::compute::CumulativeSumOptions, arrow::compute::DayOfWeekOptions, arrow::compute::DictionaryEncodeOptions, arrow::compute::ElementWiseAggregateOptions, arrow::compute::ExtractRegexOptions, arrow::compute::FilterOptions, arrow::compute::IndexOptions, arrow::compute::JoinOptions, arrow::compute::ListSliceOptions, arrow::compute::MakeStructOptions, arrow::compute::MapLookupOptions, arrow::compute::MatchSubstringOptions, arrow::compute::ModeOptions, arrow::compute::NullOptions, arrow::compute::PadOptions, arrow::compute::PartitionNthOptions, arrow::compute::QuantileOptions, arrow::compute::RandomOptions, arrow::compute::RankOptions, arrow::compute::ReplaceSliceOptions, arrow::compute::ReplaceSubstringOptions, arrow::compute::RoundOptions, arrow::compute::RoundTemporalOptions, arrow::compute::RoundToMultipleOptions, arrow::compute::ScalarAggregateOptions, arrow::compute::SelectKOptions, arrow::compute::SetLookupOptions, arrow::compute::SliceOptions, arrow::compute::SortOptions, arrow::compute::SplitOptions, arrow::compute::SplitPatternOptions, arrow::compute::StrftimeOptions, arrow::compute::StrptimeOptions, arrow::compute::StructFieldOptions, arrow::compute::TakeOptions, arrow::compute::TDigestOptions, arrow::compute::TrimOptions, arrow::compute::Utf8NormalizeOptions, arrow::compute::VarianceOptions, arrow::compute::WeekOptions
Public Functions
Public Static Functions
-
static Result<std::unique_ptr<FunctionOptions>> Deserialize(const std::string &type_name, const Buffer &buffer)¶
Deserialize an options struct from a buffer.
Note: this will only look for type_name in the default FunctionRegistry; to use a custom FunctionRegistry, look up the FunctionOptionsType, then call FunctionOptionsType::Deserialize().
-
struct Arity¶
- #include <arrow/compute/function.h>
Contains the number of required arguments for the function.
Naming conventions taken from https://en.wikipedia.org/wiki/Arity.
Public Members
-
int num_args¶
The number of required arguments (or the minimum number for varargs functions).
-
bool is_varargs = false¶
If true, then the num_args is the minimum number of required arguments.
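The interaction between num_args and is_varargs can be sketched as follows. This is a standalone model, not Arrow code: ToyArity and AcceptsArgCount are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the two data members documented above.
struct ToyArity {
  int num_args;
  bool is_varargs;
};

// True if a call with `passed` arguments satisfies `arity`.
// Fixed arity: the count must match exactly.
// Varargs: num_args is only a lower bound.
bool AcceptsArgCount(const ToyArity& arity, std::size_t passed) {
  assert(arity.num_args >= 0);
  const std::size_t required = static_cast<std::size_t>(arity.num_args);
  return arity.is_varargs ? passed >= required : passed == required;
}
```

For instance, a binary function would be modelled as ToyArity{2, false}, and a function taking one or more arguments as ToyArity{1, true}.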
Public Static Functions
-
struct FunctionDoc¶
- #include <arrow/compute/function.h>
Public Members
-
std::string summary¶
A one-line summary of the function, using a verb.
For example, “Add two numeric arrays or scalars”.
-
std::string description¶
A detailed description of the function, meant to follow the summary.
-
std::vector<std::string> arg_names¶
Symbolic names (identifiers) for the function arguments.
Some bindings may use this to generate nicer function signatures.
-
std::string options_class¶
Name of the options class, if any.
-
bool options_required¶
Whether options are required for function execution.
If false, then either the function does not have an options class or there is a usable default options value.
-
class FunctionExecutor¶
- #include <arrow/compute/function.h>
An executor of a function with a preconfigured kernel.
Public Functions
-
virtual Status Init(const FunctionOptions *options = NULLPTR, ExecContext *exec_ctx = NULLPTR) = 0¶
Initialize or re-initialize the preconfigured kernel.
This method may be called zero or more times. Depending on how the FunctionExecutor was obtained, it may already have been initialized.
-
virtual Result<Datum> Execute(const std::vector<Datum> &args, int64_t length = -1) = 0¶
Execute the preconfigured kernel with arguments that must fit it.
The method requires the arguments be castable to the preconfigured types.
- Parameters:
args – [in] Arguments to execute the function on
length – [in] Length of arguments batch or -1 to default it. If the function has no parameters, this determines the batch length, defaulting to 0. Otherwise, if the function is scalar, this must equal the argument batch’s inferred length or be -1 to default to it. This is ignored for vector functions.
-
class Function¶
- #include <arrow/compute/function.h>
Base class for compute functions.
Function implementations contain a collection of “kernels” which are implementations of the function for specific argument types. Selecting a viable kernel for executing a function is referred to as “dispatching”.
Subclassed by arrow::compute::detail::FunctionImpl< KernelType >, arrow::compute::MetaFunction, arrow::compute::detail::FunctionImpl< HashAggregateKernel >, arrow::compute::detail::FunctionImpl< ScalarAggregateKernel >, arrow::compute::detail::FunctionImpl< ScalarKernel >, arrow::compute::detail::FunctionImpl< VectorKernel >
Public Types
-
enum Kind¶
The kind of function, which indicates in what contexts it is valid for use.
Values:
-
enumerator SCALAR¶
A function that performs scalar data operations on whole arrays of data.
Can generally process Array or Scalar values. The size of the output will be the same as the size (or broadcasted size, in the case of mixing Array and Scalar inputs) of the input.
-
enumerator VECTOR¶
A function with array input and output whose behavior depends on the values of the entire arrays passed, rather than the value of each scalar value.
-
enumerator SCALAR_AGGREGATE¶
A function that computes scalar summary statistics from array input.
-
enumerator HASH_AGGREGATE¶
A function that computes grouped summary statistics from array input and an array of group identifiers.
-
enumerator META¶
A function that dispatches to other functions and does not contain its own kernels.
Public Functions
-
inline const std::string &name() const¶
The name of the function. The registry enforces uniqueness of names.
-
inline Function::Kind kind() const¶
The kind of function, which indicates in what contexts it is valid for use.
-
inline const Arity &arity() const¶
The number of arguments the function requires, and whether the function accepts a variable number of arguments.
-
inline const FunctionDoc &doc() const¶
Return the function documentation.
-
virtual int num_kernels() const = 0¶
Returns the number of registered kernels for this function.
-
virtual Result<const Kernel*> DispatchExact(const std::vector<TypeHolder> &types) const¶
Return a kernel that can execute the function given the exact argument types (without implicit type casts).
NB: This function is overridden in CastFunction.
-
virtual Result<const Kernel*> DispatchBest(std::vector<TypeHolder> *values) const¶
Return a best-match kernel that can execute the function given the argument types, after implicit casts are applied.
- Parameters:
values – [inout] Argument types. An element may be modified to indicate that the returned kernel only approximately matches the input value descriptors; callers are responsible for casting inputs to the type required by the kernel.
-
virtual Result<std::shared_ptr<FunctionExecutor>> GetBestExecutor(std::vector<TypeHolder> inputs) const¶
Get a function executor with a best-matching kernel.
The returned executor will by default work with the default FunctionOptions and KernelContext. If you want to change that, call FunctionExecutor::Init.
-
virtual Result<Datum> Execute(const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx) const¶
Execute the function eagerly with the passed input arguments with kernel dispatch, batch iteration, and memory allocation details taken care of.
If the options pointer is null, then default_options() will be used.
This function can be overridden in subclasses.
-
inline const FunctionOptions *default_options() const¶
Returns the default options for this function.
Whatever option semantics a Function has, implementations must guarantee that default_options() is valid to pass to Execute as options.
-
class ScalarFunction : public arrow::compute::detail::FunctionImpl<ScalarKernel>¶
- #include <arrow/compute/function.h>
A function that executes elementwise operations on arrays or scalars, and therefore whose results generally do not depend on the order of the values in the arguments.
Accepts and returns arrays that are all of the same size. These functions roughly correspond to the functions used in SQL expressions.
Public Functions
-
Status AddKernel(std::vector<InputType> in_types, OutputType out_type, ArrayKernelExec exec, KernelInit init = NULLPTR)¶
Add a kernel with given input/output types, no required state initialization, preallocation for fixed-width types, and default null handling (intersect validity bitmaps of inputs).
-
class VectorFunction : public arrow::compute::detail::FunctionImpl<VectorKernel>¶
- #include <arrow/compute/function.h>
A function that executes general array operations that may yield outputs of different sizes or have results that depend on the whole array contents.
These functions roughly correspond to the functions found in non-SQL array languages like APL and its derivatives.
-
class ScalarAggregateFunction : public arrow::compute::detail::FunctionImpl<ScalarAggregateKernel>¶
- #include <arrow/compute/function.h>
-
class HashAggregateFunction : public arrow::compute::detail::FunctionImpl<HashAggregateKernel>¶
- #include <arrow/compute/function.h>
-
class MetaFunction : public arrow::compute::Function¶
- #include <arrow/compute/function.h>
A function that dispatches to other functions.
Must implement MetaFunction::ExecuteImpl.
For Array, ChunkedArray, and Scalar Datum kinds, may rely on the execution of concrete Function types, but must handle other Datum kinds on its own.
Public Functions
-
inline virtual int num_kernels() const override¶
Returns the number of registered kernels for this function.
-
virtual Result<Datum> Execute(const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx) const override¶
Execute the function eagerly with the passed input arguments with kernel dispatch, batch iteration, and memory allocation details taken care of.
If the options pointer is null, then default_options() will be used.
This function can be overridden in subclasses.
Function execution¶
Function registry¶
-
class FunctionRegistry¶
A mutable central function registry for built-in functions as well as user-defined functions.
Functions are implementations of arrow::compute::Function.
Generally, each function contains kernels which are implementations of a function for a specific argument signature. After looking up a function in the registry, one can either execute it eagerly with Function::Execute or use one of the function’s dispatch methods to pick a suitable kernel for lower-level function execution.
Public Functions
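-
Status CanAddFunction(std::shared_ptr<Function> function, bool allow_overwrite = false)¶
Check whether a new function can be added to the registry.
- Returns:
Status::KeyError if a function with the same name is already registered.
-
Status AddFunction(std::shared_ptr<Function> function, bool allow_overwrite = false)¶
Add a new function to the registry.
- Returns:
Status::KeyError if a function with the same name is already registered.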
-
Status CanAddAlias(const std::string &target_name, const std::string &source_name)¶
Check whether an alias can be added for the given function name.
- Returns:
Status::KeyError if the function with the given name is not registered.
-
Status AddAlias(const std::string &target_name, const std::string &source_name)¶
Add alias for the given function name.
- Returns:
Status::KeyError if the function with the given name is not registered.
-
Status CanAddFunctionOptionsType(const FunctionOptionsType *options_type, bool allow_overwrite = false)¶
Check whether a new function options type can be added to the registry.
- Returns:
Status::KeyError if a function options type with the same name is already registered.
-
Status AddFunctionOptionsType(const FunctionOptionsType *options_type, bool allow_overwrite = false)¶
Add a new function options type to the registry.
- Returns:
Status::KeyError if a function options type with the same name is already registered.
-
Result<std::shared_ptr<Function>> GetFunction(const std::string &name) const¶
Retrieve a function by name from the registry.
-
std::vector<std::string> GetFunctionNames() const¶
Return vector of all entry names in the registry.
Helpful for displaying a manifest of available functions.
-
Result<const FunctionOptionsType*> GetFunctionOptionsType(const std::string &name) const¶
Retrieve a function options type by name from the registry.
-
int num_functions() const¶
The number of currently registered functions.
Public Static Functions
-
static std::unique_ptr<FunctionRegistry> Make()¶
Construct a new registry.
Most users only need to use the global registry.
-
static std::unique_ptr<FunctionRegistry> Make(FunctionRegistry *parent)¶
Construct a new nested registry with the given parent.
Most users only need to use the global registry. The returned registry never changes its parent, even when an operation allows overwriting.
-
FunctionRegistry *arrow::compute::GetFunctionRegistry()¶
Return the process-global function registry.
Convenience functions¶
-
Result<Datum> CallFunction(const std::string &func_name, const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx = NULLPTR)¶
One-shot invoker for all types of functions.
Does kernel dispatch, argument checking, iteration of ChunkedArray inputs, and wrapping of outputs.
-
Result<Datum> CallFunction(const std::string &func_name, const std::vector<Datum> &args, ExecContext *ctx = NULLPTR)¶
Variant of CallFunction which uses a function’s default options.
NB: Some functions require FunctionOptions be provided.
-
Result<Datum> CallFunction(const std::string &func_name, const ExecBatch &batch, const FunctionOptions *options, ExecContext *ctx = NULLPTR)¶
One-shot invoker for all types of functions.
Does kernel dispatch, argument checking, iteration of ChunkedArray inputs, and wrapping of outputs.
-
Result<Datum> CallFunction(const std::string &func_name, const ExecBatch &batch, ExecContext *ctx = NULLPTR)¶
Variant of CallFunction which uses a function’s default options.
NB: Some functions require FunctionOptions be provided.
Concrete options classes¶
-
enum class RoundMode : int8_t¶
Rounding and tie-breaking modes for round compute functions.
Additional details and examples are provided in compute.rst.
Values:
-
enumerator DOWN¶
Round to nearest integer less than or equal in magnitude (aka “floor”)
-
enumerator UP¶
Round to nearest integer greater than or equal in magnitude (aka “ceil”)
-
enumerator TOWARDS_ZERO¶
Get the integral part without fractional digits (aka “trunc”)
-
enumerator TOWARDS_INFINITY¶
Round negative values with DOWN rule and positive values with UP rule (aka “away from zero”)
-
enumerator HALF_DOWN¶
Round ties with DOWN rule (also called “round half towards negative infinity”)
-
enumerator HALF_UP¶
Round ties with UP rule (also called “round half towards positive infinity”)
-
enumerator HALF_TOWARDS_ZERO¶
Round ties with TOWARDS_ZERO rule (also called “round half away from infinity”)
-
enumerator HALF_TOWARDS_INFINITY¶
Round ties with TOWARDS_INFINITY rule (also called “round half away from zero”)
-
enumerator HALF_TO_EVEN¶
Round ties to nearest even integer.
-
enumerator HALF_TO_ODD¶
Round ties to nearest odd integer.
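The tie-breaking rules listed above can be modelled with the standard library alone. The sketch below is not Arrow code: ToyRoundMode and RoundToInteger are hypothetical names, and only rounding to an integer (the equivalent of zero digits) is covered.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical mirror of the enumerators documented above.
enum class ToyRoundMode {
  DOWN, UP, TOWARDS_ZERO, TOWARDS_INFINITY,
  HALF_DOWN, HALF_UP, HALF_TOWARDS_ZERO, HALF_TOWARDS_INFINITY,
  HALF_TO_EVEN, HALF_TO_ODD
};

double RoundToInteger(double x, ToyRoundMode mode) {
  assert(std::isfinite(x));
  const double lo = std::floor(x);  // neighbour towards negative infinity
  const double hi = std::ceil(x);   // neighbour towards positive infinity
  switch (mode) {
    case ToyRoundMode::DOWN: return lo;
    case ToyRoundMode::UP: return hi;
    case ToyRoundMode::TOWARDS_ZERO: return std::trunc(x);
    case ToyRoundMode::TOWARDS_INFINITY: return x < 0 ? lo : hi;
    default: break;
  }
  // HALF_* modes: anything that is not an exact tie goes to the nearest neighbour.
  const double frac = x - lo;
  if (frac != 0.5) return frac < 0.5 ? lo : hi;
  // Exact tie: apply the mode-specific rule.
  const bool lo_is_even = std::fmod(lo, 2.0) == 0.0;
  switch (mode) {
    case ToyRoundMode::HALF_DOWN: return lo;
    case ToyRoundMode::HALF_UP: return hi;
    case ToyRoundMode::HALF_TOWARDS_ZERO: return x < 0 ? hi : lo;
    case ToyRoundMode::HALF_TOWARDS_INFINITY: return x < 0 ? lo : hi;
    case ToyRoundMode::HALF_TO_EVEN: return lo_is_even ? lo : hi;
    default: return lo_is_even ? hi : lo;  // HALF_TO_ODD
  }
}
```

In this model the HALF_* modes differ from one another only on exact ties such as 2.5 or -2.5; every other input rounds to its nearest integer.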
-
enum class CalendarUnit : int8_t¶
Values:
-
enumerator NANOSECOND¶
-
enumerator MICROSECOND¶
-
enumerator MILLISECOND¶
-
enumerator SECOND¶
-
enumerator MINUTE¶
-
enumerator HOUR¶
-
enumerator DAY¶
-
enumerator WEEK¶
-
enumerator MONTH¶
-
enumerator QUARTER¶
-
enumerator YEAR¶
-
enum CompareOperator¶
Values:
-
enumerator EQUAL¶
-
enumerator NOT_EQUAL¶
-
enumerator GREATER¶
-
enumerator GREATER_EQUAL¶
-
enumerator LESS¶
-
enumerator LESS_EQUAL¶
-
enum class SortOrder¶
Values:
-
enumerator Ascending¶
Arrange values in increasing order.
-
enumerator Descending¶
Arrange values in decreasing order.
-
enum class NullPlacement¶
Values:
-
enumerator AtStart¶
Place nulls and NaNs before any non-null values.
NaNs will come after nulls.
-
enumerator AtEnd¶
Place nulls and NaNs after any non-null values.
NaNs will come before nulls.
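The relative ordering of nulls, NaNs and ordinary values described above can be illustrated with a small standalone model. It is not Arrow code: ToyNullPlacement, Bucket and SortAscending are hypothetical names, and nulls are represented by an empty std::optional.

```cpp
#include <algorithm>
#include <cmath>
#include <optional>
#include <vector>

// Hypothetical mirror of the two enumerators documented above.
enum class ToyNullPlacement { AtStart, AtEnd };

// Coarse ordering key. AtStart: null < NaN < value. AtEnd: value < NaN < null.
int Bucket(const std::optional<double>& v, ToyNullPlacement placement) {
  const bool at_start = placement == ToyNullPlacement::AtStart;
  if (!v.has_value()) return at_start ? 0 : 2;
  if (std::isnan(*v)) return 1;
  return at_start ? 2 : 0;
}

std::vector<std::optional<double>> SortAscending(
    std::vector<std::optional<double>> values, ToyNullPlacement placement) {
  std::stable_sort(values.begin(), values.end(),
                   [placement](const auto& a, const auto& b) {
                     const int ba = Bucket(a, placement);
                     const int bb = Bucket(b, placement);
                     if (ba != bb) return ba < bb;
                     // Same bucket: only ordinary values need comparing.
                     if (a.has_value() && !std::isnan(*a)) return *a < *b;
                     return false;
                   });
  return values;
}
```

NaNs always occupy the middle bucket, which is why they follow nulls under AtStart and precede them under AtEnd.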
-
class ScalarAggregateOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control general scalar aggregate kernel behavior.
By default, null values are ignored (skip_nulls = true).
Public Functions
-
explicit ScalarAggregateOptions(bool skip_nulls = true, uint32_t min_count = 1)¶
Public Members
-
bool skip_nulls¶
If true (the default), null values are ignored.
Otherwise, if any value is null, emit null.
-
uint32_t min_count¶
If less than this many non-null values are observed, emit null.
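How skip_nulls and min_count combine can be sketched with a toy sum aggregation. This is a standalone model rather than an Arrow kernel: ToySum is a hypothetical name, and nulls are represented by an empty std::optional.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Sums the non-null inputs, returning an empty optional to stand for "null".
std::optional<double> ToySum(const std::vector<std::optional<double>>& values,
                             bool skip_nulls = true,
                             std::uint32_t min_count = 1) {
  double total = 0.0;
  std::uint64_t valid = 0;  // number of non-null values observed
  for (const auto& v : values) {
    if (!v.has_value()) {
      if (!skip_nulls) return std::nullopt;  // any null poisons the result
      continue;                              // otherwise the null is ignored
    }
    total += *v;
    ++valid;
  }
  // Too few non-null values observed: emit null.
  if (valid < min_count) return std::nullopt;
  return total;
}
```

With the defaults, an input consisting only of nulls yields null because zero non-null values were observed; passing min_count = 0 in this model yields 0 instead.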
Public Static Functions
-
static inline ScalarAggregateOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "ScalarAggregateOptions"¶
-
class CountOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control count aggregate kernel behavior.
By default, only non-null values are counted.
Public Types
Public Functions
-
explicit CountOptions(CountMode mode = CountMode::ONLY_VALID)¶
Public Static Functions
-
static inline CountOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "CountOptions"¶
-
class ModeOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control Mode kernel behavior.
Returns top-n common values and counts. By default, returns the most common value and count.
Public Functions
-
explicit ModeOptions(int64_t n = 1, bool skip_nulls = true, uint32_t min_count = 0)¶
Public Members
-
int64_t n = 1¶
-
bool skip_nulls¶
If true (the default), null values are ignored.
Otherwise, if any value is null, emit null.
-
uint32_t min_count¶
If less than this many non-null values are observed, emit null.
Public Static Functions
-
static inline ModeOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "ModeOptions"¶
-
class VarianceOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control Delta Degrees of Freedom (ddof) of Variance and Stddev kernel.
The divisor used in calculations is N - ddof, where N is the number of elements. By default, ddof is zero, and population variance or stddev is returned.
Public Functions
-
explicit VarianceOptions(int ddof = 0, bool skip_nulls = true, uint32_t min_count = 0)¶
Public Members
-
int ddof = 0¶
-
bool skip_nulls¶
If true (the default), null values are ignored.
Otherwise, if any value is null, emit null.
-
uint32_t min_count¶
If less than this many non-null values are observed, emit null.
Public Static Functions
-
static inline VarianceOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "VarianceOptions"¶
-
class QuantileOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control Quantile kernel behavior.
By default, returns the median value.
Public Types
Public Functions
-
explicit QuantileOptions(double q = 0.5, enum Interpolation interpolation = LINEAR, bool skip_nulls = true, uint32_t min_count = 0)¶
-
explicit QuantileOptions(std::vector<double> q, enum Interpolation interpolation = LINEAR, bool skip_nulls = true, uint32_t min_count = 0)¶
Public Members
-
std::vector<double> q¶
Quantiles to compute; each must be between 0 and 1 inclusive.
-
enum Interpolation interpolation¶
-
bool skip_nulls¶
If true (the default), null values are ignored.
Otherwise, if any value is null, emit null.
-
uint32_t min_count¶
If less than this many non-null values are observed, emit null.
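The Interpolation enumerators are not listed in this section, but the constructors default to LINEAR. The sketch below assumes LINEAR means interpolating between the two neighbouring order statistics at fractional position q * (N - 1), which is the common definition; QuantileLinear is a hypothetical standalone function, not an Arrow call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Exact quantile of a non-empty sample with linear interpolation (assumed semantics).
double QuantileLinear(std::vector<double> xs, double q) {
  assert(!xs.empty());
  assert(q >= 0.0 && q <= 1.0);
  std::sort(xs.begin(), xs.end());
  // Fractional index into the sorted sample.
  const double pos = q * static_cast<double>(xs.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(pos);  // pos >= 0, so this floors
  const double frac = pos - static_cast<double>(lo);
  if (lo + 1 >= xs.size()) return xs[lo];  // q == 1 lands on the maximum
  return xs[lo] + (xs[lo + 1] - xs[lo]) * frac;
}
```

With the default q = 0.5 this reduces to the median: for 1, 2, 3, 4 the position is 1.5, halfway between 2 and 3.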
Public Static Functions
-
static inline QuantileOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "QuantileOptions"¶
-
class TDigestOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control TDigest approximate quantile kernel behavior.
By default, returns the median value.
Public Functions
-
explicit TDigestOptions(double q = 0.5, uint32_t delta = 100, uint32_t buffer_size = 500, bool skip_nulls = true, uint32_t min_count = 0)¶
-
explicit TDigestOptions(std::vector<double> q, uint32_t delta = 100, uint32_t buffer_size = 500, bool skip_nulls = true, uint32_t min_count = 0)¶
Public Members
-
std::vector<double> q¶
Quantiles to compute; each must be between 0 and 1 inclusive.
-
uint32_t delta¶
compression parameter, default 100
-
uint32_t buffer_size¶
input buffer size, default 500
-
bool skip_nulls¶
If true (the default), null values are ignored.
Otherwise, if any value is null, emit null.
-
uint32_t min_count¶
If less than this many non-null values are observed, emit null.
Public Static Functions
-
static inline TDigestOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "TDigestOptions"¶
-
class IndexOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_aggregate.h>
Control Index kernel behavior.
Public Static Attributes
-
static constexpr const char kTypeName[] = "IndexOptions"¶
-
struct Aggregate¶
- #include <arrow/compute/api_aggregate.h>
Configure a grouped aggregation.
-
class ArithmeticOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit ArithmeticOptions(bool check_overflow = false)¶
Public Members
-
bool check_overflow¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "ArithmeticOptions"¶
-
class ElementWiseAggregateOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit ElementWiseAggregateOptions(bool skip_nulls = true)¶
Public Members
-
bool skip_nulls¶
Public Static Functions
-
static inline ElementWiseAggregateOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "ElementWiseAggregateOptions"¶
-
class RoundOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit RoundOptions(int64_t ndigits = 0, RoundMode round_mode = RoundMode::HALF_TO_EVEN)¶
Public Members
-
int64_t ndigits¶
Rounding precision (number of digits to round to)
Public Static Functions
-
static inline RoundOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "RoundOptions"¶
-
class RoundTemporalOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit RoundTemporalOptions(int multiple = 1, CalendarUnit unit = CalendarUnit::DAY, bool week_starts_monday = true, bool ceil_is_strictly_greater = false, bool calendar_based_origin = false)¶
Public Members
-
int multiple¶
Number of units to round to.
-
CalendarUnit unit¶
The unit used for rounding of time.
-
bool week_starts_monday¶
Which day the week starts on (Monday = true, Sunday = false).
-
bool ceil_is_strictly_greater¶
Enable this flag to return a rounded value that is strictly greater than the input.
For example: ceiling 1970-01-01T00:00:00 to 3 hours would yield 1970-01-01T03:00:00 if set to true and 1970-01-01T00:00:00 if set to false. This applies for ceiling only.
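The effect of ceil_is_strictly_greater can be reproduced on plain integers. The sketch below is a standalone model, not Arrow code: CeilToMultiple is a hypothetical name, timestamps are taken as seconds since 1970-01-01T00:00:00, and the rounding step is passed in as a number of seconds.

```cpp
#include <cassert>
#include <cstdint>

// Ceil `t` to a multiple of `step`, both in seconds since the epoch.
std::int64_t CeilToMultiple(std::int64_t t, std::int64_t step,
                            bool strictly_greater) {
  assert(step > 0);
  std::int64_t rem = t % step;
  if (rem < 0) rem += step;  // make the remainder non-negative for t < 0
  if (rem == 0) {
    // Already on a boundary: only the strict variant moves forward.
    return strictly_greater ? t + step : t;
  }
  return t - rem + step;
}
```

With a step of 3 hours (10800 seconds), an input of 0 maps to 10800 when the flag is true and stays 0 when it is false, matching the example above. Inputs that are not on a boundary are unaffected by the flag.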
-
bool calendar_based_origin¶
By default time is rounded to a multiple of units since 1970-01-01T00:00:00.
By setting calendar_based_origin to true, time will be rounded to a number of units since the last greater calendar unit. For example: rounding to a multiple of days since the beginning of the month or to hours since the beginning of the day. Exceptions: week and quarter are not used as greater units, therefore days will be rounded to the beginning of the month, not the week. The greater unit of week is year. Note that ceiling and rounding might change the sorting order of an array near a greater unit change. For example, rounding YYYY-mm-dd 23:00:00 to 5 hours will ceil and round to YYYY-mm-dd+1 01:00:00 and floor to YYYY-mm-dd 20:00:00. On the other hand, YYYY-mm-dd+1 00:00:00 will ceil, round and floor to YYYY-mm-dd+1 00:00:00. This can break the order of an already ordered array.
Public Static Functions
-
static inline RoundTemporalOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "RoundTemporalOptions"¶
-
class RoundToMultipleOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit RoundToMultipleOptions(double multiple = 1.0, RoundMode round_mode = RoundMode::HALF_TO_EVEN)¶
Public Members
Public Static Functions
-
static inline RoundToMultipleOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "RoundToMultipleOptions"¶
-
class JoinOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Options for var_args_join.
Public Types
-
enum NullHandlingBehavior¶
How to handle null values. (A null separator always results in a null output.)
Values:
-
enumerator EMIT_NULL¶
A null in any input results in a null in the output.
-
enumerator SKIP¶
Nulls in inputs are skipped.
-
enumerator REPLACE¶
Nulls in inputs are replaced with the replacement string.
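The three null-handling behaviours can be compared on a single row of inputs. The sketch below is a standalone model, not the Arrow kernel: ToyNullHandling and JoinRow are hypothetical names, and nulls are represented by an empty std::optional.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical mirror of the enumerators documented above.
enum class ToyNullHandling { EMIT_NULL, SKIP, REPLACE };

// Joins one row of strings with `separator`; an empty optional stands for null.
std::optional<std::string> JoinRow(
    const std::vector<std::optional<std::string>>& parts,
    const std::optional<std::string>& separator,
    ToyNullHandling null_handling = ToyNullHandling::EMIT_NULL,
    const std::string& null_replacement = "") {
  // A null separator always results in a null output.
  if (!separator.has_value()) return std::nullopt;
  std::string out;
  bool first = true;
  for (const auto& part : parts) {
    const std::string* piece = nullptr;
    if (part.has_value()) {
      piece = &*part;
    } else if (null_handling == ToyNullHandling::EMIT_NULL) {
      return std::nullopt;
    } else if (null_handling == ToyNullHandling::SKIP) {
      continue;
    } else {
      piece = &null_replacement;  // REPLACE
    }
    if (!first) out += *separator;
    out += *piece;
    first = false;
  }
  return out;
}
```

For the row "a", null, "c" joined with "-", this model yields null under EMIT_NULL, "a-c" under SKIP, and "a-?-c" under REPLACE with a replacement of "?".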
Public Functions
-
explicit JoinOptions(NullHandlingBehavior null_handling = EMIT_NULL, std::string null_replacement = "")¶
Public Static Functions
-
static inline JoinOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "JoinOptions"¶
-
class MatchSubstringOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit MatchSubstringOptions(std::string pattern, bool ignore_case = false)¶
-
MatchSubstringOptions()¶
Public Members
-
std::string pattern¶
The exact substring (or regex, depending on kernel) to look for inside input values.
-
bool ignore_case¶
Whether to perform a case-insensitive match.
Public Static Attributes
-
static constexpr const char kTypeName[] = "MatchSubstringOptions"¶
-
class SplitOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit SplitOptions(int64_t max_splits = -1, bool reverse = false)¶
Public Members
-
int64_t max_splits¶
Maximum number of splits allowed, or unlimited when -1.
-
bool reverse¶
Start splitting from the end of the string (only relevant when max_splits != -1)
Public Static Attributes
-
static constexpr const char kTypeName[] = "SplitOptions"¶
-
class SplitPatternOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit SplitPatternOptions(std::string pattern, int64_t max_splits = -1, bool reverse = false)¶
-
SplitPatternOptions()¶
Public Members
-
std::string pattern¶
The exact substring to split on.
-
int64_t max_splits¶
Maximum number of splits allowed, or unlimited when -1.
-
bool reverse¶
Start splitting from the end of the string (only relevant when max_splits != -1)
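The interplay of pattern, max_splits and reverse can be sketched with std::string alone. ToySplitPattern is a hypothetical standalone function, not the Arrow kernel. It assumes that a reverse split consumes separators from the right while still returning the pieces in left-to-right order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::vector<std::string> ToySplitPattern(const std::string& s,
                                         const std::string& pattern,
                                         std::int64_t max_splits = -1,
                                         bool reverse = false) {
  assert(!pattern.empty());
  std::vector<std::string> parts;
  std::int64_t splits = 0;
  if (!reverse) {
    std::size_t start = 0;
    while (max_splits < 0 || splits < max_splits) {
      const std::size_t hit = s.find(pattern, start);
      if (hit == std::string::npos) break;
      parts.push_back(s.substr(start, hit - start));
      start = hit + pattern.size();
      ++splits;
    }
    parts.push_back(s.substr(start));  // remainder stays unsplit
    return parts;
  }
  // Reverse: scan from the end, collecting pieces right to left.
  std::size_t end = s.size();
  while (max_splits < 0 || splits < max_splits) {
    if (end < pattern.size()) break;
    const std::size_t hit = s.rfind(pattern, end - pattern.size());
    if (hit == std::string::npos) break;
    parts.push_back(s.substr(hit + pattern.size(), end - hit - pattern.size()));
    end = hit;
    ++splits;
  }
  parts.push_back(s.substr(0, end));  // remainder stays unsplit
  std::reverse(parts.begin(), parts.end());
  return parts;
}
```

Under these assumptions, splitting "a,b,c" on "," with max_splits = 1 gives "a" and "b,c" going forward, and "a,b" and "c" in reverse. With max_splits = -1 the direction makes no difference for this input.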
Public Static Attributes
-
static constexpr const char kTypeName[] = "SplitPatternOptions"¶
-
class ReplaceSliceOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit ReplaceSliceOptions(int64_t start, int64_t stop, std::string replacement)¶
-
ReplaceSliceOptions()¶
Public Members
-
int64_t start¶
Index to start slicing at.
-
int64_t stop¶
Index to stop slicing at.
-
std::string replacement¶
String to replace the slice with.
Public Static Attributes
-
static constexpr const char kTypeName[] = "ReplaceSliceOptions"¶
-
class ReplaceSubstringOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit ReplaceSubstringOptions(std::string pattern, std::string replacement, int64_t max_replacements = -1)¶
-
ReplaceSubstringOptions()¶
Public Members
-
std::string pattern¶
Pattern to match, literal, or regular expression depending on which kernel is used.
-
std::string replacement¶
String to replace the pattern with.
-
int64_t max_replacements¶
Max number of substrings to replace (-1 means unbounded)
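For the literal (non-regex) case, the role of max_replacements can be sketched as below. ReplaceLiteral is a hypothetical standalone function, not an Arrow kernel; it scans left to right and treats -1 as unbounded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

std::string ReplaceLiteral(const std::string& s, const std::string& pattern,
                           const std::string& replacement,
                           std::int64_t max_replacements = -1) {
  assert(!pattern.empty());
  std::string out;
  std::size_t start = 0;
  std::int64_t done = 0;
  while (max_replacements < 0 || done < max_replacements) {
    const std::size_t hit = s.find(pattern, start);
    if (hit == std::string::npos) break;
    out.append(s, start, hit - start);  // text before the match
    out += replacement;
    start = hit + pattern.size();
    ++done;
  }
  out.append(s, start, std::string::npos);  // untouched tail
  return out;
}
```

Once the limit is reached, the rest of the input is copied through unchanged, so a limit of 0 returns the input as is.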
Public Static Attributes
-
static constexpr const char kTypeName[] = "ReplaceSubstringOptions"¶
-
class ExtractRegexOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Members
-
std::string pattern¶
Regular expression with named capture fields.
Public Static Attributes
-
static constexpr const char kTypeName[] = "ExtractRegexOptions"¶
-
class SetLookupOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Options for IsIn and IndexIn functions.
Public Functions
-
SetLookupOptions()¶
Public Members
-
bool skip_nulls¶
Whether nulls in value_set count for lookup.
If true, any null in value_set is ignored and nulls in the input produce null (IndexIn) or false (IsIn) values in the output. If false, any null in value_set is successfully matched in the input.
Public Static Attributes
-
static constexpr const char kTypeName[] = "SetLookupOptions"¶
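The effect of skip_nulls on IsIn can be modeled without Arrow. The sketch below is an illustration under stated assumptions: IsInSketch is a hypothetical helper, and std::optional<int> with std::nullopt stands in for a nullable value. It follows the description of skip_nulls above, not Arrow's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a nullable integer; std::nullopt models null.
using NullableInt = std::optional<int>;

// Models the documented IsIn result for one input value.
bool IsInSketch(const NullableInt& value,
                const std::vector<NullableInt>& value_set, bool skip_nulls) {
  if (!value.has_value()) {
    if (skip_nulls) return false;  // nulls in value_set are ignored
    // A null input matches only if value_set itself contains a null.
    return std::any_of(value_set.begin(), value_set.end(),
                       [](const NullableInt& v) { return !v.has_value(); });
  }
  // Non-null inputs match non-null members regardless of skip_nulls.
  return std::find(value_set.begin(), value_set.end(), value) !=
         value_set.end();
}
```

Only null inputs are affected by the flag; a non-null input gives the same answer either way.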
-
class StructFieldOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Options for struct_field function.
Public Functions
-
explicit StructFieldOptions(std::vector<int> indices)¶
-
explicit StructFieldOptions(std::initializer_list<int>)¶
-
StructFieldOptions()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "StructFieldOptions"¶
-
class StrptimeOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
StrptimeOptions()¶
Public Members
-
std::string format¶
The desired format string.
-
bool error_is_null¶
Return null on parsing errors if true or raise if false.
Public Static Attributes
-
static constexpr const char kTypeName[] = "StrptimeOptions"¶
-
class StrftimeOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit StrftimeOptions(std::string format, std::string locale = "C")¶
-
StrftimeOptions()¶
-
class PadOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Members
-
int64_t width¶
The desired string length.
-
std::string padding¶
What to pad the string with. Should be one codepoint (Unicode)/byte (ASCII).
Public Static Attributes
-
static constexpr const char kTypeName[] = "PadOptions"¶
-
class TrimOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Members
-
std::string characters¶
The individual characters to be trimmed from the string.
Public Static Attributes
-
static constexpr const char kTypeName[] = "TrimOptions"¶
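Note that characters is a set of individual characters, not a substring to strip. A minimal standalone sketch of that reading, assuming single-byte characters and trimming from both ends (TrimSketch is a hypothetical helper, not an Arrow function):

```cpp
#include <cassert>
#include <string>

// Strips any byte found in `characters` from both ends of `input`.
// Hypothetical helper for illustration; it does not call into Arrow.
std::string TrimSketch(const std::string& input,
                       const std::string& characters) {
  const std::size_t first = input.find_first_not_of(characters);
  if (first == std::string::npos) return "";  // every byte was trimmed
  const std::size_t last = input.find_last_not_of(characters);
  return input.substr(first, last - first + 1);
}
```

Because the argument is a set, the order of the characters in it does not matter.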
-
class SliceOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit SliceOptions(int64_t start, int64_t stop = std::numeric_limits<int64_t>::max(), int64_t step = 1)¶
-
SliceOptions()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "SliceOptions"¶
-
class ListSliceOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit ListSliceOptions(int64_t start, std::optional<int64_t> stop = std::nullopt, int64_t step = 1, std::optional<bool> return_fixed_size_list = std::nullopt)¶
-
ListSliceOptions()¶
Public Members
-
int64_t start¶
The start of list slicing.
-
std::optional<int64_t> stop¶
Optional stop of list slicing. If not set, then slice to end. (NotImplemented)
-
int64_t step¶
Slicing step.
-
std::optional<bool> return_fixed_size_list¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "ListSliceOptions"¶
-
class NullOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit NullOptions(bool nan_is_null = false)¶
Public Members
-
bool nan_is_null¶
Public Static Functions
-
static inline NullOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "NullOptions"¶
-
struct CompareOptions¶
- #include <arrow/compute/api_scalar.h>
Public Members
-
enum CompareOperator op¶
-
class MakeStructOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Members
-
std::vector<std::string> field_names¶
Names for wrapped columns.
-
std::vector<bool> field_nullability¶
Nullability bits for wrapped columns.
-
std::vector<std::shared_ptr<const KeyValueMetadata>> field_metadata¶
Metadata attached to wrapped columns.
Public Static Attributes
-
static constexpr const char kTypeName[] = "MakeStructOptions"¶
-
struct DayOfWeekOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit DayOfWeekOptions(bool count_from_zero = true, uint32_t week_start = 1)¶
Public Members
-
bool count_from_zero¶
Number days from 0 if true and from 1 if false.
-
uint32_t week_start¶
What day does the week start with (Monday=1, Sunday=7).
The numbering is unaffected by the count_from_zero parameter.
Public Static Functions
-
static inline DayOfWeekOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "DayOfWeekOptions"¶
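One way to read the interaction of count_from_zero and week_start is sketched below. This is a standalone model, not Arrow's kernel: DayOfWeekSketch is a hypothetical helper that takes a weekday already expressed in the Monday=1, Sunday=7 convention and renumbers it.

```cpp
#include <cassert>
#include <cstdint>

// iso_weekday uses Monday=1 ... Sunday=7, the same convention as week_start.
// Returns the day number under the given options (illustrative model only).
uint32_t DayOfWeekSketch(uint32_t iso_weekday, bool count_from_zero = true,
                         uint32_t week_start = 1) {
  assert(iso_weekday >= 1 && iso_weekday <= 7);
  assert(week_start >= 1 && week_start <= 7);
  // Distance in days from the configured start of the week, in 0..6.
  const uint32_t offset = (iso_weekday + 7 - week_start) % 7;
  return count_from_zero ? offset : offset + 1;
}
```

Under this reading the defaults number Monday as 0 and Sunday as 6, while week_start = 7 with count_from_zero = false numbers Sunday as 1.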
-
struct AssumeTimezoneOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Used to control timestamp timezone conversion and handling ambiguous/nonexistent times.
Public Types
-
enum Ambiguous¶
How to interpret ambiguous local times that can be interpreted as multiple instants (normally two) due to DST shifts.
AMBIGUOUS_EARLIEST emits the earliest instant amongst possible interpretations. AMBIGUOUS_LATEST emits the latest instant amongst possible interpretations.
Values:
-
enumerator AMBIGUOUS_RAISE¶
-
enumerator AMBIGUOUS_EARLIEST¶
-
enumerator AMBIGUOUS_LATEST¶
-
enum Nonexistent¶
How to handle local times that do not exist due to DST shifts.
NONEXISTENT_EARLIEST emits the instant “just before” the DST shift instant in the given timestamp precision (for example, for a nanoseconds precision timestamp, this is one nanosecond before the DST shift instant). NONEXISTENT_LATEST emits the DST shift instant.
Values:
-
enumerator NONEXISTENT_RAISE¶
-
enumerator NONEXISTENT_EARLIEST¶
-
enumerator NONEXISTENT_LATEST¶
Public Functions
-
explicit AssumeTimezoneOptions(std::string timezone, Ambiguous ambiguous = AMBIGUOUS_RAISE, Nonexistent nonexistent = NONEXISTENT_RAISE)¶
-
AssumeTimezoneOptions()¶
Public Members
-
std::string timezone¶
Timezone to convert timestamps from.
-
Nonexistent nonexistent¶
How to interpret non-existent local times (due to DST shifts)
Public Static Attributes
-
static constexpr const char kTypeName[] = "AssumeTimezoneOptions"¶
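The enumerator descriptions above can be restated as two small resolution rules. The sketch below is a standalone model under stated assumptions: the enum and function names are hypothetical, instants are plain int64_t tick counts in the timestamp's unit, and the caller has already worked out the candidate instants or the DST shift instant. It does not perform any timezone lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical mirrors of the Ambiguous and Nonexistent enums.
enum class AmbiguousSketch { Raise, Earliest, Latest };
enum class NonexistentSketch { Raise, Earliest, Latest };

// An ambiguous local time maps to two candidate instants; pick one or raise.
int64_t ResolveAmbiguousSketch(int64_t a, int64_t b, AmbiguousSketch mode) {
  switch (mode) {
    case AmbiguousSketch::Earliest: return std::min(a, b);
    case AmbiguousSketch::Latest: return std::max(a, b);
    case AmbiguousSketch::Raise: break;
  }
  throw std::invalid_argument("ambiguous local time");
}

// A nonexistent local time falls in a gap that starts at shift_instant.
// Earliest is one tick of the timestamp unit before the shift instant.
int64_t ResolveNonexistentSketch(int64_t shift_instant,
                                 NonexistentSketch mode) {
  switch (mode) {
    case NonexistentSketch::Earliest: return shift_instant - 1;
    case NonexistentSketch::Latest: return shift_instant;
    case NonexistentSketch::Raise: break;
  }
  throw std::invalid_argument("nonexistent local time");
}
```

For a nanosecond-precision timestamp the tick is one nanosecond, which matches the example given for NONEXISTENT_EARLIEST.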
-
struct WeekOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Functions
-
explicit WeekOptions(bool week_starts_monday = true, bool count_from_zero = false, bool first_week_is_fully_in_year = false)¶
Public Members
-
bool week_starts_monday¶
What day does the week start with (Monday=true, Sunday=false)
-
bool count_from_zero¶
Dates from current year that fall into last ISO week of the previous year return 0 if true and 52 or 53 if false.
-
bool first_week_is_fully_in_year¶
Must the first week be fully in January (true), or is a week that begins on December 29, 30, or 31 considered to be the first week of the new year (false)?
Public Static Functions
-
static inline WeekOptions Defaults()¶
-
static inline WeekOptions ISODefaults()¶
-
static inline WeekOptions USDefaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "WeekOptions"¶
-
struct Utf8NormalizeOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Static Functions
-
static inline Utf8NormalizeOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "Utf8NormalizeOptions"¶
-
class RandomOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Public Members
-
Initializer initializer¶
The type of initialization for random number generation - system or provided seed.
-
uint64_t seed¶
The seed value used to initialize the random number generation.
Public Static Functions
-
static inline RandomOptions FromSystemRandom()¶
-
static inline RandomOptions FromSeed(uint64_t seed)¶
-
static inline RandomOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "RandomOptions"¶
-
class MapLookupOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_scalar.h>
Options for map_lookup function.
Public Types
Public Members
-
Occurrence occurrence¶
Whether to return the first, last, or all matching values.
Public Static Attributes
-
static constexpr const char kTypeName[] = "MapLookupOptions"¶
-
class FilterOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Public Types
Public Functions
-
explicit FilterOptions(NullSelectionBehavior null_selection = DROP)¶
Public Members
-
NullSelectionBehavior null_selection_behavior = DROP¶
Public Static Functions
-
static inline FilterOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "FilterOptions"¶
-
class TakeOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Public Functions
-
explicit TakeOptions(bool boundscheck = true)¶
Public Members
-
bool boundscheck = true¶
Public Static Functions
-
static inline TakeOptions BoundsCheck()¶
-
static inline TakeOptions NoBoundsCheck()¶
-
static inline TakeOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "TakeOptions"¶
-
class DictionaryEncodeOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Options for the dictionary encode function.
Public Types
Public Functions
-
explicit DictionaryEncodeOptions(NullEncodingBehavior null_encoding = MASK)¶
Public Members
-
NullEncodingBehavior null_encoding_behavior = MASK¶
Public Static Functions
-
static inline DictionaryEncodeOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "DictionaryEncodeOptions"¶
-
class SortKey : public arrow::util::EqualityComparable<SortKey>¶
- #include <arrow/compute/api_vector.h>
One sort key for PartitionNthIndices (TODO) and SortIndices.
Public Functions
-
std::string ToString() const¶
-
class ArraySortOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Public Functions
-
explicit ArraySortOptions(SortOrder order = SortOrder::Ascending, NullPlacement null_placement = NullPlacement::AtEnd)¶
Public Members
-
NullPlacement null_placement¶
Whether nulls and NaNs are placed at the start or at the end.
Public Static Functions
-
static inline ArraySortOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "ArraySortOptions"¶
-
class SortOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Public Functions
-
explicit SortOptions(std::vector<SortKey> sort_keys = {}, NullPlacement null_placement = NullPlacement::AtEnd)¶
Public Members
-
NullPlacement null_placement¶
Whether nulls and NaNs are placed at the start or at the end.
Public Static Functions
-
static inline SortOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "SortOptions"¶
-
class SelectKOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
SelectK options.
Public Members
-
int64_t k¶
The number of k elements to keep.
Public Static Functions
-
static inline SelectKOptions Defaults()¶
-
static inline SelectKOptions TopKDefault(int64_t k, std::vector<std::string> key_names = {})¶
-
static inline SelectKOptions BottomKDefault(int64_t k, std::vector<std::string> key_names = {})¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "SelectKOptions"¶
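Keeping the k largest or smallest elements does not require a full sort. The sketch below shows the idea with std::partial_sort on a plain vector of int; SelectKSketch and its largest flag are hypothetical and only loosely mirror the TopKDefault and BottomKDefault helpers. Sort keys, nulls and tables are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Returns the k largest (or smallest) values in sorted order.
// Hypothetical helper for illustration; it does not call into Arrow.
std::vector<int> SelectKSketch(std::vector<int> values, int64_t k,
                               bool largest) {
  assert(k >= 0);
  // Never keep more elements than the input holds.
  const auto keep = static_cast<std::ptrdiff_t>(
      std::min<uint64_t>(static_cast<uint64_t>(k), values.size()));
  if (largest) {
    std::partial_sort(values.begin(), values.begin() + keep, values.end(),
                      std::greater<int>());
  } else {
    std::partial_sort(values.begin(), values.begin() + keep, values.end(),
                      std::less<int>());
  }
  values.resize(static_cast<std::size_t>(keep));
  return values;
}
```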
-
class RankOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Rank options.
Public Types
-
enum Tiebreaker¶
Configure how ties between equal values are handled.
Values:
-
enumerator Min¶
Ties get the smallest possible rank in sorted order.
-
enumerator Max¶
Ties get the largest possible rank in sorted order.
-
enumerator First¶
Ranks are assigned in order of when ties appear in the input.
This ensures the ranks are a stable permutation of the input.
-
enumerator Dense¶
The ranks span a dense [1, M] interval where M is the number of distinct values in the input.
Public Functions
-
explicit RankOptions(std::vector<SortKey> sort_keys = {}, NullPlacement null_placement = NullPlacement::AtEnd, Tiebreaker tiebreaker = RankOptions::First)¶
-
inline explicit RankOptions(SortOrder order, NullPlacement null_placement = NullPlacement::AtEnd, Tiebreaker tiebreaker = RankOptions::First)¶
Convenience constructor for array inputs.
Public Members
-
NullPlacement null_placement¶
Whether nulls and NaNs are placed at the start or at the end.
-
Tiebreaker tiebreaker¶
Tiebreaker for dealing with equal values in ranks.
Public Static Functions
-
static inline RankOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "RankOptions"¶
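The four tiebreakers differ only in how a run of equal values is numbered. The sketch below is a standalone model of the descriptions above for an ascending sort of plain int values; TiebreakerSketch and RankSketch are hypothetical names, and null placement is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

enum class TiebreakerSketch { Min, Max, First, Dense };

// Returns 1-based ascending ranks for `values` (illustrative model only).
std::vector<uint64_t> RankSketch(const std::vector<int>& values,
                                 TiebreakerSketch tiebreaker) {
  const std::size_t n = values.size();
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  // A stable sort keeps equal values in input order, which First relies on.
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return values[a] < values[b];
                   });

  std::vector<uint64_t> ranks(n);
  uint64_t dense = 0;
  for (std::size_t begin = 0; begin < n;) {
    std::size_t end = begin;  // [begin, end) is one run of tied values
    while (end < n && values[order[end]] == values[order[begin]]) ++end;
    ++dense;
    for (std::size_t i = begin; i < end; ++i) {
      switch (tiebreaker) {
        case TiebreakerSketch::Min: ranks[order[i]] = begin + 1; break;
        case TiebreakerSketch::Max: ranks[order[i]] = end; break;
        case TiebreakerSketch::First: ranks[order[i]] = i + 1; break;
        case TiebreakerSketch::Dense: ranks[order[i]] = dense; break;
      }
    }
    begin = end;
  }
  return ranks;
}
```

For the input {10, 20, 10, 30} the two tied values receive ranks 1 and 1 under Min, 2 and 2 under Max, 1 and 2 under First, and 1 and 1 under Dense, where the largest rank is 3 rather than 4.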
-
class PartitionNthOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Partitioning options for NthToIndices.
Public Functions
-
explicit PartitionNthOptions(int64_t pivot, NullPlacement null_placement = NullPlacement::AtEnd)¶
-
inline PartitionNthOptions()¶
Public Members
-
int64_t pivot¶
The index into the equivalent sorted array of the partition pivot element.
-
NullPlacement null_placement¶
Whether nulls and NaNs are partitioned at the start or at the end.
Public Static Attributes
-
static constexpr const char kTypeName[] = "PartitionNthOptions"¶
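The role of pivot can be illustrated with std::nth_element applied to an index vector. This is a standalone sketch, not Arrow's kernel: PartitionNthIndicesSketch is a hypothetical helper over plain int values, and null_placement is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Returns indices into `values` partitioned around position `pivot`:
// the index at `pivot` is the one a full sort would put there, indices
// before it refer to values that are not greater, indices after it to
// values that are not smaller. Illustrative model only.
std::vector<int64_t> PartitionNthIndicesSketch(const std::vector<int>& values,
                                               int64_t pivot) {
  assert(pivot >= 0 && pivot <= static_cast<int64_t>(values.size()));
  std::vector<int64_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), int64_t{0});
  std::nth_element(indices.begin(), indices.begin() + pivot, indices.end(),
                   [&](int64_t a, int64_t b) { return values[a] < values[b]; });
  return indices;
}
```

The order within each side of the pivot is unspecified, which is what makes partitioning cheaper than sorting.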
-
class CumulativeSumOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/api_vector.h>
Options for cumulative sum function.
Public Functions
-
explicit CumulativeSumOptions(double start = 0, bool skip_nulls = false, bool check_overflow = false)¶
Public Members
-
bool skip_nulls = false¶
If true, nulls in the input are ignored and produce a corresponding null output.
When false, the first null encountered is propagated through the remaining output.
Public Static Functions
-
static inline CumulativeSumOptions Defaults()¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "CumulativeSumOptions"¶
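The two skip_nulls behaviors can be compared side by side in a small model. The sketch below is standalone and hypothetical: CumulativeSumSketch operates on std::optional<double>, with std::nullopt standing in for null, and does not model check_overflow.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a nullable double; std::nullopt models null.
using NullableDouble = std::optional<double>;

// Models the documented skip_nulls behavior (illustrative only).
std::vector<NullableDouble> CumulativeSumSketch(
    const std::vector<NullableDouble>& input, double start = 0,
    bool skip_nulls = false) {
  std::vector<NullableDouble> out;
  out.reserve(input.size());
  double running = start;
  bool seen_null = false;
  for (const NullableDouble& v : input) {
    if (!v.has_value()) {
      seen_null = true;
      out.push_back(std::nullopt);  // a null input always yields a null output
    } else if (seen_null && !skip_nulls) {
      out.push_back(std::nullopt);  // the first null propagates onward
    } else {
      running += *v;
      out.push_back(running);
    }
  }
  return out;
}
```

With skip_nulls = true the running total continues past a null; with skip_nulls = false everything from the first null onward is null.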
-
class CastOptions : public arrow::compute::FunctionOptions¶
- #include <arrow/compute/cast.h>
Public Functions
-
explicit CastOptions(bool safe = true)¶
Public Members
-
TypeHolder to_type¶
-
bool allow_int_overflow¶
-
bool allow_time_truncate¶
-
bool allow_time_overflow¶
-
bool allow_decimal_truncate¶
-
bool allow_float_truncate¶
-
bool allow_invalid_utf8¶
Public Static Functions
-
static inline CastOptions Safe(TypeHolder to_type = {})¶
-
static inline CastOptions Unsafe(TypeHolder to_type = {})¶
Public Static Attributes
-
static constexpr const char kTypeName[] = "CastOptions"¶
Streaming Execution¶
Streaming Execution Operators¶
-
enum class arrow::compute::JoinType¶
Values:
-
enumerator LEFT_SEMI¶
-
enumerator RIGHT_SEMI¶
-
enumerator LEFT_ANTI¶
-
enumerator RIGHT_ANTI¶
-
enumerator INNER¶
-
enumerator LEFT_OUTER¶
-
enumerator RIGHT_OUTER¶
-
enumerator FULL_OUTER¶
-
using ArrayVectorIteratorMaker = std::function<Iterator<std::shared_ptr<ArrayVector>>()>¶
-
using ExecBatchIteratorMaker = std::function<Iterator<std::shared_ptr<ExecBatch>>()>¶
-
using RecordBatchIteratorMaker = std::function<Iterator<std::shared_ptr<RecordBatch>>()>¶
-
constexpr int32_t kDefaultBackpressureHighBytes = 1 << 30¶
-
constexpr int32_t kDefaultBackpressureLowBytes = 1 << 28¶
-
class ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Subclassed by arrow::compute::AggregateNodeOptions, arrow::compute::AsofJoinNodeOptions, arrow::compute::ConsumingSinkNodeOptions, arrow::compute::FilterNodeOptions, arrow::compute::HashJoinNodeOptions, arrow::compute::NamedTableNodeOptions, arrow::compute::ProjectNodeOptions, arrow::compute::RecordBatchReaderSourceNodeOptions, arrow::compute::SchemaSourceNodeOptions< ItMaker >, arrow::compute::SinkNodeOptions, arrow::compute::SourceNodeOptions, arrow::compute::TableSinkNodeOptions, arrow::compute::TableSourceNodeOptions, arrow::dataset::ScanNodeOptions, arrow::dataset::ScanV2Options, arrow::dataset::WriteNodeOptions, arrow::compute::SchemaSourceNodeOptions< ArrayVectorIteratorMaker >, arrow::compute::SchemaSourceNodeOptions< ExecBatchIteratorMaker >, arrow::compute::SchemaSourceNodeOptions< RecordBatchIteratorMaker >
Public Functions
-
virtual ~ExecNodeOptions() = default¶
-
class SourceNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Adapt an AsyncGenerator<ExecBatch> as a source node.
plan->exec_context()->executor() will be used to parallelize pushing to outputs, if provided.
Public Functions
Public Members
-
std::function<Future<std::optional<ExecBatch>>()> generator¶
Public Static Functions
-
static Result<std::shared_ptr<SourceNodeOptions>> FromTable(const Table &table, arrow::internal::Executor*)¶
-
class TableSourceNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
An extended Source node which accepts a table.
Public Functions
Public Static Attributes
-
static constexpr int64_t kDefaultMaxBatchSize = 1 << 20¶
-
class NamedTableNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Define a lazy resolved Arrow table.
The table uniquely identified by the names can typically be resolved at the time when the plan is to be consumed.
This node is for serialization purposes only and can never be executed.
Public Functions
-
template<typename ItMaker>
class SchemaSourceNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
An extended Source node which accepts a schema.
ItMaker is a maker of an iterator of tabular data.
Public Functions
-
class RecordBatchReaderSourceNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Public Functions
Public Members
-
std::shared_ptr<RecordBatchReader> reader¶
The RecordBatchReader which acts as the data source.
-
arrow::internal::Executor *io_executor¶
The executor to use for the reader.
Defaults to the default I/O executor.
-
class ArrayVectorSourceNodeOptions : public arrow::compute::SchemaSourceNodeOptions<ArrayVectorIteratorMaker>¶
- #include <arrow/compute/exec/options.h>
An extended Source node which accepts a schema and array-vectors.
-
class ExecBatchSourceNodeOptions : public arrow::compute::SchemaSourceNodeOptions<ExecBatchIteratorMaker>¶
- #include <arrow/compute/exec/options.h>
An extended Source node which accepts a schema and exec-batches.
-
class RecordBatchSourceNodeOptions : public arrow::compute::SchemaSourceNodeOptions<RecordBatchIteratorMaker>¶
- #include <arrow/compute/exec/options.h>
An extended Source node which accepts a schema and record-batches.
-
class FilterNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Make a node which excludes some rows from batches passed through it.
filter_expression will be evaluated against each batch which is pushed to this node. Any rows for which filter_expression does not evaluate to true will be excluded in the batch emitted by this node.
Public Functions
-
inline explicit FilterNodeOptions(Expression filter_expression)¶
Public Members
-
Expression filter_expression¶
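The phrase "does not evaluate to true" covers both false and null results. A minimal standalone sketch of that rule, assuming the filter has already been evaluated into a per-row mask (FilterRowsSketch is a hypothetical helper, and std::optional<bool> with std::nullopt stands in for a nullable boolean):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Keeps only the rows whose mask entry is exactly true.
// Hypothetical helper for illustration; it does not call into Arrow.
std::vector<int> FilterRowsSketch(const std::vector<int>& rows,
                                  const std::vector<std::optional<bool>>& mask) {
  assert(rows.size() == mask.size());
  std::vector<int> kept;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    // Both false and null drop the row.
    if (mask[i].value_or(false)) kept.push_back(rows[i]);
  }
  return kept;
}
```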
-
class ProjectNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Make a node which executes expressions on input batches, producing new batches.
Each expression will be evaluated against each batch which is pushed to this node to produce a corresponding output column.
If names are not provided, the string representations of exprs will be used.
Public Functions
-
inline explicit ProjectNodeOptions(std::vector<Expression> expressions, std::vector<std::string> names = {})¶
-
class AggregateNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Make a node which aggregates input batches, optionally grouped by keys.
If the keys attribute is a non-empty vector, then each aggregate in aggregates is expected to be a HashAggregate function. If the keys attribute is an empty vector, then each aggregate is assumed to be a ScalarAggregate function.
Public Functions
-
class BackpressureMonitor¶
- #include <arrow/compute/exec/options.h>
-
struct BackpressureOptions¶
- #include <arrow/compute/exec/options.h>
Options to control backpressure behavior.
Public Functions
-
inline BackpressureOptions()¶
Create default options that perform no backpressure.
-
inline BackpressureOptions(uint64_t resume_if_below, uint64_t pause_if_above)¶
Create options that will perform backpressure.
- Parameters:
resume_if_below – The producer should resume producing if the backpressure queue has fewer than resume_if_below items.
pause_if_above – The producer should pause producing if the backpressure queue has more than pause_if_above items
-
inline bool should_apply_backpressure() const¶
Public Static Functions
-
static inline BackpressureOptions DefaultBackpressure()¶
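The two thresholds form a hysteresis band: the producer pauses above the high mark and resumes only once the queue falls below the low mark. The sketch below is a standalone model of that rule; BackpressureSketch and its Update method are hypothetical and are not part of the Arrow API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the pause/resume thresholds described above.
class BackpressureSketch {
 public:
  BackpressureSketch(uint64_t resume_if_below, uint64_t pause_if_above)
      : resume_if_below_(resume_if_below), pause_if_above_(pause_if_above) {
    assert(resume_if_below <= pause_if_above);
  }

  // Reports the current queue size; returns true while the producer
  // should stay paused.
  bool Update(uint64_t queued) {
    if (!paused_ && queued > pause_if_above_) {
      paused_ = true;
    } else if (paused_ && queued < resume_if_below_) {
      paused_ = false;
    }
    return paused_;
  }

 private:
  uint64_t resume_if_below_;
  uint64_t pause_if_above_;
  bool paused_ = false;
};
```

Between the two thresholds the state does not change, which avoids rapid toggling when the queue size hovers near a single limit.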
-
class SinkNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Add a sink node which forwards to an AsyncGenerator<ExecBatch>
Emitted batches will not be ordered.
Subclassed by arrow::compute::OrderBySinkNodeOptions, arrow::compute::SelectKSinkNodeOptions
Public Functions
-
inline explicit SinkNodeOptions(std::function<Future<std::optional<ExecBatch>>()> *generator, BackpressureOptions backpressure = {}, BackpressureMonitor **backpressure_monitor = NULLPTR)¶
Public Members
-
std::function<Future<std::optional<ExecBatch>>()> *generator¶
A pointer to a generator of batches.
This will be set when the node is added to the plan and should be used to consume data from the plan. If this function is not called frequently enough then the sink node will start to accumulate data and may apply backpressure.
-
std::shared_ptr<Schema> *schema¶
A pointer which will be set to the schema of the generated batches.
This is optional; if nullptr is passed in then it will be ignored. This will be set when the node is added to the plan, before StartProducing is called.
-
BackpressureOptions backpressure¶
Options to control when to apply backpressure.
This is optional; the default is to never apply backpressure. If the plan is not consumed quickly enough the system may eventually run out of memory.
-
BackpressureMonitor **backpressure_monitor¶
A pointer to a backpressure monitor.
This will be set when the node is added to the plan. This can be used to inspect the amount of data currently queued in the sink node. This is an optional utility and backpressure can be applied even if this is not used.
-
class BackpressureControl¶
- #include <arrow/compute/exec/options.h>
Control used by a SinkNodeConsumer to pause & resume.
Callers should ensure that they do not call Pause() and Resume() simultaneously, and should sequence calls so that every call to Pause() is eventually followed by a call to Resume().
-
class SinkNodeConsumer¶
- #include <arrow/compute/exec/options.h>
Subclassed by arrow::compute::NullSinkNodeConsumer, arrow::compute::TableSinkNodeConsumer
Public Functions
-
virtual ~SinkNodeConsumer() = default¶
Prepare any consumer state.
This will be run once the schema is finalized as the plan is starting and before any calls to Consume. A common use is to save off the schema so that batches can be interpreted. TODO(ARROW-17837) Move ExecPlan* plan to query context
-
class ConsumingSinkNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Add a sink node which consumes data within the exec plan run.
Public Functions
Public Members
-
std::shared_ptr<SinkNodeConsumer> consumer¶
-
std::vector<std::string> names¶
Names to rename the sink’s schema fields to.
If specified then names must be provided for all fields. Currently, only a flat schema is supported (see ARROW-15901).
-
class OrderBySinkNodeOptions : public arrow::compute::SinkNodeOptions¶
- #include <arrow/compute/exec/options.h>
Make a node which sorts rows passed through it.
All batches pushed to this node will be accumulated, then sorted, by the given fields. Then sorted batches will be forwarded to the generator in sorted order.
Public Functions
-
inline explicit OrderBySinkNodeOptions(SortOptions sort_options, std::function<Future<std::optional<ExecBatch>>()> *generator)¶
Public Members
-
SortOptions sort_options¶
-
class HashJoinNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Make a node which implements join operation using hash join strategy.
Public Functions
-
inline HashJoinNodeOptions(JoinType in_join_type, std::vector<FieldRef> in_left_keys, std::vector<FieldRef> in_right_keys, Expression filter = literal(true), std::string output_suffix_for_left = default_output_suffix_for_left, std::string output_suffix_for_right = default_output_suffix_for_right, bool disable_bloom_filter = false)¶
-
inline HashJoinNodeOptions(std::vector<FieldRef> in_left_keys, std::vector<FieldRef> in_right_keys)¶
-
inline HashJoinNodeOptions(JoinType join_type, std::vector<FieldRef> left_keys, std::vector<FieldRef> right_keys, std::vector<FieldRef> left_output, std::vector<FieldRef> right_output, Expression filter = literal(true), std::string output_suffix_for_left = default_output_suffix_for_left, std::string output_suffix_for_right = default_output_suffix_for_right, bool disable_bloom_filter = false)¶
-
inline HashJoinNodeOptions(JoinType join_type, std::vector<FieldRef> left_keys, std::vector<FieldRef> right_keys, std::vector<FieldRef> left_output, std::vector<FieldRef> right_output, std::vector<JoinKeyCmp> key_cmp, Expression filter = literal(true), std::string output_suffix_for_left = default_output_suffix_for_left, std::string output_suffix_for_right = default_output_suffix_for_right, bool disable_bloom_filter = false)¶
-
HashJoinNodeOptions() = default¶
Public Members
-
bool output_all = false¶
-
std::vector<JoinKeyCmp> key_cmp¶
-
std::string output_suffix_for_left¶
-
std::string output_suffix_for_right¶
-
Expression filter = literal(true)¶
-
bool disable_bloom_filter = false¶
-
class AsofJoinNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Make a node which implements asof join operation.
Note, this API is experimental and will change in the future
This node takes one left table and any number of right tables, and asof joins them together. Batches produced by each input must be ordered by the “on” key. This node will output one row for each row in the left table.
Public Members
-
int64_t tolerance¶
Tolerance for inexact “on” key matching.
Must be non-negative.
The tolerance is interpreted in the same units as the “on” key.
-
struct Keys¶
- #include <arrow/compute/exec/options.h>
Keys for one input table of the AsofJoin operation.
The keys must be consistent across the input tables: Each “on” key must refer to a field of the same type and units across the tables. Each “by” key must refer to a list of fields of the same types across the tables.
Public Members
-
FieldRef on_key¶
“on” key for the join.
The input table must be sorted by the “on” key. Must be a single field of a common type. Inexact match is used on the “on” key. i.e., a row is considered a match iff left_on - tolerance <= right_on <= left_on. Currently, the “on” key must be of an integer, date, or timestamp type.
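The inexact match condition on the "on" key can be written as a predicate. The sketch below is a standalone model, not the node's implementation: both function names are hypothetical, keys are plain int64_t values, and picking the latest matching right row is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// The documented condition: left_on - tolerance <= right_on <= left_on.
bool AsofMatchSketch(int64_t left_on, int64_t right_on, int64_t tolerance) {
  assert(tolerance >= 0);
  return left_on - tolerance <= right_on && right_on <= left_on;
}

// For one left row, returns the index of the latest matching right row,
// or std::nullopt if none matches. right_on must be sorted ascending.
// Choosing the latest match is an assumption made for this illustration.
std::optional<std::size_t> AsofLookupSketch(
    int64_t left_on, const std::vector<int64_t>& right_on, int64_t tolerance) {
  std::optional<std::size_t> best;
  for (std::size_t i = 0; i < right_on.size() && right_on[i] <= left_on; ++i) {
    if (AsofMatchSketch(left_on, right_on[i], tolerance)) best = i;
  }
  return best;
}
```

A tolerance of 0 reduces the condition to an exact match on the "on" key.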
-
class SelectKSinkNodeOptions : public arrow::compute::SinkNodeOptions¶
- #include <arrow/compute/exec/options.h>
Make a node which selects top_k/bottom_k rows passed through it.
All batches pushed to this node will be accumulated, then selected, by the given fields. Then sorted batches will be forwarded to the generator in sorted order.
Public Functions
-
inline explicit SelectKSinkNodeOptions(SelectKOptions select_k_options, std::function<Future<std::optional<ExecBatch>>()> *generator)¶
Public Members
-
SelectKOptions select_k_options¶
SelectK options.
-
class TableSinkNodeOptions : public arrow::compute::ExecNodeOptions¶
- #include <arrow/compute/exec/options.h>
Adapt a Table as a sink node.
Collects the output of an execution plan into a table pointer.
Public Functions
-
constexpr int kDefaultBackgroundMaxQ = 32¶
-
constexpr int kDefaultBackgroundQRestart = 16¶
-
ExecFactoryRegistry *default_exec_factory_registry()¶
The default registry, which includes built-in factories.
-
inline Result<ExecNode*> MakeExecNode(const std::string &factory_name, ExecPlan *plan, std::vector<ExecNode*> inputs, const ExecNodeOptions &options, ExecFactoryRegistry *registry = default_exec_factory_registry())¶
Construct an ExecNode using the named factory.
-
Result<std::shared_ptr<Table>> DeclarationToTable(Declaration declaration, bool use_threads = true, MemoryPool *memory_pool = default_memory_pool(), FunctionRegistry *function_registry = NULLPTR)¶
Utility method to run a declaration and collect the results into a table.
This method will add a sink node to the declaration to collect results into a table. It will then create an ExecPlan from the declaration, start the exec plan, block until the plan has finished, and return the created table.
- Parameters:
declaration – A declaration describing the plan to run
use_threads – If use_threads is false then all CPU work will be done on the calling thread. I/O tasks will still happen on the I/O executor and may be multi-threaded (but should not use significant CPU resources).
memory_pool – The memory pool to use for allocations made while running the plan.
function_registry – The function registry to use for function execution. If null then the default function registry will be used.
-
Future<std::shared_ptr<Table>> DeclarationToTableAsync(Declaration declaration, bool use_threads = true, MemoryPool *memory_pool = default_memory_pool(), FunctionRegistry *function_registry = NULLPTR)¶
Asynchronous version of DeclarationToTable.
- Parameters:
declaration – A declaration describing the plan to run
use_threads – The behavior of use_threads is slightly different than the synchronous version since we cannot run synchronously on the calling thread. Instead, if use_threads=false then a new thread pool will be created with a single thread and this will be used for all compute work.
memory_pool – The memory pool to use for allocations made while running the plan.
function_registry – The function registry to use for function execution. If null then the default function registry will be used.
-
Future<std::shared_ptr<Table>> DeclarationToTableAsync(Declaration declaration, ExecContext custom_exec_context)¶
Overload of DeclarationToTableAsync accepting a custom exec context.
The executor must be specified (cannot be null) and must be kept alive until the returned future finishes.
-
Result<BatchesWithCommonSchema> DeclarationToExecBatches(Declaration declaration, bool use_threads = true, MemoryPool *memory_pool = default_memory_pool(), FunctionRegistry *function_registry = NULLPTR)¶
Utility method to run a declaration and collect the results into ExecBatch vector.
See also
DeclarationToTable for details on threading & execution
-
Future<BatchesWithCommonSchema> DeclarationToExecBatchesAsync(Declaration declaration, bool use_threads = true, MemoryPool *memory_pool = default_memory_pool(), FunctionRegistry *function_registry = NULLPTR)¶
Asynchronous version of DeclarationToExecBatches.
See also
DeclarationToTableAsync for details on threading & execution
-
Future<BatchesWithCommonSchema> DeclarationToExecBatchesAsync(Declaration declaration, ExecContext custom_exec_context)¶
Overload of DeclarationToExecBatchesAsync accepting a custom exec context.
See also
DeclarationToTableAsync for details on threading & execution
-
Result<std::vector<std::shared_ptr<RecordBatch>>> DeclarationToBatches(Declaration declaration, bool use_threads = true, MemoryPool *memory_pool = default_memory_pool(), FunctionRegistry *function_registry = NULLPTR)¶
Utility method to run a declaration and collect the results into a vector.
See also
DeclarationToTable for details on threading & execution
-
Future<std::vector<std::shared_ptr<RecordBatch>>> DeclarationToBatchesAsync(Declaration declaration, bool use_threads = true, MemoryPool *memory_pool = default_memory_pool(), FunctionRegistry *function_registry = NULLPTR)¶
Asynchronous version of DeclarationToBatches.
See also
DeclarationToTableAsync for details on threading & execution
-
Future<std::vector<std::shared_ptr<RecordBatch>>> DeclarationToBatchesAsync(Declaration declaration, ExecContext exec_context)¶
Overload of DeclarationToBatchesAsync accepting a custom exec context.
See also
DeclarationToTableAsync for details on threading & execution
-
Result<std::unique_ptr<RecordBatchReader>> DeclarationToReader(Declaration declaration, bool use_threads = true, MemoryPool *memory_pool = default_memory_pool(), FunctionRegistry *function_registry = NULLPTR)¶
Utility method to run a declaration and return results as a RecordBatchReader.
If an exec context is not provided then a default exec context will be used based on the value of use_threads. If use_threads is false then the CPU executor will be a serial executor and all CPU work will be done on the calling thread. I/O tasks will still happen on the I/O executor and may be multi-threaded.
If use_threads is false then all CPU work will happen during the calls to RecordBatchReader::Next and no CPU work will happen in the background. If use_threads is true then CPU work will happen on the CPU thread pool and tasks may run in between calls to RecordBatchReader::Next. If the returned reader is not consumed quickly enough then the plan will eventually pause as the backpressure queue fills up.
If a custom exec context is provided then the value of use_threads will be ignored.
-
Result<std::unique_ptr<RecordBatchReader>> DeclarationToReader(Declaration declaration, ExecContext exec_context)¶
Overload of DeclarationToReader accepting a custom exec context.
-
Status DeclarationToStatus(Declaration declaration, bool use_threads = true, MemoryPool *memory_pool = default_memory_pool(), FunctionRegistry *function_registry = NULLPTR)¶
Utility method to run a declaration and ignore results.
This can be useful when the data are consumed as part of the plan itself, for example, when the plan ends with a write node.
See also
DeclarationToTable for details on threading & execution
-
Future<> DeclarationToStatusAsync(Declaration declaration, bool use_threads = true, MemoryPool *memory_pool = default_memory_pool(), FunctionRegistry *function_registry = NULLPTR)¶
Asynchronous version of DeclarationToStatus.
This can be useful when the data are consumed as part of the plan itself, for example, when the plan ends with a write node.
See also
DeclarationToTableAsync for details on threading & execution
-
Future<> DeclarationToStatusAsync(Declaration declaration, ExecContext exec_context)¶
Overload of DeclarationToStatusAsync accepting a custom exec context.
See also
DeclarationToTableAsync for details on threading & execution
Wrap an ExecBatch generator in a RecordBatchReader.
The RecordBatchReader does not impose any ordering on emitted batches.
-
Result<std::function<Future<std::optional<ExecBatch>>()>> MakeReaderGenerator(std::shared_ptr<RecordBatchReader> reader, arrow::internal::Executor *io_executor, int max_q = kDefaultBackgroundMaxQ, int q_restart = kDefaultBackgroundQRestart)¶
Make a generator of RecordBatchReaders.
Useful as a source node for an Exec plan
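The max_q and q_restart parameters are not described on this page. The sketch below models one plausible reading, assumed here rather than confirmed by the text above: the background reader pauses once max_q items are queued and restarts once the consumer has drained the queue down to q_restart. ReadaheadQueue and its methods are hypothetical names for illustration and are not part of Arrow.

```cpp
#include <cassert>
#include <deque>

// Single-threaded model of a read-ahead queue with hysteresis.
// ASSUMPTION: pause at max_q queued items, restart at q_restart.
class ReadaheadQueue {
 public:
  ReadaheadQueue(int max_q, int q_restart)
      : max_q_(max_q), q_restart_(q_restart) {
    assert(q_restart >= 0 && q_restart < max_q);
  }

  // Producer side. Returns false without queueing while paused.
  bool Push(int item) {
    if (paused_) return false;
    queue_.push_back(item);
    if (static_cast<int>(queue_.size()) >= max_q_) paused_ = true;
    return true;
  }

  // Consumer side. The queue must not be empty.
  int Pop() {
    assert(!queue_.empty());
    int item = queue_.front();
    queue_.pop_front();
    // Restart the producer only once the queue has drained far enough.
    if (paused_ && static_cast<int>(queue_.size()) <= q_restart_) {
      paused_ = false;
    }
    return item;
  }

  bool paused() const { return paused_; }

 private:
  int max_q_;
  int q_restart_;
  bool paused_ = false;
  std::deque<int> queue_;
};
```

Using two thresholds instead of one avoids toggling the producer on every single pop when the consumer runs at roughly the same speed as the reader.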
-
inline bool operator==(const ExecBatch &l, const ExecBatch &r)¶
-
inline bool operator!=(const ExecBatch &l, const ExecBatch &r)¶
-
void PrintTo(const ExecBatch&, std::ostream*)¶
-
class ExecPlan : public std::enable_shared_from_this<ExecPlan>¶
- #include <arrow/compute/exec/exec_plan.h>
Public Functions
-
virtual ~ExecPlan() = default¶
-
QueryContext *query_context()¶
-
const NodeVector &sources() const¶
The initial inputs.
-
const NodeVector &sinks() const¶
The final outputs.
-
Status StartProducing()¶
Start producing on all nodes.
Nodes are started in reverse topological order, such that any node is started before all of its inputs.
-
void StopProducing()¶
Stop producing on all nodes.
Nodes are stopped in topological order, such that any node is stopped before all of its outputs.
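The two orderings can be illustrated with a small standalone model. This is a sketch under stated assumptions, not Arrow's implementation: nodes are plain integer indices, inputs[i] lists the nodes feeding node i, the graph is acyclic, and TopologicalOrder and StartOrder are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Topological order: every node appears after all of its inputs
// (equivalently, before all of its outputs). This is the stop order.
std::vector<int> TopologicalOrder(const std::vector<std::vector<int>>& inputs) {
  std::vector<int> order;
  std::vector<bool> visited(inputs.size(), false);
  // Depth-first visit: emit a node only after all of its inputs.
  std::function<void(int)> visit = [&](int node) {
    assert(node >= 0 && static_cast<std::size_t>(node) < inputs.size());
    if (visited[node]) return;
    visited[node] = true;
    for (int input : inputs[node]) visit(input);
    order.push_back(node);
  };
  for (std::size_t i = 0; i < inputs.size(); ++i) visit(static_cast<int>(i));
  return order;
}

// Start order: reverse topological, so every node precedes its inputs
// and is ready to receive batches before any input begins producing.
std::vector<int> StartOrder(const std::vector<std::vector<int>>& inputs) {
  std::vector<int> order = TopologicalOrder(inputs);
  return std::vector<int>(order.rbegin(), order.rend());
}
```

For a chain source(0) -> filter(1) -> sink(2), described as inputs = {{}, {0}, {1}}, the start order is sink, filter, source and the stop order is source, filter, sink.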
-
bool HasMetadata() const¶
Return whether the plan has non-empty metadata.
-
std::shared_ptr<const KeyValueMetadata> metadata() const¶
Return the plan’s attached metadata.
-
std::string ToString() const¶
Public Static Functions
Make an empty exec plan.
Public Static Attributes
-
static const uint32_t kMaxBatchSize = 1 << 15¶
-
class ExecNode¶
- #include <arrow/compute/exec/exec_plan.h>
Subclassed by arrow::compute::MapNode
Public Functions
-
virtual ~ExecNode() = default¶
-
virtual const char *kind_name() const = 0¶
-
inline int num_inputs() const¶
-
inline int num_outputs() const¶
-
inline const NodeVector &inputs() const¶
This node’s predecessors in the exec plan.
-
inline const std::vector<std::string> &input_labels() const¶
Labels identifying the function of each input.
-
inline const NodeVector &outputs() const¶
This node’s successors in the exec plan.
-
inline const std::shared_ptr<Schema> &output_schema() const¶
The datatypes for batches produced by this node.
-
inline const std::string &label() const¶
An optional label, for display and debugging.
There is no guarantee that this value is non-empty or unique.
-
inline void SetLabel(std::string label)¶
-
virtual void InputReceived(ExecNode *input, ExecBatch batch) = 0¶
Upstream API: These functions are called by input nodes that want to inform this node about an updated condition (a new input batch, an error, an impending end of stream).
Implementation rules:
these may be called anytime after StartProducing() has succeeded (and even during or after StopProducing())
these may be called concurrently
these are allowed to call back into PauseProducing(), ResumeProducing() and StopProducing()
Transfer input batch to ExecNode.
-
virtual void InputFinished(ExecNode *input, int total_batches) = 0¶
Mark the inputs finished after the given number of batches.
This may be called before all inputs are received. This simply fixes the total number of incoming batches for an input, so that the ExecNode knows when it has received all input, regardless of order.
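Because the total may arrive before, between, or after the batches themselves, a node needs to track both pieces of information and check for completion on either event. The following is a minimal single-threaded sketch of that bookkeeping. BatchCounter is a hypothetical name, and a real node would also need thread safety since these calls may be concurrent.

```cpp
#include <cassert>

// Tracks received batches against a total that may become known at any time.
class BatchCounter {
 public:
  // Record one received batch. Returns true if this completes the input.
  bool Increment() {
    ++received_;
    return Done();
  }

  // Fix the total number of batches (may only be set once).
  // Returns true if every batch has already arrived.
  bool SetTotal(int total) {
    assert(total_ < 0 && total >= 0);
    total_ = total;
    return Done();
  }

  bool Done() const { return total_ >= 0 && received_ == total_; }

 private:
  int received_ = 0;
  int total_ = -1;  // -1 means the total is not yet known
};
```

In this model, an InputReceived handler would call Increment() and an InputFinished handler would call SetTotal(total_batches); whichever call returns true is the one that observed the end of the input.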
-
virtual Status Init()¶
Perform any needed initialization.
This hook performs any actions in between creation of ExecPlan and the call to StartProducing. An example could be Bloom filter pushdown. The order of ExecNodes that executes this method is undefined, but the calls are made synchronously.
At this point a node can rely on all inputs & outputs (and the input schemas) being well defined.
-
virtual Status StartProducing() = 0¶
Lifecycle API:
start / stop to initiate and terminate production
pause / resume to apply backpressure
Implementation rules:
StartProducing() should not recurse into the inputs, as it is handled by ExecPlan::StartProducing()
PauseProducing(), ResumeProducing(), StopProducing() may be called concurrently (but only after StartProducing() has returned successfully)
PauseProducing(), ResumeProducing(), StopProducing() may be called by the downstream nodes’ InputReceived(), ErrorReceived(), InputFinished() methods
StopProducing() should recurse into the inputs
StopProducing() must be idempotent
Start producing
This must only be called once. If this fails, then other lifecycle methods must not be called.
This is typically called automatically by ExecPlan::StartProducing().
-
virtual void PauseProducing(ExecNode *output, int32_t counter) = 0¶
Pause producing temporarily.
This call is a hint that an output node is currently not willing to receive data.
This may be called any number of times after StartProducing() succeeds. However, the node is still free to produce data (which may be difficult to prevent anyway if data is produced using multiple threads).
- Parameters:
output – Pointer to the output that is full
counter – Counter used to sequence calls to pause/resume
-
virtual void ResumeProducing(ExecNode *output, int32_t counter) = 0¶
Resume producing after a temporary pause.
This call is a hint that an output node is willing to receive data again.
This may be called any number of times after StartProducing() succeeds.
- Parameters:
output – Pointer to the output that is now free
counter – Counter used to sequence calls to pause/resume
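The text above only says that the counter sequences pause and resume calls. One plausible interpretation, assumed here and not confirmed by this page, is that calls may arrive out of order and a node should ignore any call whose counter is not newer than the latest one it has applied. BackpressureState is a hypothetical name, and the sketch is single-threaded.

```cpp
#include <cassert>
#include <cstdint>

// Applies pause/resume requests, discarding stale (out-of-order) ones.
class BackpressureState {
 public:
  void Pause(int32_t counter) { Apply(counter, /*paused=*/true); }
  void Resume(int32_t counter) { Apply(counter, /*paused=*/false); }
  bool paused() const { return paused_; }

 private:
  void Apply(int32_t counter, bool paused) {
    // ASSUMPTION: counters are positive and strictly increase per request.
    assert(counter > 0);
    if (counter <= last_counter_) return;  // stale request, ignore it
    last_counter_ = counter;
    paused_ = paused;
  }

  int32_t last_counter_ = 0;
  bool paused_ = false;
};
```

With this rule, a Resume carrying counter 4 that overtakes a Pause carrying counter 3 leaves the node running, which matches the order in which the requests were issued.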
-
virtual void StopProducing(ExecNode *output) = 0¶
Stop producing definitively to a single output.
This call is a hint that an output node has completed and is not willing to receive any further data.
-
virtual void StopProducing() = 0¶
Stop producing definitively to all outputs.
-
inline virtual Future<> finished()¶
A future which will be marked finished when this node has stopped producing.
-
std::string ToString(int indent = 0) const¶
-
class ExecFactoryRegistry¶
- #include <arrow/compute/exec/exec_plan.h>
An extensible registry for factories of ExecNodes.
Public Types
-
struct Declaration¶
- #include <arrow/compute/exec/exec_plan.h>
Helper class for declaring sets of ExecNodes efficiently.
A Declaration represents an unconstructed ExecNode (and potentially more since its inputs may also be Declarations). The node can be constructed and added to a plan with Declaration::AddToPlan, which will recursively construct any inputs as necessary.
Public Types
-
using Input = std::variant<ExecNode*, Declaration>¶
Public Functions
-
inline Declaration()¶
-
template<typename Options>
inline Declaration(std::string factory_name, std::vector<Input> inputs, Options options, std::string label)¶
-
template<typename Options>
inline Declaration(std::string factory_name, std::vector<Input> inputs, Options options)¶
-
template<typename Options>
inline Declaration(std::string factory_name, Options options, std::string label)¶
-
Result<ExecNode*> AddToPlan(ExecPlan *plan, ExecFactoryRegistry *registry = default_exec_factory_registry()) const¶
-
bool IsValid(ExecFactoryRegistry *registry = default_exec_factory_registry()) const¶
Public Members
-
std::string factory_name¶
-
std::shared_ptr<ExecNodeOptions> options¶
-
std::string label¶
Public Static Functions
-
static Declaration Sequence(std::vector<Declaration> decls)¶
Convenience factory for the common case of a simple sequence of nodes.
Each of decls will be appended to the inputs of the subsequent declaration, and the final modified declaration will be returned.
Without this convenience factory, constructing a sequence would require explicit, difficult-to-read nesting:
Declaration{"n3", { Declaration{"n2", { Declaration{"n1", { Declaration{"n0", N0Opts{}}, }, N1Opts{}}, }, N2Opts{}}, }, N3Opts{}};
An equivalent Declaration can be constructed more tersely using Sequence:
Declaration::Sequence({ {"n0", N0Opts{}}, {"n1", N1Opts{}}, {"n2", N2Opts{}}, {"n3", N3Opts{}}, });
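The chaining rule can be modelled without Arrow. In the sketch below, Decl is a hypothetical stand-in for Declaration that keeps only a factory name and its inputs (options and labels are omitted), and Sequence and Render are illustrative helpers rather than the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a declaration: a name plus its input declarations.
struct Decl {
  std::string factory_name;
  std::vector<Decl> inputs;
};

// Append each declaration to the inputs of the one that follows it and
// return the final, fully nested declaration.
Decl Sequence(std::vector<Decl> decls) {
  assert(!decls.empty());
  Decl out = std::move(decls.front());
  for (std::size_t i = 1; i < decls.size(); ++i) {
    Decl next = std::move(decls[i]);
    next.inputs.push_back(std::move(out));
    out = std::move(next);
  }
  return out;
}

// Render the nesting as text, e.g. "n1(n0)", to make the structure visible.
std::string Render(const Decl& decl) {
  std::string text = decl.factory_name;
  if (decl.inputs.empty()) return text;
  text += "(";
  for (std::size_t i = 0; i < decl.inputs.size(); ++i) {
    if (i > 0) text += ", ";
    text += Render(decl.inputs[i]);
  }
  return text + ")";
}
```

In this model, a sequence of n0, n1, n2, n3 renders as n3(n2(n1(n0))), the same nesting as the explicit form shown above.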
-
struct BatchesWithCommonSchema¶
- #include <arrow/compute/exec/exec_plan.h>
A collection of exec batches with a common schema.
-
struct ExecBatch¶
- #include <arrow/compute/exec.h>
Public Functions
-
ExecBatch() = default¶
-
explicit ExecBatch(const RecordBatch &batch)¶
-
int64_t TotalBufferSize() const¶
The sum of bytes in each buffer referenced by the batch.
Note: Scalars are not counted.
Note: Some values may reference only part of a buffer, for example, an array with an offset. The actual data visible to this batch will be smaller than the total buffer size in this case.
-
template<typename index_type>
inline const Datum &operator[](index_type i) const¶
Return the value at the i-th index.
-
inline int num_values() const¶
A convenience for the number of values / arguments.
-
inline std::vector<TypeHolder> GetTypes() const¶
A convenience for returning the types from the batch.
-
std::string ToString() const¶
Public Members
-
std::vector<Datum> values¶
The values representing positional arguments to be passed to a kernel’s exec function for processing.
-
std::shared_ptr<SelectionVector> selection_vector¶
A deferred filter represented as an array of indices into the values.
For example, the filter [true, true, false, true] would be represented as the selection vector [0, 1, 3]. When the selection vector is set, ExecBatch::length is equal to the length of this array.
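The filter-to-indices correspondence can be sketched in a few lines of standalone C++. SelectionFromFilter and ApplySelection are hypothetical helpers for illustration, operating on plain vectors rather than Arrow arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a boolean filter into the indices of the rows it keeps.
std::vector<int32_t> SelectionFromFilter(const std::vector<bool>& filter) {
  std::vector<int32_t> indices;
  for (std::size_t i = 0; i < filter.size(); ++i) {
    if (filter[i]) indices.push_back(static_cast<int32_t>(i));
  }
  return indices;
}

// Materialize the deferred filter by gathering the selected rows.
std::vector<int> ApplySelection(const std::vector<int>& values,
                                const std::vector<int32_t>& selection) {
  std::vector<int> out;
  out.reserve(selection.size());
  for (int32_t index : selection) {
    assert(index >= 0 && static_cast<std::size_t>(index) < values.size());
    out.push_back(values[index]);
  }
  return out;
}
```

Here the filter [true, true, false, true] yields the selection [0, 1, 3], and the batch length under that selection is 3, the size of the index list.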
-
Expression guarantee = literal(true)¶
A predicate Expression guaranteed to evaluate to true for all rows in this batch.
-
int64_t length = 0¶
The semantic length of the ExecBatch.
When the values are all scalars, the length should be set to 1 for non-aggregate kernels; otherwise the length is taken from the array values, except when there is a selection vector. When there is a selection vector set, the length of the batch is the length of the selection. Aggregate kernels can have an ExecBatch formed by projecting just the partition columns from a batch, in which case it would have scalar rows with length greater than 1.
If the array values are of length 0 then the length is 0 regardless of whether any values are Scalar.
-
struct ExecValue¶
- #include <arrow/compute/exec.h>
-
struct ExecResult¶
- #include <arrow/compute/exec.h>
Execution Plan Expressions¶
-
inline bool operator==(const Expression &l, const Expression &r)¶
-
inline bool operator!=(const Expression &l, const Expression &r)¶
-
void PrintTo(const Expression&, std::ostream*)¶
-
Expression literal(Datum lit)¶
-
Expression field_ref(FieldRef ref)¶
-
template<typename Options, typename = typename std::enable_if<std::is_base_of<FunctionOptions, Options>::value>::type>
Expression call(std::string function, std::vector<Expression> arguments, Options options)¶
-
std::vector<FieldRef> FieldsInExpression(const Expression&)¶
Assemble a list of all fields referenced by an Expression at any depth.
-
bool ExpressionHasFieldRefs(const Expression&)¶
Check if the expression references any fields.
-
Result<KnownFieldValues> ExtractKnownFieldValues(const Expression &guaranteed_true_predicate)¶
Assemble a mapping from field references to known values.
This derives known values from “equal” and “is_null” Expressions referencing a field and a literal.
-
class Expression¶
- #include <arrow/compute/exec/expression.h>
An unbound expression which maps a single Datum to another Datum.
An expression is one of
A literal Datum.
A reference to a single (potentially nested) field of the input Datum.
A call to a compute function, with arguments specified by other Expressions.
Public Functions
-
Result<Expression> Bind(const TypeHolder &in, ExecContext* = NULLPTR) const¶
Bind this expression to the given input type, looking up Kernels and field types.
Some expression simplification may be performed and implicit casts will be inserted. Any state necessary for execution will be initialized and returned.
-
bool IsBound() const¶
Return true if all an expression’s field references have explicit types and all of its functions’ kernels are looked up.
-
bool IsScalarExpression() const¶
Return true if this expression is composed only of Scalar literals, field references, and calls to ScalarFunctions.
-
bool IsNullLiteral() const¶
Return true if this expression is literal and entirely null.
-
bool IsSatisfiable() const¶
Return true if this expression could evaluate to true.
Will return true for any unbound, non-boolean, or unsimplified Expressions
-
struct Call¶
- #include <arrow/compute/exec/expression.h>
-
struct Hash¶
- #include <arrow/compute/exec/expression.h>
-
struct Parameter¶
- #include <arrow/compute/exec/expression.h>
-
Expression project(std::vector<Expression> values, std::vector<std::string> names)¶
-
Expression equal(Expression lhs, Expression rhs)¶
-
Expression not_equal(Expression lhs, Expression rhs)¶
-
Expression less(Expression lhs, Expression rhs)¶
-
Expression less_equal(Expression lhs, Expression rhs)¶
-
Expression greater(Expression lhs, Expression rhs)¶
-
Expression greater_equal(Expression lhs, Expression rhs)¶
-
Expression is_null(Expression lhs, bool nan_is_null = false)¶
-
Expression is_valid(Expression lhs)¶
-
Expression and_(Expression lhs, Expression rhs)¶
-
Expression and_(const std::vector<Expression>&)¶
-
Expression or_(Expression lhs, Expression rhs)¶
-
Expression or_(const std::vector<Expression>&)¶
-
Expression not_(Expression operand)¶
-
Result<Expression> Canonicalize(Expression, ExecContext* = NULLPTR)¶
Weak canonicalization which establishes guarantees for subsequent passes.
Even equivalent Expressions may result in different canonicalized expressions. TODO this could be a strong canonicalization
-
Result<Expression> FoldConstants(Expression)¶
Simplify Expressions based on literal arguments (for example, add(null, x) will always be null so replace the call with a null literal).
Includes early evaluation of all calls whose arguments are entirely literal.
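A toy version of this pass can make the two rules concrete. The sketch below is not Arrow's implementation: Expr, Lit, Field, Add and Fold are hypothetical names, literals are integers, a null literal is modelled as an empty std::optional, and "add" is the only function.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Minimal expression tree: integer literal (possibly null), field reference,
// or a two-argument add call.
struct Expr {
  enum Kind { kLiteral, kField, kAdd } kind;
  std::optional<int> value;  // kLiteral only; nullopt models a null literal
  std::string name;          // kField only
  std::vector<Expr> args;    // kAdd only; exactly two arguments
};

Expr Lit(std::optional<int> v) { return Expr{Expr::kLiteral, v, "", {}}; }
Expr Field(std::string n) {
  return Expr{Expr::kField, std::nullopt, std::move(n), {}};
}
Expr Add(Expr l, Expr r) {
  return Expr{Expr::kAdd, std::nullopt, "", {std::move(l), std::move(r)}};
}

// Fold bottom-up: a null literal argument makes the call null, and a call
// whose arguments are all literals is evaluated immediately.
Expr Fold(const Expr& e) {
  if (e.kind != Expr::kAdd) return e;
  assert(e.args.size() == 2);
  Expr l = Fold(e.args[0]);
  Expr r = Fold(e.args[1]);
  const bool l_lit = l.kind == Expr::kLiteral;
  const bool r_lit = r.kind == Expr::kLiteral;
  if ((l_lit && !l.value) || (r_lit && !r.value)) return Lit(std::nullopt);
  if (l_lit && r_lit) return Lit(*l.value + *r.value);
  return Add(std::move(l), std::move(r));
}
```

In this model, add(null, x) folds to a null literal, add(1, add(2, 3)) folds to the literal 6, and add(x, add(2, 3)) keeps the outer call but replaces its second argument with the literal 5.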
-
Result<Expression> ReplaceFieldsWithKnownValues(const KnownFieldValues &known_values, Expression)¶
Simplify Expressions by replacing with known values of the fields which it references.
-
Result<Expression> SimplifyWithGuarantee(Expression, const Expression &guaranteed_true_predicate)¶
Simplify an expression by replacing subexpressions based on a guarantee: a boolean expression which is guaranteed to evaluate to true.
For example, this is used to remove redundant function calls from a filter expression or to replace a reference to a constant-value field with a literal.
-
Result<Expression> RemoveNamedRefs(Expression expression)¶
Replace all named field refs (e.g. “x” or “x.y”) with field paths (e.g. [0] or [1,3]).
This isn’t usually needed and does not offer any simplification by itself. However, it can be useful to normalize an expression to paths to make it simpler to work with.
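The mapping from a dotted name to an index path can be sketched against a toy nested schema. SchemaField and ResolvePath are hypothetical names for illustration. The sketch assumes field names are unique at each level and that "." always separates nesting levels, which the text above does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy schema node: a field name and, for struct-like fields, its children.
struct SchemaField {
  std::string name;
  std::vector<SchemaField> children;
};

// Resolve a dotted name such as "s.y" to a path of child indices.
// Returns nullopt if any component is not found.
std::optional<std::vector<int>> ResolvePath(
    const std::vector<SchemaField>& fields, const std::string& dotted_name) {
  assert(!dotted_name.empty());
  std::vector<int> path;
  const std::vector<SchemaField>* level = &fields;
  std::size_t start = 0;
  while (true) {
    const std::size_t dot = dotted_name.find('.', start);
    const std::string part = dotted_name.substr(
        start, dot == std::string::npos ? std::string::npos : dot - start);
    int found = -1;
    for (std::size_t i = 0; i < level->size(); ++i) {
      if ((*level)[i].name == part) {
        found = static_cast<int>(i);
        break;
      }
    }
    if (found < 0) return std::nullopt;
    path.push_back(found);
    if (dot == std::string::npos) return path;
    level = &(*level)[found].children;  // descend into the matched field
    start = dot + 1;
  }
}
```

Once every reference is an index path, later passes can compare and look up fields positionally without consulting names again.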