Gandiva Expression Compiler

TreeExprBuilder Class

class TreeExprBuilder

Tree builder for nested expressions.

Public Static Functions

static NodePtr MakeLiteral(bool value)

Create a node for a literal value.

static NodePtr MakeNull(DataTypePtr data_type)

Create a node for a null literal of type data_type.

Returns null if data_type is null or is not a supported data type.

static NodePtr MakeField(FieldPtr field)

Create a node for an Arrow field.

Returns null if field is null.

static NodePtr MakeFunction(const std::string &name, const NodeVector &params, DataTypePtr return_type)

Create a node for a call to the function name, with parameters params and result type return_type.

Returns null if return_type is null.

static NodePtr MakeIf(NodePtr condition, NodePtr then_node, NodePtr else_node, DataTypePtr result_type)

Create a node for an if-else expression.

Returns null if any of the inputs is null.

static NodePtr MakeAnd(const NodeVector &children)

Create a node for a boolean AND expression.

static NodePtr MakeOr(const NodeVector &children)

Create a node for a boolean OR expression.
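The builders above do not spell out how AND and OR treat null children. The sketch below assumes SQL-style three-valued (Kleene) logic, a common convention but an assumption here rather than a statement about Gandiva's generated code. It uses only the standard library, with std::nullopt standing for null; KleeneAnd and KleeneOr are hypothetical helpers.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// std::nullopt plays the role of a null boolean.
using Tri = std::optional<bool>;

// n-ary AND under Kleene logic: a definite false wins; otherwise any null
// makes the result null; otherwise the result is true.
Tri KleeneAnd(const std::vector<Tri>& children) {
  bool saw_null = false;
  for (const Tri& c : children) {
    if (!c) {
      saw_null = true;
      continue;
    }
    if (!*c) return false;
  }
  if (saw_null) return std::nullopt;
  return true;
}

// n-ary OR under Kleene logic: a definite true wins; otherwise any null
// makes the result null; otherwise the result is false.
Tri KleeneOr(const std::vector<Tri>& children) {
  bool saw_null = false;
  for (const Tri& c : children) {
    if (!c) {
      saw_null = true;
      continue;
    }
    if (*c) return true;
  }
  if (saw_null) return std::nullopt;
  return false;
}
```

Under this convention a null child only matters when no other child already decides the result.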

static ExpressionPtr MakeExpression(NodePtr root_node, FieldPtr result_field)

Create an expression with the specified root_node, whose result is written to result_field.

Returns null if result_field is null.

static ExpressionPtr MakeExpression(const std::string &function, const FieldVector &in_fields, FieldPtr out_field)

Convenience function for simple function expressions: applies function to the fields in in_fields and writes the result to out_field.

Returns null if out_field is null.
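To make the builder functions concrete, here is a standard-library-only toy model of the node kinds they create (field, literal, function, if-else), evaluated one row at a time. The Toy* types are hypothetical and are not Gandiva classes. Gandiva compiles expression trees to native code rather than interpreting them, and the null-handling rules in the comments are assumptions of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical toy model, not Gandiva's API: a row maps field names to
// optional ints, where std::nullopt stands for a null value.
using Row = std::map<std::string, std::optional<int>>;

struct ToyNode {
  virtual ~ToyNode() = default;
  virtual std::optional<int> Eval(const Row& row) const = 0;
};
using ToyNodePtr = std::shared_ptr<ToyNode>;

// Stands in for MakeField: reads a column value from the row.
struct ToyField : ToyNode {
  std::string name;
  explicit ToyField(std::string n) : name(std::move(n)) {}
  std::optional<int> Eval(const Row& row) const override { return row.at(name); }
};

// Stands in for MakeLiteral: a constant.
struct ToyLiteral : ToyNode {
  int value;
  explicit ToyLiteral(int v) : value(v) {}
  std::optional<int> Eval(const Row&) const override { return value; }
};

// Stands in for MakeFunction with two parameters.
// Assumption: a null argument makes the result null.
struct ToyFunction : ToyNode {
  std::function<int(int, int)> fn;
  ToyNodePtr lhs, rhs;
  ToyFunction(std::function<int(int, int)> f, ToyNodePtr l, ToyNodePtr r)
      : fn(std::move(f)), lhs(std::move(l)), rhs(std::move(r)) {}
  std::optional<int> Eval(const Row& row) const override {
    const auto a = lhs->Eval(row);
    const auto b = rhs->Eval(row);
    if (!a || !b) return std::nullopt;
    return fn(*a, *b);
  }
};

// Stands in for MakeIf.
// Assumption: a null or zero condition takes the else branch.
struct ToyIf : ToyNode {
  ToyNodePtr cond, then_node, else_node;
  ToyIf(ToyNodePtr c, ToyNodePtr t, ToyNodePtr e)
      : cond(std::move(c)), then_node(std::move(t)), else_node(std::move(e)) {}
  std::optional<int> Eval(const Row& row) const override {
    const auto c = cond->Eval(row);
    return (c && *c != 0) ? then_node->Eval(row) : else_node->Eval(row);
  }
};
```

A ToyFunction over two ToyField nodes corresponds roughly to what the convenience MakeExpression(function, in_fields, out_field) assembles for you.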

static ConditionPtr MakeCondition(NodePtr root_node)

Create a condition with the specified root_node.

static ConditionPtr MakeCondition(const std::string &function, const FieldVector &in_fields)

Convenience function for simple function conditions: applies function to the fields in in_fields.

static NodePtr MakeInExpressionInt32(NodePtr node, const std::unordered_set<int32_t> &constants)

Create an IN expression that is true when the int32 value of node is one of constants.

static NodePtr MakeInExpressionFloat(NodePtr node, const std::unordered_set<float> &constants)

Create an IN expression for float values.

static NodePtr MakeInExpressionDouble(NodePtr node, const std::unordered_set<double> &constants)

Create an IN expression for double values.
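The IN builders take their constants as a std::unordered_set, so membership amounts to a hash lookup on the exact value. A minimal sketch of the per-row semantics, assuming a null input produces null (as in SQL) rather than false; EvalIn is a hypothetical helper, not part of Gandiva.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_set>

// Hypothetical per-row IN check. Assumption: null in, null out.
template <typename T>
std::optional<bool> EvalIn(std::optional<T> value,
                           const std::unordered_set<T>& constants) {
  if (!value) return std::nullopt;
  return constants.count(*value) > 0;  // exact-match hash lookup
}
```

With an exact-match set like this one, the float and double variants only match bit-for-bit equal values, so a computed 0.1 + 0.2 would not match a constant 0.3.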

static NodePtr MakeInExpressionDate32(NodePtr node, const std::unordered_set<int32_t> &constants)

Create an IN expression for date32 values; constants are days since the UNIX epoch (the date32 storage unit).

static NodePtr MakeInExpressionDate64(NodePtr node, const std::unordered_set<int64_t> &constants)

Create an IN expression for date64 values; constants are milliseconds since the UNIX epoch.

static NodePtr MakeInExpressionTime32(NodePtr node, const std::unordered_set<int32_t> &constants)

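Create an IN expression for time32 values; constants are seconds or milliseconds since midnight, matching the node's time32 unit.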

static NodePtr MakeInExpressionTime64(NodePtr node, const std::unordered_set<int64_t> &constants)

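Create an IN expression for time64 values; constants are microseconds or nanoseconds since midnight, matching the node's time64 unit.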

static NodePtr MakeInExpressionTimeStamp(NodePtr node, const std::unordered_set<int64_t> &constants)

Create an IN expression for timestamp values; constants are milliseconds since the UNIX epoch.
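Arrow stores date32 as int32 days since the UNIX epoch, date64 as int64 milliseconds since the epoch, and time32 as seconds or milliseconds since midnight. Constants for these IN builders therefore have to be converted to the column's storage unit first. The helpers below are hypothetical. DaysFromCivil is Howard Hinnant's well-known days_from_civil algorithm for the proleptic Gregorian calendar.

```cpp
#include <cassert>
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's
// days_from_civil). This is the value a date32 column stores.
int32_t DaysFromCivil(int y, unsigned m, unsigned d) {
  y -= m <= 2;  // treat Jan/Feb as months 13/14 of the previous year
  const int era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
  return era * 146097 + static_cast<int>(doe) - 719468;
}

// date64 stores milliseconds since the epoch: days * 86,400,000.
int64_t Date64FromCivil(int y, unsigned m, unsigned d) {
  return static_cast<int64_t>(DaysFromCivil(y, m, d)) * 86400000LL;
}

// time32 with millisecond unit stores milliseconds since midnight.
int32_t Time32MillisOfDay(int h, int min, int s, int ms) {
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}
```

For example, 2020-01-01 becomes 18262 as a date32 constant and 1577836800000 as a date64 constant.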

class Node

Represents a node in the expression tree.

Validity and value are in a joined state: each node yields a value together with its validity (null) bit.

Subclassed by gandiva::BooleanNode, gandiva::FieldNode, gandiva::FunctionNode, gandiva::IfNode, gandiva::InExpressionNode< Type >, gandiva::InExpressionNode< gandiva::DecimalScalar128 >, gandiva::LiteralNode

Public Functions

virtual Status Accept(NodeVisitor &visitor) const = 0

Derived classes should simply invoke the visitor's Visit API.
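The rule above is the classic double-dispatch visitor pattern. A standard-library-only sketch with hypothetical ToyNode/ToyVisitor types; unlike Gandiva's Accept, which returns a Status, these return void to keep the example short.

```cpp
#include <cassert>
#include <string>

struct ToyLiteral;
struct ToyField;

// One Visit overload per concrete node type.
struct ToyVisitor {
  virtual ~ToyVisitor() = default;
  virtual void Visit(const ToyLiteral& node) = 0;
  virtual void Visit(const ToyField& node) = 0;
};

struct ToyNode {
  virtual ~ToyNode() = default;
  virtual void Accept(ToyVisitor& visitor) const = 0;
};

// Each Accept only forwards *this, so overload resolution on the static
// type picks the matching Visit.
struct ToyLiteral : ToyNode {
  int value = 0;
  void Accept(ToyVisitor& visitor) const override { visitor.Visit(*this); }
};

struct ToyField : ToyNode {
  std::string name;
  void Accept(ToyVisitor& visitor) const override { visitor.Visit(*this); }
};

// A visitor that renders nodes as text.
struct ToyPrinter : ToyVisitor {
  std::string out;
  void Visit(const ToyLiteral& node) override { out += std::to_string(node.value); }
  void Visit(const ToyField& node) override { out += node.name; }
};
```

Keeping Accept trivial means new operations (printing, validation, code generation) are added as new visitors without touching the node classes.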

class Expression

An expression tree with a root node and a result field.

Subclassed by gandiva::Condition

class Condition : public gandiva::Expression

A condition expression.

Function registry

class FunctionSignature

Signature for a function: includes the base name, the input parameter types, and the output type.

Public Functions

std::size_t Hash() const

Calculated from the name, the data type IDs of the parameters, and the data type ID of the return type.
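A hash built from the name and type IDs can be sketched with a boost-style hash_combine. This illustrates the idea and is not Gandiva's exact formula; ToySignature is a hypothetical stand-in that uses plain integers for data type IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a function signature.
struct ToySignature {
  std::string base_name;
  std::vector<int> param_type_ids;
  int ret_type_id;
};

// boost-style mixing step; order-sensitive, so (int, float) and
// (float, int) are combined differently.
void HashCombine(std::size_t& seed, std::size_t value) {
  seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

std::size_t HashSignature(const ToySignature& sig) {
  std::size_t seed = std::hash<std::string>{}(sig.base_name);
  for (int id : sig.param_type_ids) HashCombine(seed, std::hash<int>{}(id));
  HashCombine(seed, std::hash<int>{}(sig.ret_type_id));
  return seed;
}
```

Equal signatures always hash equally, but unequal ones may collide, so a registry keyed this way still needs a full equality check on lookup.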

std::vector<std::shared_ptr<FunctionSignature>> gandiva::GetRegisteredFunctionSignatures()

Get the list of all registered function signatures.

Configuration

class Configuration

Runtime configuration for Gandiva.

It contains settings that customize Gandiva execution at run time.

class ConfigurationBuilder

Configuration builder for Gandiva.

Provides a default configuration, plus convenience methods to override specific values and build a custom instance.
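A sketch of the builder pattern described above, assuming a shared default instance plus chainable setters. ToyConfig and its two boolean fields are hypothetical placeholders, not a description of the real Configuration's members.

```cpp
#include <cassert>
#include <memory>

// Hypothetical settings; the real Configuration's members may differ.
struct ToyConfig {
  bool optimize = true;
  bool target_host_cpu = true;
};

class ToyConfigBuilder {
 public:
  // Chainable overrides on top of the default values.
  ToyConfigBuilder& set_optimize(bool v) {
    config_.optimize = v;
    return *this;
  }
  ToyConfigBuilder& set_target_host_cpu(bool v) {
    config_.target_host_cpu = v;
    return *this;
  }
  std::shared_ptr<ToyConfig> build() const { return std::make_shared<ToyConfig>(config_); }

  // One immutable default instance shared by all callers.
  static std::shared_ptr<const ToyConfig> Defaults() {
    static const std::shared_ptr<const ToyConfig> kDefault = std::make_shared<ToyConfig>();
    return kDefault;
  }

 private:
  ToyConfig config_;  // starts from the defaults
};
```

Sharing one default instance avoids rebuilding identical configurations, while build() hands each caller its own copy to customize.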

Projector

class Projector

Projection using expressions.

A projector is built for a specific schema and vector of expressions. Once the projector is built, it can be used to evaluate many row batches.

Public Functions

Status Evaluate(const arrow::RecordBatch &batch, arrow::MemoryPool *pool, arrow::ArrayVector *output) const

Evaluate the specified record batch, and return the allocated and populated output arrays.

The output arrays will be allocated from the memory pool ‘pool’, and added to the vector ‘output’.

Parameters:
  • batch[in] the record batch. Its schema should be the same as the one passed to ‘Make’.

  • pool[in] memory pool used to allocate output arrays (if required).

  • output[out] the vector of allocated/populated arrays.
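The "allocate and return" overload can be modelled with plain vectors: build once with a list of compiled expressions, then evaluate each batch into freshly allocated output columns, one per expression. ToyProjector, Batch and CompiledExpr are hypothetical, and std::function stands in for the native code Gandiva generates at Make time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Toy columnar batch: equal-length int64 columns.
using Column = std::vector<int64_t>;
using Batch = std::vector<Column>;
// Hypothetical "compiled" expression: maps (batch, row) to a value.
using CompiledExpr = std::function<int64_t(const Batch&, std::size_t)>;

class ToyProjector {
 public:
  // Built once; the same instance can evaluate many batches.
  explicit ToyProjector(std::vector<CompiledExpr> exprs) : exprs_(std::move(exprs)) {}

  // One output column per expression, each sized to the batch's rows.
  std::vector<Column> Evaluate(const Batch& batch) const {
    const std::size_t num_rows = batch.empty() ? 0 : batch[0].size();
    std::vector<Column> output;
    output.reserve(exprs_.size());
    for (const auto& expr : exprs_) {
      Column col(num_rows);
      for (std::size_t row = 0; row < num_rows; ++row) col[row] = expr(batch, row);
      output.push_back(std::move(col));
    }
    return output;
  }

 private:
  std::vector<CompiledExpr> exprs_;
};
```

The overload that takes a caller-allocated ArrayDataVector differs only in who owns the output buffers.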

Status Evaluate(const arrow::RecordBatch &batch, const ArrayDataVector &output) const

Evaluate the specified record batch, and populate the output arrays.

The caller must allocate output arrays of sufficient capacity.

Parameters:
  • batch[in] the record batch. Its schema should be the same as the one passed to ‘Make’.

  • output[inout] vector of arrays; the arrays are allocated by the caller and populated by Evaluate.

Status Evaluate(const arrow::RecordBatch &batch, const SelectionVector *selection_vector, arrow::MemoryPool *pool, arrow::ArrayVector *output) const

Evaluate the specified record batch, and return the allocated and populated output arrays.

The output arrays will be allocated from the memory pool ‘pool’, and added to the vector ‘output’.

Parameters:
  • batch[in] the record batch. Its schema should be the same as the one passed to ‘Make’.

  • selection_vector[in] selection vector containing the filtered row positions.

  • pool[in] memory pool used to allocate output arrays (if required).

  • output[out] the vector of allocated/populated arrays.

Status Evaluate(const arrow::RecordBatch &batch, const SelectionVector *selection_vector, const ArrayDataVector &output) const

Evaluate the specified record batch, and populate the output arrays at the filtered positions.

The caller must allocate output arrays of sufficient capacity.

Parameters:
  • batch[in] the record batch. Its schema should be the same as the one passed to ‘Make’.

  • selection_vector[in] selection vector containing the filtered row positions.

  • output[inout] vector of arrays; the arrays are allocated by the caller and populated by Evaluate.
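With a selection vector, only the listed rows are evaluated. The sketch below assumes the output is compacted: slot i of the output holds the result for input row selection[i], so the output length equals the number of selected slots. EvaluateSelected and its hard-coded a + b expression are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: evaluate "a + b" only at the rows listed in selection.
std::vector<int64_t> EvaluateSelected(const std::vector<int64_t>& a,
                                      const std::vector<int64_t>& b,
                                      const std::vector<uint16_t>& selection) {
  std::vector<int64_t> out;
  out.reserve(selection.size());  // output length == number of selected slots
  for (uint16_t row : selection) {
    const auto r = static_cast<std::size_t>(row);
    assert(r < a.size() && r < b.size());
    out.push_back(a[r] + b[r]);
  }
  return out;
}
```

Rows not named in the selection are never touched, which is what makes filter-then-project cheaper than projecting every row.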

Public Static Functions

static Status Make(SchemaPtr schema, const ExpressionVector &exprs, std::shared_ptr<Projector> *projector)

Build a default projector for the given schema to evaluate the vector of expressions.

Parameters:
  • schema[in] schema for the record batches and the expressions.

  • exprs[in] vector of expressions.

  • projector[out] the returned projector object.

static Status Make(SchemaPtr schema, const ExpressionVector &exprs, std::shared_ptr<Configuration> configuration, std::shared_ptr<Projector> *projector)

Build a projector for the given schema to evaluate the vector of expressions.

Customize the projector with runtime configuration.

Parameters:
  • schema[in] schema for the record batches and the expressions.

  • exprs[in] vector of expressions.

  • configuration[in] runtime configuration.

  • projector[out] the returned projector object.

static Status Make(SchemaPtr schema, const ExpressionVector &exprs, SelectionVector::Mode selection_vector_mode, std::shared_ptr<Configuration> configuration, std::shared_ptr<Projector> *projector)

Build a projector for the given schema to evaluate the vector of expressions.

Customize the projector with runtime configuration.

Parameters:
  • schema[in] schema for the record batches and the expressions.

  • exprs[in] vector of expressions.

  • selection_vector_mode[in] mode of the selection vector.

  • configuration[in] runtime configuration.

  • projector[out] the returned projector object.

Filter

class Filter

Filter records based on a condition.

A filter is built for a specific schema and condition. Once the filter is built, it can be used to evaluate many row batches.

Public Functions

Status Evaluate(const arrow::RecordBatch &batch, std::shared_ptr<SelectionVector> out_selection)

Evaluate the specified record batch, and populate output selection vector.

Parameters:
  • batch[in] the record batch. Its schema should be the same as the one passed to ‘Make’.

  • out_selection[inout] the selection array with indices of rows that match the condition.
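A filter's result can be modelled as the list of row indices for which the condition holds. A minimal sketch with a hypothetical FilterRows helper and a hard-coded condition (column > threshold), assuming rows where the condition is null are dropped, as in a SQL WHERE clause.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical: indices of rows where column > threshold.
// Null values never satisfy the condition (assumption).
std::vector<uint32_t> FilterRows(const std::vector<std::optional<int64_t>>& column,
                                 int64_t threshold) {
  std::vector<uint32_t> selected;
  for (uint32_t row = 0; row < column.size(); ++row) {
    const auto& v = column[row];
    if (v && *v > threshold) selected.push_back(row);
  }
  return selected;
}
```

Feeding such an index list to the projector's selection-vector overload evaluates the projection only on the surviving rows.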

Public Static Functions

static inline Status Make(SchemaPtr schema, ConditionPtr condition, std::shared_ptr<Filter> *filter)

Build a filter for the given schema and condition, with the default configuration.

Parameters:
  • schema[in] schema for the record batches and the condition.

  • condition[in] filter condition.

  • filter[out] the returned filter object.

static Status Make(SchemaPtr schema, ConditionPtr condition, std::shared_ptr<Configuration> config, std::shared_ptr<Filter> *filter)

Build a filter for the given schema and condition.

Customize the filter with runtime configuration.

Parameters:
  • schema[in] schema for the record batches and the condition.

  • condition[in] filter condition.

  • config[in] runtime configuration.

  • filter[out] the returned filter object.

class SelectionVector

Selection vector: a vector of row indices within a record batch, backed by an Arrow array.

Subclassed by gandiva::SelectionVectorImpl< C_TYPE, A_TYPE, mode >

Public Functions

virtual uint64_t GetIndex(int64_t index) const = 0

Get the value at a given index.

virtual void SetIndex(int64_t index, uint64_t value) = 0

Set the value at a given index.

virtual int64_t GetMaxSlots() const = 0

The maximum number of slots (capacity) of the selection vector.

virtual int64_t GetNumSlots() const = 0

The number of slots (size) of the selection vector.

virtual void SetNumSlots(int64_t num_slots) = 0

Set the number of slots in the selection vector.

virtual ArrayPtr ToArray() const = 0

Convert to an Arrow array.

virtual arrow::Buffer &GetBuffer() const = 0

Get the underlying arrow buffer.

virtual Mode GetMode() const = 0

Get the mode of the selection vector.
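The capacity/size split above resembles std::vector's capacity and size. A hypothetical, standard-library-only ToySelectionVector that keeps a fixed number of slots (like a preallocated buffer) and a separate live count; the template parameter mirrors the 16/32/64-bit variants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

template <typename IndexT>
class ToySelectionVector {
 public:
  // Fixed capacity, like a buffer sized for max_slots indices.
  explicit ToySelectionVector(int64_t max_slots)
      : slots_(static_cast<std::size_t>(max_slots)) {}

  uint64_t GetIndex(int64_t index) const {
    return slots_.at(static_cast<std::size_t>(index));
  }
  void SetIndex(int64_t index, uint64_t value) {
    assert(value <= std::numeric_limits<IndexT>::max());  // must fit the slot type
    slots_.at(static_cast<std::size_t>(index)) = static_cast<IndexT>(value);
  }
  int64_t GetMaxSlots() const { return static_cast<int64_t>(slots_.size()); }
  int64_t GetNumSlots() const { return num_slots_; }
  void SetNumSlots(int64_t num_slots) {
    assert(num_slots >= 0 && num_slots <= GetMaxSlots());
    num_slots_ = num_slots;
  }

 private:
  std::vector<IndexT> slots_;  // storage for max_slots indices
  int64_t num_slots_ = 0;      // how many slots are currently in use
};
```

Writers fill slots with SetIndex and then publish the count with SetNumSlots; readers only look at the first GetNumSlots() entries.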

Status PopulateFromBitMap(const uint8_t *bitmap, int64_t bitmap_size, int64_t max_bitmap_index)

Populate the selection vector with the positions of all set bits in the bitmap.

Parameters:
  • bitmap[in] the bitmap

  • bitmap_size[in] size of the bitmap in bytes

  • max_bitmap_index[in] maximum valid index in the bitmap (can be less than the bitmap's capacity, due to alignment/padding).
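PopulateFromBitMap's job can be sketched as collecting the positions of set bits. This version reads bits least-significant-bit first, as Arrow bitmaps do, and assumes max_bitmap_index is inclusive. SetBitPositions is a hypothetical helper that returns the positions instead of writing them into a selection vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bit i lives in byte i / 8 at bit position i % 8 (LSB first).
// Positions past max_bitmap_index are padding and are ignored.
std::vector<int64_t> SetBitPositions(const uint8_t* bitmap, int64_t bitmap_size,
                                     int64_t max_bitmap_index) {
  std::vector<int64_t> positions;
  for (int64_t byte = 0; byte < bitmap_size; ++byte) {
    for (int bit = 0; bit < 8; ++bit) {
      const int64_t pos = byte * 8 + bit;
      if (pos > max_bitmap_index) return positions;  // reached the padding
      if ((bitmap[byte] >> bit) & 1) positions.push_back(pos);
    }
  }
  return positions;
}
```

For a 10-row batch stored in a 2-byte bitmap, passing 9 as the maximum index keeps stray padding bits 10 to 15 out of the selection.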

Public Static Functions

static Status MakeInt16(int64_t max_slots, std::shared_ptr<arrow::Buffer> buffer, std::shared_ptr<SelectionVector> *selection_vector)

Make a selection vector with int16 indices.

Parameters:
  • max_slots[in] max number of slots

  • buffer[in] buffer sized to accommodate max_slots

  • selection_vector[out] selection vector backed by ‘buffer’
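The int16/int32/int64 variants trade memory for range. Each slot stores one row index, so the buffer needs max_slots times the index width in bytes, and the narrowest usable width depends on the largest row index. A sketch, assuming unsigned indices (so 16-bit slots address up to 65,536 rows); both helpers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical: narrowest index width (in bytes) that can address every row.
int IndexWidthBytes(int64_t num_rows) {
  if (num_rows <= 0) return 2;  // nothing to address; use the narrowest width
  const uint64_t max_index = static_cast<uint64_t>(num_rows - 1);
  if (max_index <= std::numeric_limits<uint16_t>::max()) return 2;
  if (max_index <= std::numeric_limits<uint32_t>::max()) return 4;
  return 8;
}

// Hypothetical: bytes needed for a buffer holding max_slots indices.
int64_t SelectionBufferBytes(int64_t max_slots, int width_bytes) {
  return max_slots * width_bytes;
}
```

For example, a 1,000-row batch fits 16-bit indices and needs a 2,000-byte buffer, while a 65,537-row batch needs 32-bit indices.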

static Status MakeInt16(int64_t max_slots, arrow::MemoryPool *pool, std::shared_ptr<SelectionVector> *selection_vector)

Make a selection vector with int16 indices.

Parameters:
  • max_slots[in] max number of slots

  • pool[in] memory pool to allocate buffer

  • selection_vector[out] selection vector backed by a buffer allocated from the pool.

static Status MakeImmutableInt16(int64_t num_slots, std::shared_ptr<arrow::Buffer> buffer, std::shared_ptr<SelectionVector> *selection_vector)

Create a selection vector from a pre-populated buffer.

Parameters:
  • num_slots[in] size of the selection vector

  • buffer[in] pre-populated buffer

  • selection_vector[out] selection vector backed by ‘buffer’

static Status MakeInt32(int64_t max_slots, std::shared_ptr<arrow::Buffer> buffer, std::shared_ptr<SelectionVector> *selection_vector)

Make a selection vector with int32 indices.

Parameters:
  • max_slots[in] max number of slots

  • buffer[in] buffer sized to accommodate max_slots

  • selection_vector[out] selection vector backed by ‘buffer’

static Status MakeInt32(int64_t max_slots, arrow::MemoryPool *pool, std::shared_ptr<SelectionVector> *selection_vector)

Make a selection vector with int32 indices.

Parameters:
  • max_slots[in] max number of slots

  • pool[in] memory pool to allocate buffer

  • selection_vector[out] selection vector backed by a buffer allocated from the pool.

static Status MakeImmutableInt32(int64_t num_slots, std::shared_ptr<arrow::Buffer> buffer, std::shared_ptr<SelectionVector> *selection_vector)

Create a selection vector from a pre-populated buffer.

Parameters:
  • num_slots[in] size of the selection vector

  • buffer[in] pre-populated buffer

  • selection_vector[out] selection vector backed by ‘buffer’

static Status MakeInt64(int64_t max_slots, std::shared_ptr<arrow::Buffer> buffer, std::shared_ptr<SelectionVector> *selection_vector)

Make a selection vector with int64 indices.

Parameters:
  • max_slots[in] max number of slots

  • buffer[in] buffer sized to accommodate max_slots

  • selection_vector[out] selection vector backed by ‘buffer’

static Status MakeInt64(int64_t max_slots, arrow::MemoryPool *pool, std::shared_ptr<SelectionVector> *selection_vector)

Make a selection vector with int64 indices.

Parameters:
  • max_slots[in] max number of slots

  • pool[in] memory pool to allocate buffer

  • selection_vector[out] selection vector backed by a buffer allocated from the pool.