Streaming execution engine#

Warning

The streaming execution engine is experimental, and a stable API is not yet guaranteed.

Motivation#

For many complex computations, successive direct invocation of compute functions is not feasible in either memory or computation time. Doing so causes all intermediate data to be fully materialized. To facilitate arbitrarily large inputs and more efficient resource usage, Arrow also provides a streaming query engine with which computations can be formulated and executed.

An example graph of a streaming execution workflow.

ExecNode is provided to reify the graph of operations in a query. Batches of data (ExecBatch) flow along edges of the graph from node to node. Structuring the API around streams of batches allows the working set for each node to be tuned for optimal performance independent of any other nodes in the graph. Each ExecNode processes batches as they are pushed to it along an edge of the graph by upstream nodes (its inputs), and pushes batches along an edge of the graph to downstream nodes (its outputs) as they are finalized.

Overview#

ExecNode: Each node in the graph is an implementation of the ExecNode interface.
ExecPlan: A set of ExecNode is contained and (to an extent) coordinated by an ExecPlan.
ExecFactoryRegistry: Instances of ExecNode are constructed by factory functions held in an ExecFactoryRegistry.
ExecNodeOptions: Heterogeneous parameters for factories of ExecNode are bundled in an ExecNodeOptions.
Declaration: dplyr-inspired helper for efficient construction of an ExecPlan.
ExecBatch: A lightweight container for a single chunk of data in the Arrow format. In contrast to RecordBatch, ExecBatch is intended for use exclusively in a streaming execution context (for example, it doesn’t have a corresponding Python binding). Furthermore, columns which happen to have a constant value may be represented by a Scalar instead of an Array. In addition, ExecBatch may carry execution-relevant properties including a guaranteed-true-filter for Expression simplification.

An example ExecNode implementation which simply passes all input batches through unchanged:

class PassthruNode : public ExecNode {
 public:
  // InputReceived is the main entry point for ExecNodes. It is invoked
  // by an input of this node to push a batch here for processing.
  void InputReceived(ExecNode* input, ExecBatch batch) override {
    // Since this is a passthru node we simply push the batch to our
    // only output here.
    outputs_[0]->InputReceived(this, batch);
  }

  // ErrorReceived is called by an input of this node to report an error.
  // ExecNodes should always forward errors to their outputs unless they
  // are able to fully handle the error (this is rare).
  void ErrorReceived(ExecNode* input, Status error) override {
    outputs_[0]->ErrorReceived(this, error);
  }

  // InputFinished is used to signal how many batches will ultimately arrive.
  // It may be called with any ordering relative to InputReceived/ErrorReceived.
  void InputFinished(ExecNode* input, int total_batches) override {
    outputs_[0]->InputFinished(this, total_batches);
  }

  // ExecNodes may request that their inputs throttle production of batches
  // until they are ready for more, or stop production if no further batches
  // are required.  These signals should typically be forwarded to the inputs
  // of the ExecNode.
  void ResumeProducing(ExecNode* output) override { inputs_[0]->ResumeProducing(this); }
  void PauseProducing(ExecNode* output) override { inputs_[0]->PauseProducing(this); }
  void StopProducing(ExecNode* output) override { inputs_[0]->StopProducing(this); }

  // An ExecNode has a single output schema to which all its batches conform.
  using ExecNode::output_schema;

  // ExecNodes carry basic introspection for debugging purposes
  const char* kind_name() const override { return "PassthruNode"; }
  using ExecNode::label;
  using ExecNode::SetLabel;
  using ExecNode::ToString;

  // An ExecNode holds references to its inputs and outputs, so it is possible
  // to walk the graph of execution if necessary.
  using ExecNode::inputs;
  using ExecNode::outputs;

  // StartProducing() and StopProducing() are invoked by an ExecPlan to
  // coordinate the graph-wide execution state.  These do not need to be
  // forwarded to inputs or outputs.
  Status StartProducing() override { return Status::OK(); }
  void StopProducing() override {}
  Future<> finished() override { return inputs_[0]->finished(); }
};
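Because InputFinished may arrive before, between, or after the calls to InputReceived, a node that needs to know when its input is complete has to compare two counters rather than rely on call order. The following is a minimal single-threaded sketch of that bookkeeping. BatchCounter and its methods are hypothetical names invented for this illustration and are not part of Arrow; a real node would also need thread-safe counters.

```cpp
#include <cassert>

// Toy model (not Arrow's API): tracks when a node has seen all of its input,
// regardless of whether the total arrives before or after the batches.
class BatchCounter {
 public:
  // Call once per received batch; returns true if the input is now complete.
  bool OnBatch() {
    ++received_;
    return IsDone();
  }

  // Call once when upstream reports the final batch count;
  // returns true if the input is now complete.
  bool OnTotal(int total_batches) {
    assert(total_ < 0 && "the total may only be reported once");
    total_ = total_batches;
    return IsDone();
  }

  bool IsDone() const { return total_ >= 0 && received_ == total_; }

 private:
  int received_ = 0;
  int total_ = -1;  // -1 means "not yet known"
};
```

Whichever of the two calls returns true is the point at which such a node could finalize its own output.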

Note that each method which is associated with an edge of the graph must be invoked with an ExecNode* to identify the node which invoked it. For example, in an ExecNode which implements JOIN this tagging might be used to differentiate between batches from the left or right inputs. InputReceived, ErrorReceived, InputFinished may only be invoked by the inputs of a node, while ResumeProducing, PauseProducing, StopProducing may only be invoked by outputs of a node.
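The tagging described above can be illustrated with a toy two-input node. This is only a sketch: ToyNode, ToySource, ToyJoin, and ToyBatch are hypothetical stand-ins written for this example, not Arrow classes, and the "join" merely counts rows per side.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins, not Arrow's classes.
struct ToyBatch {
  int num_rows;
};

struct ToyNode {
  virtual ~ToyNode() = default;
  virtual void InputReceived(ToyNode* input, ToyBatch batch) = 0;
  std::vector<ToyNode*> inputs;
};

// A leaf node that never receives input; used here only as an identity tag.
struct ToySource : ToyNode {
  void InputReceived(ToyNode*, ToyBatch) override {}
};

// A two-input node that uses the `input` pointer to tell its sides apart.
struct ToyJoin : ToyNode {
  ToyJoin(ToyNode* left, ToyNode* right) { inputs = {left, right}; }

  void InputReceived(ToyNode* input, ToyBatch batch) override {
    if (input == inputs[0]) {
      left_rows += batch.num_rows;
    } else {
      assert(input == inputs[1]);
      right_rows += batch.num_rows;
    }
  }

  int left_rows = 0;
  int right_rows = 0;
};
```

Comparing the caller's pointer against the stored inputs is enough to route each batch, which is why every edge-related method carries that pointer.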

ExecPlan contains the associated instances of ExecNode and is used to start and stop execution of all nodes and for querying/awaiting their completion:

// construct an ExecPlan first to hold your nodes
ARROW_ASSIGN_OR_RAISE(auto plan, ExecPlan::Make(default_exec_context()));

// ... add nodes to your ExecPlan

// start all nodes in the graph
ARROW_RETURN_NOT_OK(plan->StartProducing());

SetUserCancellationCallback([plan] {
  // stop all nodes in the graph
  plan->StopProducing();
});

// Complete will be marked finished when all nodes have run to completion
// or acknowledged a StopProducing() signal. The ExecPlan should be kept
// alive until this future is marked finished.
Future<> complete = plan->finished();

Constructing `ExecPlan` objects#

Warning

The following will be superseded by construction from Compute IR, see ARROW-14074.

None of the concrete implementations of ExecNode are exposed in headers, so they can’t be constructed directly outside the translation unit where they are defined. Instead, factories to create them are provided in an extensible registry. This structure provides a number of benefits:

This enforces consistent construction.
It decouples implementations from consumers of the interface. For example, there are two classes for scalar and grouped aggregation, and the single factory chooses which one to construct by checking whether grouping keys are provided.
This expedites integration with out-of-library extensions. For example, “scan” nodes are implemented in the separate libarrow_dataset.so library.
Since the class is not referenceable outside the translation unit in which it is defined, compilers can optimize more aggressively.

Factories of ExecNode can be retrieved by name from the registry. The default registry is available through arrow::compute::default_exec_factory_registry() and can be queried for the built-in factories:

// get the factory for "filter" nodes:
ARROW_ASSIGN_OR_RAISE(auto make_filter,
                      default_exec_factory_registry()->GetFactory("filter"));

// factories take three arguments:
ARROW_ASSIGN_OR_RAISE(ExecNode* filter_node, make_filter(
    // the ExecPlan which should own this node
    plan.get(),

    // nodes which will send batches to this node (inputs)
    {scan_node},

    // parameters unique to "filter" nodes
    FilterNodeOptions{filter_expression}));

// alternative shorthand:
ARROW_ASSIGN_OR_RAISE(filter_node, MakeExecNode("filter",
    plan.get(), {scan_node}, FilterNodeOptions{filter_expression}));

Factories can also be added to the default registry as long as they are convertible to std::function<Result<ExecNode*>( ExecPlan*, std::vector<ExecNode*>, const ExecNodeOptions&)>.
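To make the idea of a name-keyed registry of std::function factories concrete, here is a toy version built only on the standard library. Everything in it (ToyRegistry, ToyFactory, ToyPlan, ToyNode, ToyOptions, MakeToyFilter) is hypothetical and simplified: it throws exceptions where Arrow returns a Result, and it does not reflect how ExecFactoryRegistry is actually implemented.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins, not Arrow's classes.
struct ToyNode {
  std::string kind;
  std::vector<ToyNode*> inputs;
};
struct ToyOptions {
  int threshold = 0;
};
struct ToyPlan {
  // The plan owns every node created for it.
  std::vector<std::unique_ptr<ToyNode>> nodes;
};

using ToyFactory =
    std::function<ToyNode*(ToyPlan*, std::vector<ToyNode*>, const ToyOptions&)>;

class ToyRegistry {
 public:
  // Registers a factory under a name; rejects duplicate names.
  void AddFactory(const std::string& name, ToyFactory factory) {
    if (!factories_.emplace(name, std::move(factory)).second) {
      throw std::invalid_argument("factory already registered: " + name);
    }
  }

  // Looks a factory up by name; throws if the name is unknown.
  const ToyFactory& GetFactory(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) {
      throw std::out_of_range("no factory named: " + name);
    }
    return it->second;
  }

 private:
  std::map<std::string, ToyFactory> factories_;
};

// A plain function is convertible to ToyFactory, so it can be registered.
ToyNode* MakeToyFilter(ToyPlan* plan, std::vector<ToyNode*> inputs,
                       const ToyOptions& /*options*/) {
  plan->nodes.push_back(
      std::unique_ptr<ToyNode>(new ToyNode{"filter", std::move(inputs)}));
  return plan->nodes.back().get();
}
```

With this in place, `registry.AddFactory("filter", MakeToyFilter)` followed by `registry.GetFactory("filter")(&plan, {}, options)` creates a node owned by the plan, mirroring the three-argument factory call shown earlier.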

To build an ExecPlan representing a simple pipeline which reads from a RecordBatchReader then filters, projects, and writes to disk:

std::shared_ptr<RecordBatchReader> reader = GetStreamOfBatches();
ExecNode* source_node = *MakeExecNode("source", plan.get(), {},
                                      SourceNodeOptions::FromReader(
                                          reader,
                                          GetCpuThreadPool()));

ExecNode* filter_node = *MakeExecNode("filter", plan.get(), {source_node},
                                      FilterNodeOptions{
                                        greater(field_ref("score"), literal(3))
                                      });

ExecNode* project_node = *MakeExecNode("project", plan.get(), {filter_node},
                                       ProjectNodeOptions{
                                         {add(field_ref("score"), literal(1))},
                                         {"score + 1"}
                                       });

arrow::dataset::internal::Initialize();
MakeExecNode("write", plan.get(), {project_node},
             WriteNodeOptions{/*base_dir=*/"/dat", /*...*/});

Declaration is a dplyr-inspired helper which further decreases the boilerplate associated with populating an ExecPlan from C++:

arrow::dataset::internal::Initialize();

std::shared_ptr<RecordBatchReader> reader = GetStreamOfBatches();
ASSERT_OK(Declaration::Sequence(
              {
                  {"source", SourceNodeOptions::FromReader(
                       reader,
                       GetCpuThreadPool())},
                  {"filter", FilterNodeOptions{
                       greater(field_ref("score"), literal(3))}},
                  {"project", ProjectNodeOptions{
                       {add(field_ref("score"), literal(1))},
                       {"score + 1"}}},
                  {"write", WriteNodeOptions{/*base_dir=*/"/dat", /*...*/}},
              })
              .AddToPlan(plan.get()));
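Conceptually, a sequence of declarations describes a linear chain in which each node takes the previous node as its only input. The sketch below models just that wiring with toy types. AddSequenceToPlan, ToyNode, and ToyPlan are hypothetical names for this illustration, and the real Declaration also carries options and supports multiple inputs, which this sketch omits.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins, not Arrow's classes.
struct ToyNode {
  std::string kind;
  std::vector<ToyNode*> inputs;
};
struct ToyPlan {
  std::vector<std::unique_ptr<ToyNode>> nodes;
};

// Adds one node per name, wiring each node's input to the node before it.
// Returns the last node in the chain, or nullptr for an empty sequence.
ToyNode* AddSequenceToPlan(const std::vector<std::string>& kinds, ToyPlan* plan) {
  ToyNode* previous = nullptr;
  for (const std::string& kind : kinds) {
    std::unique_ptr<ToyNode> node(new ToyNode{kind, {}});
    if (previous != nullptr) node->inputs.push_back(previous);
    previous = node.get();
    plan->nodes.push_back(std::move(node));
  }
  return previous;
}
```

Calling it with `{"source", "filter", "project", "write"}` yields four nodes where only the first has no input.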

Note that a source node can wrap anything which resembles a stream of batches. For example, PR#11032 adds support for use of a DuckDB query as a source node. Similarly, a sink node can wrap anything which absorbs a stream of batches. In the example above we’re writing completed batches to disk. However we can also collect these in memory into a Table or forward them to a RecordBatchReader as an out-of-graph stream. This flexibility allows an ExecPlan to be used as streaming middleware between any endpoints which support Arrow formatted batches.

An arrow::dataset::Dataset can also be wrapped as a source node which pushes all the dataset’s batches into an ExecPlan. This factory is added to the default registry with the name "scan" by calling arrow::dataset::internal::Initialize():

arrow::dataset::internal::Initialize();

std::shared_ptr<Dataset> dataset = GetDataset();

ASSERT_OK(Declaration::Sequence(
              {
                  {"scan", ScanNodeOptions{dataset,
                     /* push down predicate, projection, ... */}},
                  {"filter", FilterNodeOptions{/* ... */}},
                  // ...
              })
              .AddToPlan(plan.get()));

Datasets may be scanned multiple times; just make multiple scan nodes from that dataset. (Useful for a self-join, for example.) Note that producing two scan nodes like this will perform all reads and decodes twice.

Constructing `ExecNode` using Options#

ExecNode is the building block of an execution plan. Each built-in operation is provided as an ExecNode, which is created by name and configured through a matching options class.

This is the list of operations associated with the execution plan:

Operations and Options#
Operation	Options
`source`	`arrow::compute::SourceNodeOptions`
`table_source`	`arrow::compute::TableSourceNodeOptions`
`filter`	`arrow::compute::FilterNodeOptions`
`project`	`arrow::compute::ProjectNodeOptions`
`aggregate`	`arrow::compute::AggregateNodeOptions`
`sink`	`arrow::compute::SinkNodeOptions`
`consuming_sink`	`arrow::compute::ConsumingSinkNodeOptions`
`order_by_sink`	`arrow::compute::OrderBySinkNodeOptions`
`select_k_sink`	`arrow::compute::SelectKSinkNodeOptions`
`scan`	`arrow::dataset::ScanNodeOptions`
`hash_join`	`arrow::compute::HashJoinNodeOptions`
`write`	`arrow::dataset::WriteNodeOptions`
`union`	N/A
`table_sink`	`arrow::compute::TableSinkNodeOptions`

`source`#

A source operation can be considered as an entry point to create a streaming execution plan. arrow::compute::SourceNodeOptions are used to create the source operation. The source operation is the most generic and flexible type of source currently available but it can be quite tricky to configure. To process data from files the scan operation is likely a simpler choice.

The source node requires some kind of function that can be called to poll for more data. This function should take no arguments and should return an arrow::Future<arrow::util::optional<arrow::compute::ExecBatch>>. This function might be reading a file, iterating through an in-memory structure, or receiving data from a network connection. The Arrow library refers to these functions as arrow::AsyncGenerator and there are a number of utilities for working with these functions. For this example we use a vector of batches that we’ve already stored in memory. In addition, the schema of the data must be known up front. Arrow’s streaming execution engine must know the schema of the data at each stage of the execution graph before any processing has begun. This means we must supply the schema for a source node separately from the data itself.

Here we define a struct to hold the data generator definition. This includes in-memory batches, a schema, and a function that serves as a data generator:

struct BatchesWithSchema {
  std::vector<cp::ExecBatch> batches;
  std::shared_ptr<arrow::Schema> schema;
  // This method uses internal arrow utilities to
  // convert a vector of exec batches to an AsyncGenerator of optional batches
  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> gen() const {
    auto opt_batches = ::arrow::internal::MapVector(
        [](cp::ExecBatch batch) { return arrow::util::make_optional(std::move(batch)); },
        batches);
    arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> gen;
    gen = arrow::MakeVectorGenerator(std::move(opt_batches));
    return gen;
  }
};
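The polling contract behind such a generator (call repeatedly, stop when an empty optional comes back) can be shown without any asynchrony. The following sketch is a synchronous analogue written with std::optional and std::function. ToyBatch, ToyGenerator, MakeToyVectorGenerator, and CountRows are hypothetical names for this illustration; Arrow's AsyncGenerator returns futures rather than values.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Toy batch type, not Arrow's ExecBatch.
struct ToyBatch {
  std::vector<int> values;
};

// A synchronous analogue of a generator: each call yields the next batch,
// and an empty optional marks the end of the stream.
using ToyGenerator = std::function<std::optional<ToyBatch>()>;

ToyGenerator MakeToyVectorGenerator(std::vector<ToyBatch> batches) {
  // Shared state lets copies of the generator advance the same cursor.
  auto state = std::make_shared<std::pair<std::vector<ToyBatch>, std::size_t>>(
      std::move(batches), 0);
  return [state]() -> std::optional<ToyBatch> {
    if (state->second >= state->first.size()) return std::nullopt;
    return state->first[state->second++];
  };
}

// Polls the generator until it is exhausted and returns the total row count.
std::size_t CountRows(const ToyGenerator& gen) {
  std::size_t rows = 0;
  while (std::optional<ToyBatch> batch = gen()) {
    rows += batch->values.size();
  }
  return rows;
}
```

Once the vector is used up, every further call keeps returning an empty optional, which is the signal a consumer uses to stop polling.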

Generating sample batches for computation:

arrow::Result<BatchesWithSchema> MakeBasicBatches() {
  BatchesWithSchema out;
  auto field_vector = {arrow::field("a", arrow::int32()),
                       arrow::field("b", arrow::boolean())};
  ARROW_ASSIGN_OR_RAISE(auto b1_int, GetArrayDataSample<arrow::Int32Type>({0, 4}));
  ARROW_ASSIGN_OR_RAISE(auto b2_int, GetArrayDataSample<arrow::Int32Type>({5, 6, 7}));
  ARROW_ASSIGN_OR_RAISE(auto b3_int, GetArrayDataSample<arrow::Int32Type>({8, 9, 10}));

  ARROW_ASSIGN_OR_RAISE(auto b1_bool,
                        GetArrayDataSample<arrow::BooleanType>({false, true}));
  ARROW_ASSIGN_OR_RAISE(auto b2_bool,
                        GetArrayDataSample<arrow::BooleanType>({true, false, true}));
  ARROW_ASSIGN_OR_RAISE(auto b3_bool,
                        GetArrayDataSample<arrow::BooleanType>({false, true, false}));

  ARROW_ASSIGN_OR_RAISE(auto b1,
                        GetExecBatchFromVectors(field_vector, {b1_int, b1_bool}));
  ARROW_ASSIGN_OR_RAISE(auto b2,
                        GetExecBatchFromVectors(field_vector, {b2_int, b2_bool}));
  ARROW_ASSIGN_OR_RAISE(auto b3,
                        GetExecBatchFromVectors(field_vector, {b3_int, b3_bool}));

  out.batches = {b1, b2, b3};
  out.schema = arrow::schema(field_vector);
  return out;
}

Example of using source (usage of sink is explained in detail in sink):

/// \brief An example demonstrating a source and sink node
/// \param exec_context The execution context to run the plan in
///
/// Source-Sink Example
/// This example shows how a source and sink can be used
/// in an execution plan. This includes source node receiving data
/// and the sink node emits the data as an output represented in
/// a table.
arrow::Status SourceSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};

  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {source}, cp::SinkNodeOptions{&sink_gen}));

  return ExecutePlanAndCollectAsTable(exec_context, plan, basic_data.schema, sink_gen);
}

`table_source`#

In the previous example, a source node was used to input the data. But when developing an application, if the data is already in memory as a table, it is much easier and more performant to use arrow::compute::TableSourceNodeOptions. Here the input data can be passed as a std::shared_ptr<arrow::Table> along with a max_batch_size. The max_batch_size is used to break up large record batches so that they can be processed in parallel. It is important to note that the table batches will not get merged to form larger batches when the source table has a smaller batch size.

Example of using table_source:

/// \brief An example showing a table source node
/// \param exec_context The execution context to run the plan in
///
/// TableSource-Sink Example
/// This example shows how a table_source and sink can be used
/// in an execution plan. This includes a table source node
/// receiving data from a table and the sink node emits
/// the data to a generator which we collect into a table.
arrow::Status TableSourceSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto table, GetTable());

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;
  int max_batch_size = 2;
  auto table_source_options = cp::TableSourceNodeOptions{table, max_batch_size};

  ARROW_ASSIGN_OR_RAISE(
      cp::ExecNode * source,
      cp::MakeExecNode("table_source", plan.get(), {}, table_source_options));

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {source}, cp::SinkNodeOptions{&sink_gen}));

  return ExecutePlanAndCollectAsTable(exec_context, plan, table->schema(), sink_gen);
}
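The effect of max_batch_size described above (large chunks are cut up, small chunks are never merged) can be sketched on row counts alone. SplitChunkLengths is a hypothetical helper written for this illustration; it models only the stated behavior and is not how the table_source node is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Given the row counts of a table's existing chunks, returns the row counts of
// the batches a splitter would emit: chunks longer than max_batch_size are cut
// up, while shorter chunks are passed through without being merged.
std::vector<int> SplitChunkLengths(const std::vector<int>& chunk_lengths,
                                   int max_batch_size) {
  assert(max_batch_size > 0);
  std::vector<int> out;
  for (int remaining : chunk_lengths) {
    while (remaining > 0) {
      int take = std::min(remaining, max_batch_size);
      out.push_back(take);
      remaining -= take;
    }
  }
  return out;
}
```

For chunk lengths {5, 1, 3} and a max_batch_size of 2, this yields {2, 2, 1, 1, 2, 1}: the adjacent one-row pieces stay separate because they come from different source chunks.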

`filter`#

The filter operation, as the name suggests, provides an option to define data filtering criteria. It selects rows matching a given expression. Filters can be written using arrow::compute::Expression. For example, if we wish to keep rows where the value of column a is greater than 3, then we can use the following expression.

Filter example:

/// \brief An example showing a filter node
/// \param exec_context The execution context to run the plan in
///
/// Scan-Filter-Sink
/// This example shows how a filter can be used in an execution plan,
/// along with the scan and sink operations. The output from the
/// execution plan is obtained as a table via the sink node.
arrow::Status ScanFilterSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::dataset::Dataset> dataset, GetDataset());

  auto options = std::make_shared<arrow::dataset::ScanOptions>();
  // specify the filter.  This filter keeps only the rows where the
  // value of the "a" column is greater than 3.
  cp::Expression filter_opt = cp::greater(cp::field_ref("a"), cp::literal(3));
  // set filter for scanner : on-disk / push-down filtering.
  // This step can be skipped if you are not reading from disk.
  options->filter = filter_opt;
  // empty projection
  options->projection = cp::project({}, {});

  // construct the scan node
  std::cout << "Initialized Scanning Options" << std::endl;

  cp::ExecNode* scan;

  auto scan_node_options = arrow::dataset::ScanNodeOptions{dataset, options};
  std::cout << "Scan node options created" << std::endl;

  ARROW_ASSIGN_OR_RAISE(scan,
                        cp::MakeExecNode("scan", plan.get(), {}, scan_node_options));

  // pipe the scan node into the filter node
  // Need to set the filter in scan node options and filter node options.
  // At scan node it is used for on-disk / push-down filtering.
  // At filter node it is used for in-memory filtering.
  cp::ExecNode* filter;
  ARROW_ASSIGN_OR_RAISE(filter, cp::MakeExecNode("filter", plan.get(), {scan},
                                                 cp::FilterNodeOptions{filter_opt}));

  // finally, pipe the filter node into a sink node
  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;
  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {filter}, cp::SinkNodeOptions{&sink_gen}));

  return ExecutePlanAndCollectAsTable(exec_context, plan, dataset->schema(), sink_gen);
}

`project`#

The project operation rearranges, deletes, transforms, and creates columns. Each output column is computed by evaluating an expression against the source record batch. This is exposed via arrow::compute::ProjectNodeOptions, which requires an arrow::compute::Expression and a name for each of the output columns (if names are not provided, the string representations of the expressions will be used).

Project example:

/// \brief An example showing a project node
/// \param exec_context The execution context to run the plan in
///
/// Scan-Project-Sink
/// This example shows how Scan operation can be used to load the data
/// into the execution plan, how project operation can be applied on the
/// data stream and how the output is obtained as a table via the sink node.
arrow::Status ScanProjectSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::dataset::Dataset> dataset, GetDataset());

  auto options = std::make_shared<arrow::dataset::ScanOptions>();
  // projection
  cp::Expression a_times_2 = cp::call("multiply", {cp::field_ref("a"), cp::literal(2)});
  options->projection = cp::project({}, {});

  cp::ExecNode* scan;

  auto scan_node_options = arrow::dataset::ScanNodeOptions{dataset, options};

  ARROW_ASSIGN_OR_RAISE(scan,
                        cp::MakeExecNode("scan", plan.get(), {}, scan_node_options));

  cp::ExecNode* project;
  ARROW_ASSIGN_OR_RAISE(project, cp::MakeExecNode("project", plan.get(), {scan},
                                                  cp::ProjectNodeOptions{{a_times_2}}));
  // schema after projection => multiply(a, 2): int64
  std::cout << "Schema after projection : \n"
            << project->output_schema()->ToString() << std::endl;

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;
  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {project}, cp::SinkNodeOptions{&sink_gen}));
  auto schema = arrow::schema({arrow::field("a * 2", arrow::int32())});

  return ExecutePlanAndCollectAsTable(exec_context, plan, schema, sink_gen);
}

`aggregate`#

The aggregate node computes various types of aggregates over data.

Arrow supports two types of aggregates: “scalar” aggregates, and “hash” aggregates. Scalar aggregates reduce an array or scalar input to a single scalar output (e.g. computing the mean of a column). Hash aggregates act like GROUP BY in SQL and first partition data based on one or more key columns, then reduce the data in each partition. The aggregate node supports both types of computation, and can compute any number of aggregations at once.

arrow::compute::AggregateNodeOptions is used to define the aggregation criteria. It takes a list of aggregation functions and their options; a list of target fields to aggregate, one per function; and a list of names for the output fields, one per function. Optionally, it takes a list of columns that are used to partition the data, in the case of a hash aggregation. The aggregation functions can be selected from this list of aggregation functions.

Note

This node is a “pipeline breaker” and will fully materialize the dataset in memory. In the future, spillover mechanisms will be added which should alleviate this constraint.

The aggregation can provide results as a group or scalar. For instance, an operation like hash_count provides the count for each unique key as a grouped result, while an operation like sum provides a single record.
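The difference between the two kinds of aggregate can be seen in a small standard-library sketch. ToySum and ToyCountByKey are hypothetical functions written for this illustration and do not use Arrow's kernels; nulls are modeled with std::optional, and the count skips nulls in the spirit of counting only valid values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Scalar aggregate: reduces a whole column to one value, skipping nulls.
std::int64_t ToySum(const std::vector<std::optional<std::int32_t>>& column) {
  std::int64_t total = 0;
  for (const auto& v : column) {
    if (v) total += *v;
  }
  return total;
}

// Hash aggregate: partitions rows by key, then counts the non-null values
// in each partition.
std::map<bool, std::int64_t> ToyCountByKey(
    const std::vector<std::optional<std::int32_t>>& values,
    const std::vector<bool>& keys) {
  assert(values.size() == keys.size());
  std::map<bool, std::int64_t> counts;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i]) ++counts[keys[i]];
  }
  return counts;
}
```

Applied to the sample data from MakeBasicBatches (a = 0, 4, 5, 6, 7, 8, 9, 10 with the matching boolean column b), the sum is a single value, 49, while the grouped count produces one row per key: 4 for true and 4 for false.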

Scalar Aggregation example:

/// \brief An example showing an aggregation node to aggregate an entire table
/// \param exec_context The execution context to run the plan in
///
/// Source-Aggregation-Sink
/// This example shows how an aggregation operation can be applied on an
/// execution plan, resulting in a scalar output. The source node loads the
/// data and the aggregation (summing the values in column 'a')
/// is applied on this data. The output is obtained from the sink node as a table.
arrow::Status SourceScalarAggregateSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};

  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));
  auto aggregate_options = cp::AggregateNodeOptions{/*aggregates=*/{{"sum", nullptr}},
                                                    /*targets=*/{"a"},
                                                    /*names=*/{"sum(a)"}};
  ARROW_ASSIGN_OR_RAISE(
      cp::ExecNode * aggregate,
      cp::MakeExecNode("aggregate", plan.get(), {source}, aggregate_options));

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {aggregate}, cp::SinkNodeOptions{&sink_gen}));
  auto schema = arrow::schema({arrow::field("sum(a)", arrow::int32())});

  return ExecutePlanAndCollectAsTable(exec_context, plan, schema, sink_gen);
}

Group Aggregation example:

/// \brief An example showing an aggregation node to perform a group-by operation
/// \param exec_context The execution context to run the plan in
///
/// Source-Aggregation-Sink
/// This example shows how an aggregation operation can be applied on an
/// execution plan, resulting in a grouped output. The source node loads the
/// data and the aggregation (counting the valid values in column 'a' for each
/// distinct value of key column 'b') is applied on this data. The output is
/// obtained from the sink node as a table.
arrow::Status SourceGroupAggregateSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};

  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));
  cp::CountOptions options(cp::CountOptions::ONLY_VALID);
  auto aggregate_options =
      cp::AggregateNodeOptions{/*aggregates=*/{{"hash_count", &options}},
                               /*targets=*/{"a"},
                               /*names=*/{"count(a)"},
                               /*keys=*/{"b"}};
  ARROW_ASSIGN_OR_RAISE(
      cp::ExecNode * aggregate,
      cp::MakeExecNode("aggregate", plan.get(), {source}, aggregate_options));

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {aggregate}, cp::SinkNodeOptions{&sink_gen}));
  auto schema = arrow::schema({
      arrow::field("count(a)", arrow::int32()),
      arrow::field("b", arrow::boolean()),
  });

  return ExecutePlanAndCollectAsTable(exec_context, plan, schema, sink_gen);
}

`sink`#

The sink operation provides output and is the final node of a streaming execution definition. The arrow::compute::SinkNodeOptions interface is used to pass the required options. Similar to the source operator, the sink operator exposes the output with a function that returns a record batch future each time it is called. The caller is expected to call this function repeatedly until the generator is exhausted (it returns arrow::util::nullopt). If this function is not called often enough, record batches will accumulate in memory. An execution plan should have only one “terminal” node (one sink node). An ExecPlan can terminate early due to cancellation or an error, before the output is fully consumed. However, the plan can be safely destroyed independently of the sink, which will hold the unconsumed batches by exec_plan->finished().

As a part of the Source Example, the Sink operation is also included:

/// \brief An example demonstrating a source and sink node
/// \param exec_context The execution context to run the plan in
///
/// Source-Sink Example
/// This example shows how a source and sink can be used
/// in an execution plan. This includes source node receiving data
/// and the sink node emits the data as an output represented in
/// a table.
arrow::Status SourceSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};

  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {source}, cp::SinkNodeOptions{&sink_gen}));

  return ExecutePlanAndCollectAsTable(exec_context, plan, basic_data.schema, sink_gen);
}
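Why unconsumed output accumulates in memory can be seen in a toy model of a sink as a queue between the plan and the caller. ToySink and its methods are hypothetical and synchronous; the real sink hands out futures and is not implemented this way. The sketch only illustrates that batches pushed by upstream stay buffered until the caller polls them.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>

// Toy model of a sink, not Arrow's SinkNode: upstream pushes batches in and
// the caller pulls them out. Batches that have not been pulled stay queued.
class ToySink {
 public:
  void Push(int batch) { queue_.push_back(batch); }
  void Finish() { finished_ = true; }

  // Returns the next batch, or an empty optional when nothing is queued.
  // Use Exhausted() to tell "nothing yet" apart from "end of stream".
  std::optional<int> Poll() {
    if (queue_.empty()) return std::nullopt;
    int batch = queue_.front();
    queue_.pop_front();
    return batch;
  }

  bool Exhausted() const { return finished_ && queue_.empty(); }
  std::size_t Buffered() const { return queue_.size(); }

 private:
  std::deque<int> queue_;
  bool finished_ = false;
};
```

If the producer pushes three batches and the caller polls once, two batches remain buffered; a slow caller therefore directly translates into memory held by the sink.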

`consuming_sink`#

The consuming_sink operator is a sink operation that consumes its input within the execution plan (i.e. the exec plan should not complete until the consumption has completed). Unlike the sink node, this node takes a callback function that is expected to consume the batch. Once this callback has finished, the execution plan will no longer hold any reference to the batch. The consuming function may be called before a previous invocation has completed. If the consuming function does not run quickly enough, many concurrent executions could pile up, blocking the CPU thread pool. The execution plan will not be marked finished until all consuming function callbacks have completed. Once all batches have been delivered, the execution plan will wait for the finish future to complete before marking the execution plan finished. This allows for workflows where the consumption function converts batches into async tasks (this is currently done internally for the dataset write node).

Example:

// define a custom SinkNodeConsumer
std::atomic<uint32_t> batches_seen{0};
arrow::Future<> finish = arrow::Future<>::Make();
struct CustomSinkNodeConsumer : public cp::SinkNodeConsumer {
  CustomSinkNodeConsumer(std::atomic<uint32_t>* batches_seen, arrow::Future<> finish)
      : batches_seen(batches_seen), finish(std::move(finish)) {}

  // Called once before any batch is delivered (same signature as in the
  // complete example below)
  arrow::Status Init(const std::shared_ptr<arrow::Schema>& schema,
                     cp::BackpressureControl* backpressure_control) override {
    return arrow::Status::OK();
  }

  // Consumption logic can be written here
  arrow::Status Consume(cp::ExecBatch batch) override {
    // data can be consumed in the expected way:
    // transfer to another system or just do some work
    // and write to disk
    (*batches_seen)++;
    return arrow::Status::OK();
  }

  arrow::Future<> Finish() override { return finish; }

  std::atomic<uint32_t>* batches_seen;
  arrow::Future<> finish;
};

std::shared_ptr<CustomSinkNodeConsumer> consumer =
    std::make_shared<CustomSinkNodeConsumer>(&batches_seen, finish);

arrow::compute::ExecNode* consuming_sink;

ARROW_ASSIGN_OR_RAISE(consuming_sink,
                      MakeExecNode("consuming_sink", plan.get(), {source},
                                   cp::ConsumingSinkNodeOptions(consumer)));

Consuming-Sink example:

/// \brief An example showing a consuming sink node
/// \param exec_context The execution context to run the plan in
///
/// Source-Consuming-Sink
/// This example shows how the data can be consumed within the execution plan
/// by using a ConsumingSink node. There is no data output from this execution plan.
arrow::Status SourceConsumingSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};

  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));

  std::atomic<uint32_t> batches_seen{0};
  arrow::Future<> finish = arrow::Future<>::Make();
  struct CustomSinkNodeConsumer : public cp::SinkNodeConsumer {
    CustomSinkNodeConsumer(std::atomic<uint32_t>* batches_seen, arrow::Future<> finish)
        : batches_seen(batches_seen), finish(std::move(finish)) {}

    arrow::Status Init(const std::shared_ptr<arrow::Schema>& schema,
                       cp::BackpressureControl* backpressure_control) override {
      return arrow::Status::OK();
    }

    arrow::Status Consume(cp::ExecBatch batch) override {
      (*batches_seen)++;
      return arrow::Status::OK();
    }

    arrow::Future<> Finish() override { return finish; }

    std::atomic<uint32_t>* batches_seen;
    arrow::Future<> finish;
  };
  std::shared_ptr<CustomSinkNodeConsumer> consumer =
      std::make_shared<CustomSinkNodeConsumer>(&batches_seen, finish);

  cp::ExecNode* consuming_sink;

  ARROW_ASSIGN_OR_RAISE(consuming_sink,
                        MakeExecNode("consuming_sink", plan.get(), {source},
                                     cp::ConsumingSinkNodeOptions(consumer)));

  ARROW_RETURN_NOT_OK(consuming_sink->Validate());

  ARROW_RETURN_NOT_OK(plan->Validate());
  std::cout << "Exec Plan created: " << plan->ToString() << std::endl;
  // plan start producing
  ARROW_RETURN_NOT_OK(plan->StartProducing());
  // Source should finish fairly quickly
  ARROW_RETURN_NOT_OK(source->finished().status());
  std::cout << "Source Finished!" << std::endl;
  // Mark consumption complete, plan should finish
  finish.MarkFinished(arrow::Status::OK());
  ARROW_RETURN_NOT_OK(plan->finished().status());
  return arrow::Status::OK();
}
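
As a rough illustration of the completion rule described at the start of this section, the following standard-library sketch models when a plan with a consuming sink counts as finished. It is not Arrow code: `ToyPlanState` and its member functions are hypothetical names, and the model is single-threaded, whereas the real engine invokes the consumer from its thread pool and tracks completion with `arrow::Future`.

```cpp
#include <cstdint>

// Hypothetical model of the completion rule; not part of the Arrow API.
class ToyPlanState {
 public:
  explicit ToyPlanState(std::uint32_t expected_batches)
      : expected_batches_(expected_batches) {}

  // Called when one invocation of the consuming callback has returned.
  void OnBatchConsumed() { ++batches_consumed_; }

  // Called when the future returned by the consumer's Finish() completes.
  void OnFinishFutureCompleted() { finish_completed_ = true; }

  // The plan is finished only when every callback has returned AND the
  // finish future has completed.
  bool PlanFinished() const {
    return batches_consumed_ == expected_batches_ && finish_completed_;
  }

 private:
  std::uint32_t expected_batches_;
  std::uint32_t batches_consumed_ = 0;
  bool finish_completed_ = false;
};
```

In this model, consuming every batch is not sufficient on its own: `PlanFinished()` stays false until `OnFinishFutureCompleted()` is called, which mirrors the `finish.MarkFinished(...)` step in the example above.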

`order_by_sink`#

The order_by_sink operation is an extension of the sink operation. It guarantees the ordering of the output stream according to the arrow::compute::OrderBySinkNodeOptions it is given. The arrow::compute::SortOptions contained in those options define which columns are used for sorting and whether to sort in ascending or descending order.

Note

This node is a “pipeline breaker” and will fully materialize the dataset in memory. In the future, spillover mechanisms will be added which should alleviate this constraint.

Order-By-Sink example:

/// \brief An example showing an order-by node
/// \param exec_context The execution context to run the plan in
///
/// Source-OrderBy-Sink
/// In this example, the data enters through the source node
/// and the data is ordered in the sink node. The order can be
/// ASCENDING or DESCENDING and it is configurable. The output
/// is obtained as a table from the sink node.
arrow::Status SourceOrderBySinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeSortTestBasicBatches());

  std::cout << "basic data created" << std::endl;

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};
  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));

  ARROW_RETURN_NOT_OK(cp::MakeExecNode(
      "order_by_sink", plan.get(), {source},
      cp::OrderBySinkNodeOptions{
          cp::SortOptions{{cp::SortKey{"a", cp::SortOrder::Descending}}}, &sink_gen}));

  return ExecutePlanAndCollectAsTable(exec_context, plan, basic_data.schema, sink_gen);
}
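
To make the "pipeline breaker" note in this section concrete, here is a minimal standard-library sketch of the idea, not the Arrow implementation. `ToyBatch` and `SortDescendingAfterAccumulating` are hypothetical names, and a batch is reduced to a single column of `int` values.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical stand-in for a batch with a single int column.
using ToyBatch = std::vector<int>;

// Accumulates every batch and only then sorts in descending order. No output
// row can be produced before the last input batch has arrived, which is why
// such a step has to hold the whole input in memory.
std::vector<int> SortDescendingAfterAccumulating(
    const std::vector<ToyBatch>& stream) {
  std::vector<int> all_rows;
  for (const ToyBatch& batch : stream) {
    all_rows.insert(all_rows.end(), batch.begin(), batch.end());
  }
  std::sort(all_rows.begin(), all_rows.end(), std::greater<int>());
  return all_rows;
}
```

The smallest value of the input may sit in the final batch, so a fully sorted result cannot be emitted incrementally without some form of spilling or merging.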

`select_k_sink`#

The select_k_sink operation selects the top/bottom K elements, similar to a SQL ORDER BY ... LIMIT K clause. It is configured with arrow::compute::SelectKOptions, which are passed to the node via arrow::compute::SelectKSinkNodeOptions; the node itself is defined on top of the OrderBySinkNode definition. The resulting sink node receives its inputs and then computes the top_k/bottom_k.

Note

This node is a “pipeline breaker” and will fully materialize the input in memory. In the future, spillover mechanisms will be added which should alleviate this constraint.

SelectK example:

/// \brief An example showing a select-k node
/// \param exec_context The execution context to run the plan in
///
/// Source-KSelect
/// This example shows how K number of elements can be selected
/// either from the top or bottom. The output node is a modified
/// sink node where output can be obtained as a table.
arrow::Status SourceKSelectExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(auto input, MakeGroupableBatches());
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));
  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  ARROW_ASSIGN_OR_RAISE(
      cp::ExecNode * source,
      cp::MakeExecNode("source", plan.get(), {},
                       cp::SourceNodeOptions{input.schema, input.gen()}));

  cp::SelectKOptions options = cp::SelectKOptions::TopKDefault(/*k=*/2, {"i32"});

  ARROW_RETURN_NOT_OK(cp::MakeExecNode("select_k_sink", plan.get(), {source},
                                       cp::SelectKSinkNodeOptions{options, &sink_gen}));

  auto schema = arrow::schema(
      {arrow::field("i32", arrow::int32()), arrow::field("str", arrow::utf8())});

  return ExecutePlanAndCollectAsTable(exec_context, plan, schema, sink_gen);
}
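
The ORDER BY ... LIMIT K analogy from this section can be sketched with the standard library alone. This is an illustrative assumption about the semantics, not how Arrow implements the node; `TopK` is a hypothetical helper operating on a single column of `int` values.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the k largest values in descending order, i.e. the equivalent of
// "ORDER BY value DESC LIMIT k" on a single column.
std::vector<int> TopK(std::vector<int> values, std::size_t k) {
  k = std::min(k, values.size());
  const auto middle = values.begin() + static_cast<std::ptrdiff_t>(k);
  // Only the first k positions are put in sorted order.
  std::partial_sort(values.begin(), middle, values.end(), std::greater<int>());
  values.resize(k);
  return values;
}
```

Applied to the `i32` values produced by `MakeGroupableBatches()` with `k = 2`, the sketch selects 12 and 7. A bottom-K variant would use `std::less<int>` as the comparator.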

`table_sink`#

The table_sink node provides the ability to receive the output as an in-memory table. This is simpler to use than the other sink nodes provided by the streaming execution engine but it only makes sense when the output fits comfortably in memory. The node is created using arrow::compute::TableSinkNodeOptions.

Example of using table_sink

/// \brief An example showing a table sink node
/// \param exec_context The execution context to run the plan in
///
/// TableSink Example
/// This example shows how a table_sink can be used
/// in an execution plan. This includes a source node
/// receiving data as batches and the table sink node
/// which emits the output as a table.
arrow::Status TableSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};

  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));

  std::shared_ptr<arrow::Table> output_table;
  auto table_sink_options = cp::TableSinkNodeOptions{&output_table};

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("table_sink", plan.get(), {source}, table_sink_options));
  // validate the ExecPlan
  ARROW_RETURN_NOT_OK(plan->Validate());
  std::cout << "ExecPlan created : " << plan->ToString() << std::endl;
  // start the ExecPlan
  ARROW_RETURN_NOT_OK(plan->StartProducing());

  // Wait for the plan to finish
  auto finished = plan->finished();
  ARROW_RETURN_NOT_OK(finished.status());
  std::cout << "Results : " << output_table->ToString() << std::endl;
  return arrow::Status::OK();
}

`scan`#

scan is an operation used to load and process datasets. It should be preferred over the more generic source node when your input is a dataset. The behavior is defined using arrow::dataset::ScanNodeOptions. More information on datasets and the various scan options can be found in Tabular Datasets.

This node is capable of applying pushdown filters to the file readers, which reduces the amount of data that needs to be read. Because the filtering is then done in two different places, you may supply the same filter expression to the scan node that you also supply to the filter node: the scan node uses it for on-disk (pushdown) filtering, and the filter node uses it for in-memory filtering.

Scan example:

/// \brief An example demonstrating a scan and sink node
/// \param exec_context The execution context to run the plan in
///
/// Scan-Sink
/// This example shows how scan operation can be applied on a dataset.
/// There are operations that can be applied on the scan (project, filter)
/// and the input data can be processed. The output is obtained as a table
/// via the sink node.
arrow::Status ScanSinkExample(cp::ExecContext& exec_context) {
  // Execution plan created
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::dataset::Dataset> dataset, GetDataset());

  auto options = std::make_shared<arrow::dataset::ScanOptions>();
  options->projection = cp::project({}, {});  // create empty projection

  // construct the scan node
  cp::ExecNode* scan;
  auto scan_node_options = arrow::dataset::ScanNodeOptions{dataset, options};

  ARROW_ASSIGN_OR_RAISE(scan,
                        cp::MakeExecNode("scan", plan.get(), {}, scan_node_options));

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {scan}, cp::SinkNodeOptions{&sink_gen}));

  return ExecutePlanAndCollectAsTable(exec_context, plan, dataset->schema(), sink_gen);
}
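
The two filtering stages mentioned in this section can be pictured with a small standard-library sketch. How pushdown actually works depends on the file format; the per-chunk min/max statistics used here are an assumption made for illustration, and `ToyChunk` and `ScanThenFilter` are hypothetical names.

```cpp
#include <vector>

// Hypothetical chunk of a file with precomputed min/max statistics.
struct ToyChunk {
  int min_value;
  int max_value;
  std::vector<int> rows;
};

// Stage 1 (assumed pushdown): skip whole chunks that cannot contain a match.
// Stage 2 (exact filter): test each remaining row against "value > threshold".
std::vector<int> ScanThenFilter(const std::vector<ToyChunk>& chunks,
                                int threshold, int* chunks_read) {
  std::vector<int> out;
  *chunks_read = 0;
  for (const ToyChunk& chunk : chunks) {
    if (chunk.max_value <= threshold) continue;  // pruned without reading rows
    ++*chunks_read;
    for (int value : chunk.rows) {
      if (value > threshold) out.push_back(value);
    }
  }
  return out;
}
```

In this sketch the first stage only reduces the amount of data read, and the second stage still has to remove non-matching rows from chunks that were not pruned. That is one reason the same expression can be useful in both places.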

`write`#

The write node saves query results as a dataset of files in a format like Parquet, Feather, CSV, etc. using the Tabular Datasets functionality in Arrow. The write options are provided via the arrow::dataset::WriteNodeOptions which in turn contains arrow::dataset::FileSystemDatasetWriteOptions. arrow::dataset::FileSystemDatasetWriteOptions provides control over the written dataset, including options like the output directory, file naming scheme, and so on.

Write example:

/// \brief An example showing a write node
/// \param exec_context The execution context to run the plan in
/// \param file_path The destination to write to
///
/// Scan-Filter-Write
/// This example shows how scan node can be used to load the data
/// and after processing how it can be written to disk.
arrow::Status ScanFilterWriteExample(cp::ExecContext& exec_context,
                                     const std::string& file_path) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::dataset::Dataset> dataset, GetDataset());

  auto options = std::make_shared<arrow::dataset::ScanOptions>();
  // empty projection
  options->projection = cp::project({}, {});

  cp::ExecNode* scan;

  auto scan_node_options = arrow::dataset::ScanNodeOptions{dataset, options};

  ARROW_ASSIGN_OR_RAISE(scan,
                        cp::MakeExecNode("scan", plan.get(), {}, scan_node_options));

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  std::string root_path = "";
  std::string uri = "file://" + file_path;
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::fs::FileSystem> filesystem,
                        arrow::fs::FileSystemFromUri(uri, &root_path));

  auto base_path = root_path + "/parquet_dataset";
  // Uncomment the following line, if run repeatedly
  // ARROW_RETURN_NOT_OK(filesystem->DeleteDirContents(base_path));
  ARROW_RETURN_NOT_OK(filesystem->CreateDir(base_path));

  // The partition schema determines which fields are part of the partitioning.
  auto partition_schema = arrow::schema({arrow::field("a", arrow::int32())});
  // We'll use Hive-style partitioning,
  // which creates directories with "key=value" pairs.

  auto partitioning =
      std::make_shared<arrow::dataset::HivePartitioning>(partition_schema);
  // We'll write Parquet files.
  auto format = std::make_shared<arrow::dataset::ParquetFileFormat>();

  arrow::dataset::FileSystemDatasetWriteOptions write_options;
  write_options.file_write_options = format->DefaultWriteOptions();
  write_options.filesystem = filesystem;
  write_options.base_dir = base_path;
  write_options.partitioning = partitioning;
  write_options.basename_template = "part{i}.parquet";

  arrow::dataset::WriteNodeOptions write_node_options{write_options};

  ARROW_RETURN_NOT_OK(cp::MakeExecNode("write", plan.get(), {scan}, write_node_options));

  ARROW_RETURN_NOT_OK(plan->Validate());
  std::cout << "Execution Plan Created : " << plan->ToString() << std::endl;
  // start the ExecPlan
  ARROW_RETURN_NOT_OK(plan->StartProducing());
  auto future = plan->finished();
  ARROW_RETURN_NOT_OK(future.status());
  future.Wait();
  return arrow::Status::OK();
}
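
The example above combines Hive-style "key=value" directories with the basename template "part{i}.parquet". The following standard-library sketch shows how such a path could be assembled for one partition value. It is only an assumption about the resulting layout: `HiveStylePath` is a hypothetical helper, and the exact paths produced by the dataset writer may differ.

```cpp
#include <string>

// Hypothetical helper: builds "<base_dir>/<key>=<value>/<basename>", where the
// "{i}" placeholder in the template is replaced by a file index.
std::string HiveStylePath(const std::string& base_dir, const std::string& key,
                          int value, const std::string& basename_template,
                          int file_index) {
  std::string basename = basename_template;
  const std::string placeholder = "{i}";
  const std::string::size_type pos = basename.find(placeholder);
  if (pos != std::string::npos) {
    basename.replace(pos, placeholder.size(), std::to_string(file_index));
  }
  return base_dir + "/" + key + "=" + std::to_string(value) + "/" + basename;
}
```

For instance, a row with `a` equal to 4 and file index 0 maps to a path ending in `a=4/part0.parquet` under the base directory.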

`union`#

union merges multiple data streams with the same schema into one, similar to a SQL UNION ALL clause.

The following example demonstrates how this can be achieved using two data sources.

Union example:

/// \brief An example showing a union node
/// \param exec_context The execution context to run the plan in
///
/// Source-Union-Sink
/// This example shows how a union operation can be applied on two
/// data sources. The output is obtained as a table via the sink
/// node.
arrow::Status SourceUnionSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));
  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  cp::Declaration union_node{"union", cp::ExecNodeOptions{}};
  cp::Declaration lhs{"source",
                      cp::SourceNodeOptions{basic_data.schema, basic_data.gen()}};
  lhs.label = "lhs";
  cp::Declaration rhs{"source",
                      cp::SourceNodeOptions{basic_data.schema, basic_data.gen()}};
  rhs.label = "rhs";
  union_node.inputs.emplace_back(lhs);
  union_node.inputs.emplace_back(rhs);

  ARROW_ASSIGN_OR_RAISE(
      auto declr, cp::Declaration::Sequence({
                                                union_node,
                                                {"sink", cp::SinkNodeOptions{&sink_gen}},
                                            })
                      .AddToPlan(plan.get()));

  ARROW_RETURN_NOT_OK(declr->Validate());

  ARROW_RETURN_NOT_OK(plan->Validate());
  return ExecutePlanAndCollectAsTable(exec_context, plan, basic_data.schema, sink_gen);
}
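
The UNION ALL semantics described in this section can be sketched without Arrow. `ToyBatch` and `UnionAllRows` are hypothetical names, and a batch is reduced to a single column of `int` values.

```cpp
#include <vector>

// Hypothetical stand-in for a batch with a single int column.
using ToyBatch = std::vector<int>;

// UNION ALL semantics: every row of every input appears in the output, and
// duplicates are kept. This sketch emits lhs before rhs for simplicity; a
// streaming engine need not preserve that order.
std::vector<int> UnionAllRows(const std::vector<ToyBatch>& lhs,
                              const std::vector<ToyBatch>& rhs) {
  std::vector<int> out;
  for (const auto* input : {&lhs, &rhs}) {
    for (const ToyBatch& batch : *input) {
      out.insert(out.end(), batch.begin(), batch.end());
    }
  }
  return out;
}
```

When the same eight-row input is supplied on both sides, as in the example above, the result has sixteen rows because nothing is deduplicated.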

`hash_join`#

The hash_join operation provides the relational algebra join operation using a hash-based algorithm. arrow::compute::HashJoinNodeOptions contains the options required to define a join. hash_join supports left/right/full semi/anti/outer joins. The join keys (i.e. the column(s) to join on) and suffixes (i.e. a term such as “_x” that is appended to column names duplicated in both the left and right relations) can also be set via the join options. Note that the example below creates the node under the factory name "hashjoin". Read more on hash-joins.

Hash-Join example:

/// \brief An example showing a hash join node
/// \param exec_context The execution context to run the plan in
///
/// Source-HashJoin-Sink
/// This example shows how source node gets the data and how a self-join
/// is applied on the data. The join options are configurable. The output
/// is obtained as a table via the sink node.
arrow::Status SourceHashJoinSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(auto input, MakeGroupableBatches());
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  cp::ExecNode* left_source;
  cp::ExecNode* right_source;
  for (auto source : {&left_source, &right_source}) {
    ARROW_ASSIGN_OR_RAISE(*source,
                          MakeExecNode("source", plan.get(), {},
                                       cp::SourceNodeOptions{input.schema, input.gen()}));
  }

  cp::HashJoinNodeOptions join_opts{
      cp::JoinType::INNER,
      /*left_keys=*/{"str"},
      /*right_keys=*/{"str"}, cp::literal(true), "l_", "r_"};

  ARROW_ASSIGN_OR_RAISE(
      auto hashjoin,
      cp::MakeExecNode("hashjoin", plan.get(), {left_source, right_source}, join_opts));

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {hashjoin}, cp::SinkNodeOptions{&sink_gen}));
  // expected columns i32, str, l_str, r_str
  auto schema = arrow::schema(
      {arrow::field("i32", arrow::int32()), arrow::field("str", arrow::utf8()),
       arrow::field("l_str", arrow::utf8()), arrow::field("r_str", arrow::utf8())});

  return ExecutePlanAndCollectAsTable(exec_context, plan, schema, sink_gen);
}
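
For readers unfamiliar with the hash-based algorithm this section refers to, here is a minimal standard-library sketch of an inner hash join on the `str` column. It is a textbook build/probe formulation, not Arrow's implementation; `ToyRow`, `JoinedRow`, and `InnerHashJoin` are hypothetical names, and using the right input as the build side is an assumption.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

struct ToyRow {
  int i32;
  std::string str;
};

struct JoinedRow {
  int left_i32;
  int right_i32;
  std::string key;
};

// Inner join on the "str" column.
std::vector<JoinedRow> InnerHashJoin(const std::vector<ToyRow>& left,
                                     const std::vector<ToyRow>& right) {
  // Build phase: index one input by its join key.
  std::unordered_multimap<std::string, int> build_side;
  for (const ToyRow& row : right) build_side.emplace(row.str, row.i32);

  // Probe phase: look up each row of the other input and emit all matches.
  std::vector<JoinedRow> out;
  for (const ToyRow& row : left) {
    const auto range = build_side.equal_range(row.str);
    for (auto it = range.first; it != range.second; ++it) {
      out.push_back({row.i32, it->second, row.str});
    }
  }
  return out;
}
```

In a self-join, a key that occurs n times contributes n × n output rows. Joining the rows (12, "alpha"), (7, "beta"), (3, "alpha") with themselves therefore yields 4 + 1 = 5 rows.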

Summary#

Examples of these nodes can be found in cpp/examples/arrow/execution_plan_documentation_examples.cc in the Arrow source.

Complete Example:

#include <arrow/array.h>
#include <arrow/builder.h>

#include <arrow/compute/api.h>
#include <arrow/compute/api_vector.h>
#include <arrow/compute/cast.h>
#include <arrow/compute/exec/exec_plan.h>

#include <arrow/csv/api.h>

#include <arrow/dataset/dataset.h>
#include <arrow/dataset/file_base.h>
#include <arrow/dataset/file_parquet.h>
#include <arrow/dataset/plan.h>
#include <arrow/dataset/scanner.h>

#include <arrow/io/interfaces.h>
#include <arrow/io/memory.h>

#include <arrow/result.h>
#include <arrow/status.h>
#include <arrow/table.h>

#include <arrow/ipc/api.h>

#include <arrow/util/future.h>
#include <arrow/util/range.h>
#include <arrow/util/thread_pool.h>
#include <arrow/util/vector.h>

#include <atomic>
#include <iostream>
#include <limits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Demonstrate various operators in Arrow Streaming Execution Engine

namespace cp = ::arrow::compute;

constexpr char kSep[] = "******";

void PrintBlock(const std::string& msg) {
  std::cout << "\n\t" << kSep << " " << msg << " " << kSep << "\n" << std::endl;
}

template <typename TYPE,
          typename = typename std::enable_if<arrow::is_number_type<TYPE>::value |
                                             arrow::is_boolean_type<TYPE>::value |
                                             arrow::is_temporal_type<TYPE>::value>::type>
arrow::Result<std::shared_ptr<arrow::Array>> GetArrayDataSample(
    const std::vector<typename TYPE::c_type>& values) {
  using ARROW_ARRAY_TYPE = typename arrow::TypeTraits<TYPE>::ArrayType;
  using ARROW_BUILDER_TYPE = typename arrow::TypeTraits<TYPE>::BuilderType;
  ARROW_BUILDER_TYPE builder;
  ARROW_RETURN_NOT_OK(builder.Reserve(values.size()));
  std::shared_ptr<ARROW_ARRAY_TYPE> array;
  ARROW_RETURN_NOT_OK(builder.AppendValues(values));
  ARROW_RETURN_NOT_OK(builder.Finish(&array));
  return array;
}

template <class TYPE>
arrow::Result<std::shared_ptr<arrow::Array>> GetBinaryArrayDataSample(
    const std::vector<std::string>& values) {
  using ARROW_ARRAY_TYPE = typename arrow::TypeTraits<TYPE>::ArrayType;
  using ARROW_BUILDER_TYPE = typename arrow::TypeTraits<TYPE>::BuilderType;
  ARROW_BUILDER_TYPE builder;
  ARROW_RETURN_NOT_OK(builder.Reserve(values.size()));
  std::shared_ptr<ARROW_ARRAY_TYPE> array;
  ARROW_RETURN_NOT_OK(builder.AppendValues(values));
  ARROW_RETURN_NOT_OK(builder.Finish(&array));
  return array;
}

arrow::Result<std::shared_ptr<arrow::RecordBatch>> GetSampleRecordBatch(
    const arrow::ArrayVector array_vector, const arrow::FieldVector& field_vector) {
  ARROW_ASSIGN_OR_RAISE(auto struct_result,
                        arrow::StructArray::Make(array_vector, field_vector));
  // FromStructArray is a static method, so call it on the class directly
  return arrow::RecordBatch::FromStructArray(struct_result);
}

/// \brief Create a sample table
/// The table's contents will be:
/// a,b
/// 1,null
/// 2,true
/// null,true
/// 3,false
/// null,true
/// 4,false
/// 5,null
/// 6,false
/// 7,false
/// 8,true
/// \return The created table

arrow::Result<std::shared_ptr<arrow::Table>> GetTable() {
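  // Column "a" has nulls at positions 2 and 4. The placeholder values stored
  // there are ignored because the matching validity entries are false.
  // (std::numeric_limits<int64_t>::quiet_NaN() is 0 for integer types, so it
  // cannot be used to express a null.)
  arrow::Int64Builder int64_builder;
  std::shared_ptr<arrow::Array> int64_array;
  std::vector<int64_t> int64_values = {1, 2, 0, 3, 0, 4, 5, 6, 7, 8};
  std::vector<bool> int64_is_valid = {true, true, false, true, false,
                                      true, true, true,  true, true};
  ARROW_RETURN_NOT_OK(int64_builder.AppendValues(int64_values, int64_is_valid));
  ARROW_RETURN_NOT_OK(int64_builder.Finish(&int64_array));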

  arrow::BooleanBuilder boolean_builder;
  std::shared_ptr<arrow::BooleanArray> bool_array;

  std::vector<uint8_t> bool_values = {false, true,  true,  false, true,
                                      false, false, false, false, true};
  std::vector<bool> is_valid = {false, true,  true, true, true,
                                true,  false, true, true, true};

  ARROW_RETURN_NOT_OK(boolean_builder.Reserve(10));

  ARROW_RETURN_NOT_OK(boolean_builder.AppendValues(bool_values, is_valid));

  ARROW_RETURN_NOT_OK(boolean_builder.Finish(&bool_array));

  auto record_batch =
      arrow::RecordBatch::Make(arrow::schema({arrow::field("a", arrow::int64()),
                                              arrow::field("b", arrow::boolean())}),
                               10, {int64_array, bool_array});
  ARROW_ASSIGN_OR_RAISE(auto table, arrow::Table::FromRecordBatches({record_batch}));
  return table;
}

/// \brief Create a sample dataset
/// \return An in-memory dataset based on GetTable()
arrow::Result<std::shared_ptr<arrow::dataset::Dataset>> GetDataset() {
  ARROW_ASSIGN_OR_RAISE(auto table, GetTable());
  auto ds = std::make_shared<arrow::dataset::InMemoryDataset>(table);
  return ds;
}

arrow::Result<cp::ExecBatch> GetExecBatchFromVectors(
    const arrow::FieldVector& field_vector, const arrow::ArrayVector& array_vector) {
  ARROW_ASSIGN_OR_RAISE(auto res_batch, GetSampleRecordBatch(array_vector, field_vector));
  cp::ExecBatch batch{*res_batch};
  return batch;
}

// (Doc section: BatchesWithSchema Definition)
struct BatchesWithSchema {
  std::vector<cp::ExecBatch> batches;
  std::shared_ptr<arrow::Schema> schema;
  // This method uses internal arrow utilities to
  // convert a vector of exec batches to an AsyncGenerator of optional batches
  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> gen() const {
    auto opt_batches = ::arrow::internal::MapVector(
        [](cp::ExecBatch batch) { return arrow::util::make_optional(std::move(batch)); },
        batches);
    arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> gen;
    gen = arrow::MakeVectorGenerator(std::move(opt_batches));
    return gen;
  }
};
// (Doc section: BatchesWithSchema Definition)

// (Doc section: MakeBasicBatches Definition)
arrow::Result<BatchesWithSchema> MakeBasicBatches() {
  BatchesWithSchema out;
  auto field_vector = {arrow::field("a", arrow::int32()),
                       arrow::field("b", arrow::boolean())};
  ARROW_ASSIGN_OR_RAISE(auto b1_int, GetArrayDataSample<arrow::Int32Type>({0, 4}));
  ARROW_ASSIGN_OR_RAISE(auto b2_int, GetArrayDataSample<arrow::Int32Type>({5, 6, 7}));
  ARROW_ASSIGN_OR_RAISE(auto b3_int, GetArrayDataSample<arrow::Int32Type>({8, 9, 10}));

  ARROW_ASSIGN_OR_RAISE(auto b1_bool,
                        GetArrayDataSample<arrow::BooleanType>({false, true}));
  ARROW_ASSIGN_OR_RAISE(auto b2_bool,
                        GetArrayDataSample<arrow::BooleanType>({true, false, true}));
  ARROW_ASSIGN_OR_RAISE(auto b3_bool,
                        GetArrayDataSample<arrow::BooleanType>({false, true, false}));

  ARROW_ASSIGN_OR_RAISE(auto b1,
                        GetExecBatchFromVectors(field_vector, {b1_int, b1_bool}));
  ARROW_ASSIGN_OR_RAISE(auto b2,
                        GetExecBatchFromVectors(field_vector, {b2_int, b2_bool}));
  ARROW_ASSIGN_OR_RAISE(auto b3,
                        GetExecBatchFromVectors(field_vector, {b3_int, b3_bool}));

  out.batches = {b1, b2, b3};
  out.schema = arrow::schema(field_vector);
  return out;
}
// (Doc section: MakeBasicBatches Definition)

arrow::Result<BatchesWithSchema> MakeSortTestBasicBatches() {
  BatchesWithSchema out;
  auto field = arrow::field("a", arrow::int32());
  ARROW_ASSIGN_OR_RAISE(auto b1_int, GetArrayDataSample<arrow::Int32Type>({1, 3, 0, 2}));
  ARROW_ASSIGN_OR_RAISE(auto b2_int,
                        GetArrayDataSample<arrow::Int32Type>({121, 101, 120, 12}));
  ARROW_ASSIGN_OR_RAISE(auto b3_int,
                        GetArrayDataSample<arrow::Int32Type>({10, 110, 210, 121}));
  ARROW_ASSIGN_OR_RAISE(auto b4_int,
                        GetArrayDataSample<arrow::Int32Type>({51, 101, 2, 34}));
  ARROW_ASSIGN_OR_RAISE(auto b5_int,
                        GetArrayDataSample<arrow::Int32Type>({11, 31, 1, 12}));
  ARROW_ASSIGN_OR_RAISE(auto b6_int,
                        GetArrayDataSample<arrow::Int32Type>({12, 101, 120, 12}));
  ARROW_ASSIGN_OR_RAISE(auto b7_int,
                        GetArrayDataSample<arrow::Int32Type>({0, 110, 210, 11}));
  ARROW_ASSIGN_OR_RAISE(auto b8_int,
                        GetArrayDataSample<arrow::Int32Type>({51, 10, 2, 3}));

  ARROW_ASSIGN_OR_RAISE(auto b1, GetExecBatchFromVectors({field}, {b1_int}));
  ARROW_ASSIGN_OR_RAISE(auto b2, GetExecBatchFromVectors({field}, {b2_int}));
  ARROW_ASSIGN_OR_RAISE(auto b3,
                        GetExecBatchFromVectors({field, field}, {b3_int, b8_int}));
  ARROW_ASSIGN_OR_RAISE(auto b4,
                        GetExecBatchFromVectors({field, field, field, field},
                                                {b4_int, b5_int, b6_int, b7_int}));
  out.batches = {b1, b2, b3, b4};
  out.schema = arrow::schema({field});
  return out;
}

arrow::Result<BatchesWithSchema> MakeGroupableBatches(int multiplicity = 1) {
  BatchesWithSchema out;
  auto fields = {arrow::field("i32", arrow::int32()), arrow::field("str", arrow::utf8())};
  ARROW_ASSIGN_OR_RAISE(auto b1_int, GetArrayDataSample<arrow::Int32Type>({12, 7, 3}));
  ARROW_ASSIGN_OR_RAISE(auto b2_int, GetArrayDataSample<arrow::Int32Type>({-2, -1, 3}));
  ARROW_ASSIGN_OR_RAISE(auto b3_int, GetArrayDataSample<arrow::Int32Type>({5, 3, -8}));
  ARROW_ASSIGN_OR_RAISE(auto b1_str, GetBinaryArrayDataSample<arrow::StringType>(
                                         {"alpha", "beta", "alpha"}));
  ARROW_ASSIGN_OR_RAISE(auto b2_str, GetBinaryArrayDataSample<arrow::StringType>(
                                         {"alpha", "gamma", "alpha"}));
  ARROW_ASSIGN_OR_RAISE(auto b3_str, GetBinaryArrayDataSample<arrow::StringType>(
                                         {"gamma", "beta", "alpha"}));
  ARROW_ASSIGN_OR_RAISE(auto b1, GetExecBatchFromVectors(fields, {b1_int, b1_str}));
  ARROW_ASSIGN_OR_RAISE(auto b2, GetExecBatchFromVectors(fields, {b2_int, b2_str}));
  ARROW_ASSIGN_OR_RAISE(auto b3, GetExecBatchFromVectors(fields, {b3_int, b3_str}));
  out.batches = {b1, b2, b3};

  size_t batch_count = out.batches.size();
  for (int repeat = 1; repeat < multiplicity; ++repeat) {
    for (size_t i = 0; i < batch_count; ++i) {
      out.batches.push_back(out.batches[i]);
    }
  }

  out.schema = arrow::schema(fields);
  return out;
}

arrow::Status ExecutePlanAndCollectAsTable(
    cp::ExecContext& exec_context, std::shared_ptr<cp::ExecPlan> plan,
    std::shared_ptr<arrow::Schema> schema,
    arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen) {
  // translate sink_gen (async) to sink_reader (sync)
  std::shared_ptr<arrow::RecordBatchReader> sink_reader =
      cp::MakeGeneratorReader(schema, std::move(sink_gen), exec_context.memory_pool());

  // validate the ExecPlan
  ARROW_RETURN_NOT_OK(plan->Validate());
  std::cout << "ExecPlan created : " << plan->ToString() << std::endl;
  // start the ExecPlan
  ARROW_RETURN_NOT_OK(plan->StartProducing());

  // collect sink_reader into a Table
  std::shared_ptr<arrow::Table> response_table;

  ARROW_ASSIGN_OR_RAISE(response_table,
                        arrow::Table::FromRecordBatchReader(sink_reader.get()));

  std::cout << "Results : " << response_table->ToString() << std::endl;

  // stop producing
  plan->StopProducing();
  // plan mark finished
  auto future = plan->finished();
  return future.status();
}

// (Doc section: Scan Example)

/// \brief An example demonstrating a scan and sink node
/// \param exec_context The execution context to run the plan in
///
/// Scan-Sink
/// This example shows how scan operation can be applied on a dataset.
/// There are operations that can be applied on the scan (project, filter)
/// and the input data can be processed. The output is obtained as a table
/// via the sink node.
arrow::Status ScanSinkExample(cp::ExecContext& exec_context) {
  // Execution plan created
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::dataset::Dataset> dataset, GetDataset());

  auto options = std::make_shared<arrow::dataset::ScanOptions>();
  options->projection = cp::project({}, {});  // create empty projection

  // construct the scan node
  cp::ExecNode* scan;
  auto scan_node_options = arrow::dataset::ScanNodeOptions{dataset, options};

  ARROW_ASSIGN_OR_RAISE(scan,
                        cp::MakeExecNode("scan", plan.get(), {}, scan_node_options));

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {scan}, cp::SinkNodeOptions{&sink_gen}));

  return ExecutePlanAndCollectAsTable(exec_context, plan, dataset->schema(), sink_gen);
}
// (Doc section: Scan Example)

// (Doc section: Source Example)

/// \brief An example demonstrating a source and sink node
/// \param exec_context The execution context to run the plan in
///
/// Source-Sink Example
/// This example shows how a source and sink can be used
/// in an execution plan. The source node receives the data
/// and the sink node emits it as output, which is collected
/// into a table.
arrow::Status SourceSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};

  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {source}, cp::SinkNodeOptions{&sink_gen}));

  return ExecutePlanAndCollectAsTable(exec_context, plan, basic_data.schema, sink_gen);
}
// (Doc section: Source Example)

// (Doc section: Table Source Example)

/// \brief An example showing a table source node
/// \param exec_context The execution context to run the plan in
///
/// TableSource-Sink Example
/// This example shows how a table_source and sink can be used
/// in an execution plan. This includes a table source node
/// receiving data from a table and the sink node emits
/// the data to a generator which we collect into a table.
arrow::Status TableSourceSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto table, GetTable());

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;
  int max_batch_size = 2;
  auto table_source_options = cp::TableSourceNodeOptions{table, max_batch_size};

  ARROW_ASSIGN_OR_RAISE(
      cp::ExecNode * source,
      cp::MakeExecNode("table_source", plan.get(), {}, table_source_options));

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {source}, cp::SinkNodeOptions{&sink_gen}));

  return ExecutePlanAndCollectAsTable(exec_context, plan, table->schema(), sink_gen);
}
// (Doc section: Table Source Example)

// (Doc section: Filter Example)

/// \brief An example showing a filter node
/// \param exec_context The execution context to run the plan in
///
/// Source-Filter-Sink
/// This example shows how a filter can be used in an execution plan,
/// along with the source and sink operations. The output from the
/// execution plan is obtained as a table via the sink node.
arrow::Status ScanFilterSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::dataset::Dataset> dataset, GetDataset());

  auto options = std::make_shared<arrow::dataset::ScanOptions>();
  // specify the filter. This filter keeps only the rows where the
  // value of the "a" column is greater than 3.
  cp::Expression filter_opt = cp::greater(cp::field_ref("a"), cp::literal(3));
  // set filter for scanner : on-disk / push-down filtering.
  // This step can be skipped if you are not reading from disk.
  options->filter = filter_opt;
  // empty projection
  options->projection = cp::project({}, {});

  // construct the scan node
  std::cout << "Initialized Scanning Options" << std::endl;

  cp::ExecNode* scan;

  auto scan_node_options = arrow::dataset::ScanNodeOptions{dataset, options};
  std::cout << "Scan node options created" << std::endl;

  ARROW_ASSIGN_OR_RAISE(scan,
                        cp::MakeExecNode("scan", plan.get(), {}, scan_node_options));

  // pipe the scan node into the filter node
  // Need to set the filter in scan node options and filter node options.
  // At scan node it is used for on-disk / push-down filtering.
  // At filter node it is used for in-memory filtering.
  cp::ExecNode* filter;
  ARROW_ASSIGN_OR_RAISE(filter, cp::MakeExecNode("filter", plan.get(), {scan},
                                                 cp::FilterNodeOptions{filter_opt}));

  // finally, pipe the filter node into a sink node
  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;
  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {filter}, cp::SinkNodeOptions{&sink_gen}));

  return ExecutePlanAndCollectAsTable(exec_context, plan, dataset->schema(), sink_gen);
}

// (Doc section: Filter Example)

// (Doc section: Project Example)

/// \brief An example showing a project node
/// \param exec_context The execution context to run the plan in
///
/// Scan-Project-Sink
/// This example shows how a scan operation can be used to load the data
/// into the execution plan, how a project operation can be applied to the
/// data stream, and how the output is obtained as a table via the sink node.
arrow::Status ScanProjectSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::dataset::Dataset> dataset, GetDataset());

  auto options = std::make_shared<arrow::dataset::ScanOptions>();
  // expression evaluated by the project node
  cp::Expression a_times_2 = cp::call("multiply", {cp::field_ref("a"), cp::literal(2)});
  // empty projection for the scan
  options->projection = cp::project({}, {});

  cp::ExecNode* scan;

  auto scan_node_options = arrow::dataset::ScanNodeOptions{dataset, options};

  ARROW_ASSIGN_OR_RAISE(scan,
                        cp::MakeExecNode("scan", plan.get(), {}, scan_node_options));

  cp::ExecNode* project;
  ARROW_ASSIGN_OR_RAISE(project, cp::MakeExecNode("project", plan.get(), {scan},
                                                  cp::ProjectNodeOptions{{a_times_2}}));
  // schema after projection => multiply(a, 2): int64
  std::cout << "Schema after projection : \n"
            << project->output_schema()->ToString() << std::endl;

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;
  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {project}, cp::SinkNodeOptions{&sink_gen}));
  auto schema = arrow::schema({arrow::field("a * 2", arrow::int32())});

  return ExecutePlanAndCollectAsTable(exec_context, plan, schema, sink_gen);
}

// (Doc section: Project Example)

// (Doc section: Scalar Aggregate Example)

/// \brief An example showing an aggregation node to aggregate an entire table
/// \param exec_context The execution context to run the plan in
///
/// Source-Aggregation-Sink
/// This example shows how an aggregation operation can be applied on an
/// execution plan, resulting in a scalar output. The source node loads the
/// data and the aggregation (the sum of column 'a')
/// is applied on this data. The output is obtained from the sink node as a table.
arrow::Status SourceScalarAggregateSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};

  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));
  auto aggregate_options = cp::AggregateNodeOptions{/*aggregates=*/{{"sum", nullptr}},
                                                    /*targets=*/{"a"},
                                                    /*names=*/{"sum(a)"}};
  ARROW_ASSIGN_OR_RAISE(
      cp::ExecNode * aggregate,
      cp::MakeExecNode("aggregate", plan.get(), {source}, aggregate_options));

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {aggregate}, cp::SinkNodeOptions{&sink_gen}));
  auto schema = arrow::schema({arrow::field("sum(a)", arrow::int32())});

  return ExecutePlanAndCollectAsTable(exec_context, plan, schema, sink_gen);
}
// (Doc section: Scalar Aggregate Example)

// (Doc section: Group Aggregate Example)

/// \brief An example showing an aggregation node to perform a group-by operation
/// \param exec_context The execution context to run the plan in
///
/// Source-Aggregation-Sink
/// This example shows how an aggregation operation can be applied on an
/// execution plan, resulting in a grouped output. The source node loads the
/// data and the aggregation (counting the valid values in column 'a',
/// grouped by column 'b') is applied on this data. The output is obtained
/// from the sink node as a table.
arrow::Status SourceGroupAggregateSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};

  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));
  cp::CountOptions options(cp::CountOptions::ONLY_VALID);
  auto aggregate_options =
      cp::AggregateNodeOptions{/*aggregates=*/{{"hash_count", &options}},
                               /*targets=*/{"a"},
                               /*names=*/{"count(a)"},
                               /*keys=*/{"b"}};
  ARROW_ASSIGN_OR_RAISE(
      cp::ExecNode * aggregate,
      cp::MakeExecNode("aggregate", plan.get(), {source}, aggregate_options));

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {aggregate}, cp::SinkNodeOptions{&sink_gen}));
  auto schema = arrow::schema({
      arrow::field("count(a)", arrow::int32()),
      arrow::field("b", arrow::boolean()),
  });

  return ExecutePlanAndCollectAsTable(exec_context, plan, schema, sink_gen);
}
// (Doc section: Group Aggregate Example)

// (Doc section: ConsumingSink Example)

/// \brief An example showing a consuming sink node
/// \param exec_context The execution context to run the plan in
///
/// Source-Consuming-Sink
/// This example shows how the data can be consumed within the execution plan
/// by using a ConsumingSink node. There is no data output from this execution plan.
arrow::Status SourceConsumingSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};

  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));

  std::atomic<uint32_t> batches_seen{0};
  arrow::Future<> finish = arrow::Future<>::Make();
  struct CustomSinkNodeConsumer : public cp::SinkNodeConsumer {
    CustomSinkNodeConsumer(std::atomic<uint32_t>* batches_seen, arrow::Future<> finish)
        : batches_seen(batches_seen), finish(std::move(finish)) {}

    arrow::Status Init(const std::shared_ptr<arrow::Schema>& schema,
                       cp::BackpressureControl* backpressure_control) override {
      return arrow::Status::OK();
    }

    arrow::Status Consume(cp::ExecBatch batch) override {
      (*batches_seen)++;
      return arrow::Status::OK();
    }

    arrow::Future<> Finish() override { return finish; }

    std::atomic<uint32_t>* batches_seen;
    arrow::Future<> finish;
  };
  std::shared_ptr<CustomSinkNodeConsumer> consumer =
      std::make_shared<CustomSinkNodeConsumer>(&batches_seen, finish);

  cp::ExecNode* consuming_sink;

  ARROW_ASSIGN_OR_RAISE(consuming_sink,
                        MakeExecNode("consuming_sink", plan.get(), {source},
                                     cp::ConsumingSinkNodeOptions(consumer)));

  ARROW_RETURN_NOT_OK(consuming_sink->Validate());

  ARROW_RETURN_NOT_OK(plan->Validate());
  std::cout << "Exec Plan created: " << plan->ToString() << std::endl;
  // plan start producing
  ARROW_RETURN_NOT_OK(plan->StartProducing());
  // Source should finish fairly quickly
  ARROW_RETURN_NOT_OK(source->finished().status());
  std::cout << "Source Finished!" << std::endl;
  // Mark consumption complete, plan should finish
  finish.MarkFinished(arrow::Status::OK());
  ARROW_RETURN_NOT_OK(plan->finished().status());
  return arrow::Status::OK();
}
// (Doc section: ConsumingSink Example)

// (Doc section: OrderBySink Example)

/// \brief An example showing an order-by node
/// \param exec_context The execution context to run the plan in
///
/// Source-OrderBy-Sink
/// In this example, the data enters through the source node
/// and the data is ordered in the sink node. The order can be
/// ASCENDING or DESCENDING and it is configurable. The output
/// is obtained as a table from the sink node.
arrow::Status SourceOrderBySinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeSortTestBasicBatches());

  std::cout << "basic data created" << std::endl;

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};
  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));

  ARROW_RETURN_NOT_OK(cp::MakeExecNode(
      "order_by_sink", plan.get(), {source},
      cp::OrderBySinkNodeOptions{
          cp::SortOptions{{cp::SortKey{"a", cp::SortOrder::Descending}}}, &sink_gen}));

  return ExecutePlanAndCollectAsTable(exec_context, plan, basic_data.schema, sink_gen);
}

// (Doc section: OrderBySink Example)

// (Doc section: HashJoin Example)

/// \brief An example showing a hash join node
/// \param exec_context The execution context to run the plan in
///
/// Source-HashJoin-Sink
/// This example shows how two source nodes supply the data and how a
/// self-join is applied on the data. The join options are configurable.
/// The output is obtained as a table via the sink node.
arrow::Status SourceHashJoinSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(auto input, MakeGroupableBatches());
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  cp::ExecNode* left_source;
  cp::ExecNode* right_source;
  for (auto source : {&left_source, &right_source}) {
    ARROW_ASSIGN_OR_RAISE(*source,
                          MakeExecNode("source", plan.get(), {},
                                       cp::SourceNodeOptions{input.schema, input.gen()}));
  }

  cp::HashJoinNodeOptions join_opts{
      cp::JoinType::INNER,
      /*left_keys=*/{"str"},
      /*right_keys=*/{"str"}, cp::literal(true), "l_", "r_"};

  ARROW_ASSIGN_OR_RAISE(
      auto hashjoin,
      cp::MakeExecNode("hashjoin", plan.get(), {left_source, right_source}, join_opts));

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("sink", plan.get(), {hashjoin}, cp::SinkNodeOptions{&sink_gen}));
  // expected columns i32, str, l_str, r_str
  auto schema = arrow::schema(
      {arrow::field("i32", arrow::int32()), arrow::field("str", arrow::utf8()),
       arrow::field("l_str", arrow::utf8()), arrow::field("r_str", arrow::utf8())});

  return ExecutePlanAndCollectAsTable(exec_context, plan, schema, sink_gen);
}

// (Doc section: HashJoin Example)

// (Doc section: KSelect Example)

/// \brief An example showing a select-k node
/// \param exec_context The execution context to run the plan in
///
/// Source-KSelect
/// This example shows how the top or bottom K elements can be
/// selected. The output node is a modified sink node from which
/// the output can be obtained as a table.
arrow::Status SourceKSelectExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(auto input, MakeGroupableBatches());
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));
  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  ARROW_ASSIGN_OR_RAISE(
      cp::ExecNode * source,
      cp::MakeExecNode("source", plan.get(), {},
                       cp::SourceNodeOptions{input.schema, input.gen()}));

  cp::SelectKOptions options = cp::SelectKOptions::TopKDefault(/*k=*/2, {"i32"});

  ARROW_RETURN_NOT_OK(cp::MakeExecNode("select_k_sink", plan.get(), {source},
                                       cp::SelectKSinkNodeOptions{options, &sink_gen}));

  auto schema = arrow::schema(
      {arrow::field("i32", arrow::int32()), arrow::field("str", arrow::utf8())});

  return ExecutePlanAndCollectAsTable(exec_context, plan, schema, sink_gen);
}

// (Doc section: KSelect Example)

// (Doc section: Write Example)

/// \brief An example showing a write node
/// \param exec_context The execution context to run the plan in
/// \param file_path The destination to write to
///
/// Scan-Filter-Write
/// This example shows how a scan node can be used to load the data
/// and how, after processing, the data can be written to disk.
arrow::Status ScanFilterWriteExample(cp::ExecContext& exec_context,
                                     const std::string& file_path) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::dataset::Dataset> dataset, GetDataset());

  auto options = std::make_shared<arrow::dataset::ScanOptions>();
  // empty projection
  options->projection = cp::project({}, {});

  cp::ExecNode* scan;

  auto scan_node_options = arrow::dataset::ScanNodeOptions{dataset, options};

  ARROW_ASSIGN_OR_RAISE(scan,
                        cp::MakeExecNode("scan", plan.get(), {}, scan_node_options));

  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  std::string root_path = "";
  std::string uri = "file://" + file_path;
  std::shared_ptr<arrow::fs::FileSystem> filesystem =
      arrow::fs::FileSystemFromUri(uri, &root_path).ValueOrDie();

  auto base_path = root_path + "/parquet_dataset";
  // Uncomment the following line, if run repeatedly
  // ARROW_RETURN_NOT_OK(filesystem->DeleteDirContents(base_path));
  ARROW_RETURN_NOT_OK(filesystem->CreateDir(base_path));

  // The partition schema determines which fields are part of the partitioning.
  auto partition_schema = arrow::schema({arrow::field("a", arrow::int32())});
  // We'll use Hive-style partitioning,
  // which creates directories with "key=value" pairs.

  auto partitioning =
      std::make_shared<arrow::dataset::HivePartitioning>(partition_schema);
  // We'll write Parquet files.
  auto format = std::make_shared<arrow::dataset::ParquetFileFormat>();

  arrow::dataset::FileSystemDatasetWriteOptions write_options;
  write_options.file_write_options = format->DefaultWriteOptions();
  write_options.filesystem = filesystem;
  write_options.base_dir = base_path;
  write_options.partitioning = partitioning;
  write_options.basename_template = "part{i}.parquet";

  arrow::dataset::WriteNodeOptions write_node_options{write_options};

  ARROW_RETURN_NOT_OK(cp::MakeExecNode("write", plan.get(), {scan}, write_node_options));

  ARROW_RETURN_NOT_OK(plan->Validate());
  std::cout << "Execution Plan Created : " << plan->ToString() << std::endl;
  // start the ExecPlan
  ARROW_RETURN_NOT_OK(plan->StartProducing());
  // status() blocks until the plan has finished
  auto future = plan->finished();
  ARROW_RETURN_NOT_OK(future.status());
  return arrow::Status::OK();
}

// (Doc section: Write Example)

// (Doc section: Union Example)

/// \brief An example showing a union node
/// \param exec_context The execution context to run the plan in
///
/// Source-Union-Sink
/// This example shows how a union operation can be applied on two
/// data sources. The output is obtained as a table via the sink
/// node.
arrow::Status SourceUnionSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  std::shared_ptr<cp::ExecPlan> plan = cp::ExecPlan::Make(&exec_context).ValueOrDie();
  arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>> sink_gen;

  cp::Declaration union_node{"union", cp::ExecNodeOptions{}};
  cp::Declaration lhs{"source",
                      cp::SourceNodeOptions{basic_data.schema, basic_data.gen()}};
  lhs.label = "lhs";
  cp::Declaration rhs{"source",
                      cp::SourceNodeOptions{basic_data.schema, basic_data.gen()}};
  rhs.label = "rhs";
  union_node.inputs.emplace_back(lhs);
  union_node.inputs.emplace_back(rhs);

  ARROW_ASSIGN_OR_RAISE(
      auto declr, cp::Declaration::Sequence({
                                                union_node,
                                                {"sink", cp::SinkNodeOptions{&sink_gen}},
                                            })
                      .AddToPlan(plan.get()));

  ARROW_RETURN_NOT_OK(declr->Validate());

  ARROW_RETURN_NOT_OK(plan->Validate());
  return ExecutePlanAndCollectAsTable(exec_context, plan, basic_data.schema, sink_gen);
}

// (Doc section: Union Example)

// (Doc section: Table Sink Example)

/// \brief An example showing a table sink node
/// \param exec_context The execution context to run the plan in
///
/// TableSink Example
/// This example shows how a table_sink can be used
/// in an execution plan. This includes a source node
/// receiving data as batches and the table sink node
/// which emits the output as a table.
arrow::Status TableSinkExample(cp::ExecContext& exec_context) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<cp::ExecPlan> plan,
                        cp::ExecPlan::Make(&exec_context));

  ARROW_ASSIGN_OR_RAISE(auto basic_data, MakeBasicBatches());

  auto source_node_options = cp::SourceNodeOptions{basic_data.schema, basic_data.gen()};

  ARROW_ASSIGN_OR_RAISE(cp::ExecNode * source,
                        cp::MakeExecNode("source", plan.get(), {}, source_node_options));

  std::shared_ptr<arrow::Table> output_table;
  auto table_sink_options = cp::TableSinkNodeOptions{&output_table};

  ARROW_RETURN_NOT_OK(
      cp::MakeExecNode("table_sink", plan.get(), {source}, table_sink_options));
  // validate the ExecPlan
  ARROW_RETURN_NOT_OK(plan->Validate());
  std::cout << "ExecPlan created : " << plan->ToString() << std::endl;
  // start the ExecPlan
  ARROW_RETURN_NOT_OK(plan->StartProducing());

  // Wait for the plan to finish
  auto finished = plan->finished();
  ARROW_RETURN_NOT_OK(finished.status());
  std::cout << "Results : " << output_table->ToString() << std::endl;
  return arrow::Status::OK();
}
// (Doc section: Table Sink Example)

enum ExampleMode {
  SOURCE_SINK = 0,
  TABLE_SOURCE_SINK = 1,
  SCAN = 2,
  FILTER = 3,
  PROJECT = 4,
  SCALAR_AGGREGATION = 5,
  GROUP_AGGREGATION = 6,
  CONSUMING_SINK = 7,
  ORDER_BY_SINK = 8,
  HASHJOIN = 9,
  KSELECT = 10,
  WRITE = 11,
  UNION = 12,
  TABLE_SOURCE_TABLE_SINK = 13
};

int main(int argc, char** argv) {
  if (argc < 3) {
    // Fake success for CI purposes. Both a save path (argv[1]) and a
    // mode (argv[2]) are required to run an example.
    return EXIT_SUCCESS;
  }

  std::string base_save_path = argv[1];
  int mode = std::atoi(argv[2]);
  arrow::Status status;
  // ensure arrow::dataset node factories are in the registry
  arrow::dataset::internal::Initialize();
  // execution context
  cp::ExecContext exec_context;
  switch (mode) {
    case SOURCE_SINK:
      PrintBlock("Source Sink Example");
      status = SourceSinkExample(exec_context);
      break;
    case TABLE_SOURCE_SINK:
      PrintBlock("Table Source Sink Example");
      status = TableSourceSinkExample(exec_context);
      break;
    case SCAN:
      PrintBlock("Scan Example");
      status = ScanSinkExample(exec_context);
      break;
    case FILTER:
      PrintBlock("Filter Example");
      status = ScanFilterSinkExample(exec_context);
      break;
    case PROJECT:
      PrintBlock("Project Example");
      status = ScanProjectSinkExample(exec_context);
      break;
    case GROUP_AGGREGATION:
      PrintBlock("Group Aggregate Example");
      status = SourceGroupAggregateSinkExample(exec_context);
      break;
    case SCALAR_AGGREGATION:
      PrintBlock("Scalar Aggregate Example");
      status = SourceScalarAggregateSinkExample(exec_context);
      break;
    case CONSUMING_SINK:
      PrintBlock("Consuming-Sink Example");
      status = SourceConsumingSinkExample(exec_context);
      break;
    case ORDER_BY_SINK:
      PrintBlock("OrderBy Example");
      status = SourceOrderBySinkExample(exec_context);
      break;
    case HASHJOIN:
      PrintBlock("HashJoin Example");
      status = SourceHashJoinSinkExample(exec_context);
      break;
    case KSELECT:
      PrintBlock("KSelect Example");
      status = SourceKSelectExample(exec_context);
      break;
    case WRITE:
      PrintBlock("Write Example");
      status = ScanFilterWriteExample(exec_context, base_save_path);
      break;
    case UNION:
      PrintBlock("Union Example");
      status = SourceUnionSinkExample(exec_context);
      break;
    case TABLE_SOURCE_TABLE_SINK:
      PrintBlock("TableSink Example");
      status = TableSinkExample(exec_context);
      break;
    default:
      break;
  }

  if (status.ok()) {
    return EXIT_SUCCESS;
  } else {
    std::cout << "Error occurred: " << status.message() << std::endl;
    return EXIT_FAILURE;
  }
}
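
The consuming-sink example above hands each batch to a user-defined consumer and counts the batches in an atomic counter. The following is a minimal sketch of that pattern using only the standard library, so it can be compiled without Arrow. `Batch`, `Consumer`, `CountingConsumer`, and `PushAll` are hypothetical stand-ins introduced for illustration; they are not Arrow types and do not reproduce the `cp::SinkNodeConsumer` interface.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a batch of rows; not an Arrow type.
struct Batch {
  std::vector<std::int32_t> values;
};

// Hypothetical consumer interface: one call per batch, then a final call
// once no more batches will arrive.
class Consumer {
 public:
  virtual ~Consumer() = default;
  virtual void Consume(const Batch& batch) = 0;
  virtual void Finish() = 0;
};

// Counts batches and rows. The counters are atomic on the assumption that
// an engine may call Consume from more than one thread.
class CountingConsumer : public Consumer {
 public:
  void Consume(const Batch& batch) override {
    batches_seen_.fetch_add(1);
    rows_seen_.fetch_add(batch.values.size());
  }

  void Finish() override { finished_.store(true); }

  std::uint32_t batches_seen() const { return batches_seen_.load(); }
  std::uint64_t rows_seen() const { return rows_seen_.load(); }
  bool finished() const { return finished_.load(); }

 private:
  std::atomic<std::uint32_t> batches_seen_{0};
  std::atomic<std::uint64_t> rows_seen_{0};
  std::atomic<bool> finished_{false};
};

// Pushes every batch to the consumer in order, then signals completion.
// This driver is single-threaded to keep the sketch small.
void PushAll(const std::vector<Batch>& batches, Consumer* consumer) {
  assert(consumer != nullptr);
  for (const Batch& batch : batches) {
    consumer->Consume(batch);
  }
  consumer->Finish();
}
```

As in the example, the consumer produces no output table; the caller inspects the counters after the driver has finished.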

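The select-k example above asks for the top two rows by the `i32` column. As a rough illustration of what such a selection computes, the sketch below performs a top-k selection on a plain vector with `std::partial_sort`. It assumes that "top k" means the k largest key values returned in descending order. `Row` and `TopKByI32` are hypothetical names introduced here, and the code says nothing about how the `select_k_sink` node is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical row type mirroring the two columns ("i32", "str") of the
// schema used in the example.
struct Row {
  std::int32_t i32;
  std::string str;
};

// Returns the k rows with the largest i32 values, largest first.
// If fewer than k rows are given, all rows are returned in descending order.
std::vector<Row> TopKByI32(std::vector<Row> rows, std::size_t k) {
  const std::size_t n = std::min(k, rows.size());
  const auto middle = rows.begin() + static_cast<std::ptrdiff_t>(n);
  // Only the first n positions need to be ordered; the rest are discarded.
  std::partial_sort(rows.begin(), middle, rows.end(),
                    [](const Row& l, const Row& r) { return l.i32 > r.i32; });
  rows.erase(middle, rows.end());
  assert(rows.size() <= k);
  return rows;
}
```

A partial sort avoids ordering the rows that are discarded, which is the usual reason to prefer a select-k operation over a full sort followed by a slice.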