Working with the C++ Implementation

This section of the cookbook goes over basic concepts that will be needed regardless of how you intend to use the Arrow C++ implementation.

Working with Status and Result

C++ libraries often have to choose between throwing exceptions and returning error codes. Arrow chooses to return Status and Result objects as a middle ground. This makes it clear when a function can fail and is easier to use than integer error codes.

It is important to always check the value of a returned Status object to ensure that the operation succeeded. However, this can quickly become tedious:

Checking the status of every function manually
std::function<arrow::Status()> test_fn = [] {
  arrow::NullBuilder builder;
  arrow::Status st = builder.Reserve(2);
  // Tedious return value check
  if (!st.ok()) {
    return st;
  }
  st = builder.AppendNulls(-1);
  // Tedious return value check
  if (!st.ok()) {
    return st;
  }
  rout << "Appended -1 null values?" << std::endl;
  return arrow::Status::OK();
};
arrow::Status st = test_fn();
rout << st << std::endl;
Code Output
Invalid: length must be positive

The macro ARROW_RETURN_NOT_OK takes care of some of this boilerplate for you. It evaluates the contained expression and checks the resulting Status or Result object. If the operation failed, the macro returns the failure from the enclosing function.

Using ARROW_RETURN_NOT_OK to check the status
std::function<arrow::Status()> test_fn = [] {
  arrow::NullBuilder builder;
  ARROW_RETURN_NOT_OK(builder.Reserve(2));
  ARROW_RETURN_NOT_OK(builder.AppendNulls(-1));
  rout << "Appended -1 null values?" << std::endl;
  return arrow::Status::OK();
};
arrow::Status st = test_fn();
rout << st << std::endl;
Code Output
Invalid: length must be positive
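
To make the early-return behavior concrete, here is a minimal sketch in plain C++ that does not depend on Arrow. MiniStatus, MINI_RETURN_NOT_OK, ToyAppendNulls and AppendTwice are hypothetical names invented for this illustration. They mimic the shape of the pattern and are not Arrow's actual types or macro definition.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::Status, reduced to what this sketch needs.
class MiniStatus {
 public:
  static MiniStatus OK() { return MiniStatus(); }
  static MiniStatus Invalid(std::string message) {
    return MiniStatus(std::move(message));
  }
  bool ok() const { return ok_; }
  const std::string& message() const { return message_; }

 private:
  MiniStatus() : ok_(true) {}
  explicit MiniStatus(std::string message)
      : ok_(false), message_(std::move(message)) {}

  bool ok_;
  std::string message_;
};

// Hypothetical macro: evaluate the expression exactly once and return the
// status from the enclosing function if it reports a failure.
#define MINI_RETURN_NOT_OK(expr)         \
  do {                                   \
    MiniStatus mini_status_tmp = (expr); \
    if (!mini_status_tmp.ok()) {         \
      return mini_status_tmp;            \
    }                                    \
  } while (false)

// Hypothetical operation that rejects negative lengths.
MiniStatus ToyAppendNulls(int length) {
  if (length < 0) {
    return MiniStatus::Invalid("length must be positive");
  }
  return MiniStatus::OK();
}

// Runs two operations in sequence and records how many of them succeeded.
MiniStatus AppendTwice(int first, int second, int* steps_completed) {
  assert(steps_completed != nullptr);
  *steps_completed = 0;
  MINI_RETURN_NOT_OK(ToyAppendNulls(first));
  ++*steps_completed;
  MINI_RETURN_NOT_OK(ToyAppendNulls(second));  // Returns here if this fails
  ++*steps_completed;
  return MiniStatus::OK();
}
```

Calling AppendTwice(2, -1, &steps) in this sketch stops at the second operation: the returned status is not ok, and steps is left at 1 because the code after the failing macro never runs. Wrapping the macro body in do { ... } while (false) lets it behave like a single statement, so it stays safe inside an unbraced if or else.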

Using the Visitor Pattern

The Arrow classes arrow::DataType, arrow::Scalar, and arrow::Array have specialized subclasses for each Arrow type. To specialize logic for each subclass, you can use the visitor pattern. Arrow provides inline template functions that allow you to call visitors efficiently: arrow::VisitTypeInline, arrow::VisitScalarInline, and arrow::VisitArrayInline.

Generate Random Data

See example at Generate Random Data for a Given Schema.

Generalize Computations Across Arrow Types

Array visitors can be useful when writing functions that can handle multiple array types. However, implementing a visitor for each type individually can be excessively verbose. Fortunately, Arrow provides type traits that allow you to write templated functions to handle subsets of types. The example below demonstrates a table sum function that can handle any integer or floating point array with only a single visitor implementation by leveraging arrow::enable_if_number.

Using the visitor pattern to compute the sum of a table with any numeric type
class TableSummation {
  double partial = 0.0;
 public:

  arrow::Result<double> Compute(std::shared_ptr<arrow::RecordBatch> batch) {
    for (std::shared_ptr<arrow::Array> array : batch->columns()) {
      ARROW_RETURN_NOT_OK(arrow::VisitArrayInline(*array, this));
    }
    return partial;
  }

  // Default implementation
  arrow::Status Visit(const arrow::Array& array) {
    return arrow::Status::NotImplemented("Can not compute sum for array of type ",
                                         array.type()->ToString());
  }

  template <typename ArrayType, typename T = typename ArrayType::TypeClass>
  arrow::enable_if_number<T, arrow::Status> Visit(const ArrayType& array) {
    for (std::optional<typename T::c_type> value : array) {
      if (value.has_value()) {
        partial += static_cast<double>(value.value());
      }
    }
    return arrow::Status::OK();
  }
};  // TableSummation
std::shared_ptr<arrow::Schema> schema = arrow::schema({
    arrow::field("a", arrow::int32()),
    arrow::field("b", arrow::float64()),
});
int32_t num_rows = 3;
std::vector<std::shared_ptr<arrow::Array>> columns;

arrow::Int32Builder a_builder = arrow::Int32Builder();
std::vector<int32_t> a_vals = {1, 2, 3};
ARROW_RETURN_NOT_OK(a_builder.AppendValues(a_vals));
ARROW_ASSIGN_OR_RAISE(auto a_arr, a_builder.Finish());
columns.push_back(a_arr);

arrow::DoubleBuilder b_builder = arrow::DoubleBuilder();
std::vector<double> b_vals = {4.0, 5.0, 6.0};
ARROW_RETURN_NOT_OK(b_builder.AppendValues(b_vals));
ARROW_ASSIGN_OR_RAISE(auto b_arr, b_builder.Finish());
columns.push_back(b_arr);

auto batch = arrow::RecordBatch::Make(schema, num_rows, columns);

// Call
TableSummation summation;
ARROW_ASSIGN_OR_RAISE(auto total, summation.Compute(batch));

rout << "Total is " << total;
Code Output
Total is 21
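
The overload selection in TableSummation can be easier to follow in isolation. The sketch below reproduces the idea in plain C++ with no Arrow dependency. The Toy* names are hypothetical stand-ins, ToyVisitArrayInline only approximates the kind of dispatch an inline visit function performs, and std::is_arithmetic plays the role that arrow::enable_if_number plays above. None of this is Arrow's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical miniature array hierarchy (not Arrow's classes).
enum class ToyTypeId { kInt32, kDouble, kString };

struct ToyArray {
  virtual ~ToyArray() = default;
  virtual ToyTypeId type_id() const = 0;
};

template <typename CType, ToyTypeId kId>
struct ToyTypedArray : ToyArray {
  using value_type = CType;
  explicit ToyTypedArray(std::vector<CType> v) : values(std::move(v)) {}
  ToyTypeId type_id() const override { return kId; }
  std::vector<CType> values;
};

using ToyInt32Array = ToyTypedArray<int32_t, ToyTypeId::kInt32>;
using ToyDoubleArray = ToyTypedArray<double, ToyTypeId::kDouble>;
using ToyStringArray = ToyTypedArray<std::string, ToyTypeId::kString>;

// Switch once on the runtime type id, then call the visitor with the
// concrete static type so that overload resolution can pick a Visit method.
template <typename Visitor>
bool ToyVisitArrayInline(const ToyArray& array, Visitor* visitor) {
  assert(visitor != nullptr);
  switch (array.type_id()) {
    case ToyTypeId::kInt32:
      return visitor->Visit(static_cast<const ToyInt32Array&>(array));
    case ToyTypeId::kDouble:
      return visitor->Visit(static_cast<const ToyDoubleArray&>(array));
    case ToyTypeId::kString:
      return visitor->Visit(static_cast<const ToyStringArray&>(array));
  }
  return false;
}

struct ToySum {
  double total = 0.0;

  // Fallback: chosen when the constrained template below is not viable.
  bool Visit(const ToyArray&) { return false; }

  // Participates in overload resolution only for arithmetic value types.
  // When viable it is an exact match, so it is preferred over the fallback,
  // which would require a derived-to-base conversion.
  template <typename ArrayType, typename T = typename ArrayType::value_type>
  std::enable_if_t<std::is_arithmetic<T>::value, bool> Visit(
      const ArrayType& array) {
    for (T value : array.values) {
      total += static_cast<double>(value);
    }
    return true;
  }
};
```

In this sketch, visiting a ToyInt32Array or a ToyDoubleArray reaches the templated Visit and adds to total, while visiting a ToyStringArray falls back to the base-class overload and returns false. This mirrors how TableSummation returns NotImplemented for non-numeric columns.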