Working with the C++ Implementation
This section of the cookbook goes over basic concepts that will be needed regardless of how you intend to use the Arrow C++ implementation.
Working with Status and Result
C++ libraries often have to choose between throwing exceptions and returning error codes. Arrow takes a middle ground and returns Status and Result objects. This makes it clear when a function can fail and is easier to use than integer error codes.
It is important to always check the value of a returned Status object to ensure that the operation succeeded. However, this can quickly become tedious:
std::function<arrow::Status()> test_fn = [] {
  arrow::NullBuilder builder;
  arrow::Status st = builder.Reserve(2);
  // Tedious return value check
  if (!st.ok()) {
    return st;
  }
  st = builder.AppendNulls(-1);
  // Tedious return value check
  if (!st.ok()) {
    return st;
  }
  rout << "Appended -1 null values?" << std::endl;
  return arrow::Status::OK();
};
arrow::Status st = test_fn();
rout << st << std::endl;
Invalid: length must be positive
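The example above only deals with Status. A Result additionally carries a value on success. The sketch below illustrates that idea using only the standard library. `MiniResult`, `CheckedLength`, and `DoubledLength` are hypothetical names invented for this illustration; they are a simplified stand-in and do not reproduce the interface of arrow::Result.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a Result type: either a value or an error message.
template <typename T>
struct MiniResult {
  std::optional<T> value;  // set on success
  std::string error;       // set on failure
  bool ok() const { return value.has_value(); }
};

// Hypothetical operation that rejects negative lengths.
inline MiniResult<int> CheckedLength(int length) {
  if (length < 0) return {std::nullopt, "Invalid: length must be positive"};
  return {length, ""};
}

// The caller checks ok() before touching the value and propagates
// any failure unchanged.
inline MiniResult<int> DoubledLength(int length) {
  MiniResult<int> checked = CheckedLength(length);
  if (!checked.ok()) return checked;
  return {*checked.value * 2, ""};
}
```

As with Status, the check before each use of the value is what becomes repetitive once several fallible calls are chained.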
The macro ARROW_RETURN_NOT_OK takes care of some of this boilerplate for you. It runs the contained expression and checks the resulting Status or Result object. If the operation failed, the macro returns the failure from the enclosing function.
std::function<arrow::Status()> test_fn = [] {
  arrow::NullBuilder builder;
  ARROW_RETURN_NOT_OK(builder.Reserve(2));
  ARROW_RETURN_NOT_OK(builder.AppendNulls(-1));
  rout << "Appended -1 null values?" << std::endl;
  return arrow::Status::OK();
};
arrow::Status st = test_fn();
rout << st << std::endl;
Invalid: length must be positive
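To make the early-return behavior concrete, here is a minimal sketch that uses only the standard library. `MiniStatus`, `MINI_RETURN_NOT_OK`, `AppendNulls`, and `BuildNulls` are hypothetical stand-ins invented for this illustration. Arrow's real Status class and ARROW_RETURN_NOT_OK macro are richer, and this sketch does not reproduce their definitions.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a status object: an empty message means success.
class MiniStatus {
 public:
  static MiniStatus OK() { return MiniStatus(""); }
  static MiniStatus Invalid(std::string msg) {
    return MiniStatus("Invalid: " + std::move(msg));
  }
  bool ok() const { return message_.empty(); }
  const std::string& message() const { return message_; }

 private:
  explicit MiniStatus(std::string msg) : message_(std::move(msg)) {}
  std::string message_;
};

// Hypothetical stand-in for a return-if-failed macro: evaluate the expression
// once and return its status from the enclosing function if it failed.
#define MINI_RETURN_NOT_OK(expr)               \
  do {                                         \
    MiniStatus mini_st_ = (expr);              \
    if (!mini_st_.ok()) return mini_st_;       \
  } while (false)

// Hypothetical operation that rejects negative lengths.
inline MiniStatus AppendNulls(int length) {
  if (length < 0) return MiniStatus::Invalid("length must be positive");
  return MiniStatus::OK();
}

// Stops at the first failing call; later calls are never reached.
inline MiniStatus BuildNulls(int first, int second) {
  MINI_RETURN_NOT_OK(AppendNulls(first));
  MINI_RETURN_NOT_OK(AppendNulls(second));
  return MiniStatus::OK();
}
```

Wrapping the body in `do { ... } while (false)` lets the macro be used like a single statement followed by a semicolon, including inside an unbraced `if`.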
Using the Visitor Pattern
The Arrow classes arrow::DataType, arrow::Scalar, and arrow::Array have specialized subclasses for each Arrow type. To specialize logic for each subclass, you can use the visitor pattern. Arrow provides inline template functions that allow you to call visitors efficiently: arrow::VisitTypeInline, arrow::VisitScalarInline, and arrow::VisitArrayInline.
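As a rough illustration of what such a dispatch helper does, the sketch below switches on a runtime type tag, casts to the concrete subclass, and calls the matching `Visit` overload. It uses only the standard library. `Node`, `IntNode`, `TextNode`, `VisitNodeInline`, `Describer`, and `Describe` are hypothetical names invented for this illustration, not Arrow classes or functions, and the real helpers cover many more types.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical miniature hierarchy standing in for a base class with
// one subclass per type.
enum class Kind { kInt, kText };

struct Node {
  explicit Node(Kind k) : kind(k) {}
  virtual ~Node() = default;
  Kind kind;
};

struct IntNode : Node {
  explicit IntNode(int v) : Node(Kind::kInt), value(v) {}
  int value;
};

struct TextNode : Node {
  explicit TextNode(std::string v) : Node(Kind::kText), value(std::move(v)) {}
  std::string value;
};

// Hypothetical dispatch helper: pick the concrete type from the runtime tag
// and forward to the visitor overload for that type.
template <typename Visitor>
bool VisitNodeInline(const Node& node, Visitor* visitor) {
  switch (node.kind) {
    case Kind::kInt:
      return visitor->Visit(static_cast<const IntNode&>(node));
    case Kind::kText:
      return visitor->Visit(static_cast<const TextNode&>(node));
  }
  return false;
}

// A visitor supplies one Visit overload per concrete type it supports.
struct Describer {
  std::string out;
  bool Visit(const IntNode& n) {
    out = "int:" + std::to_string(n.value);
    return true;
  }
  bool Visit(const TextNode& n) {
    out = "text:" + n.value;
    return true;
  }
};

// Convenience wrapper that runs the visitor and returns its description.
inline std::string Describe(const Node& node) {
  Describer describer;
  VisitNodeInline(node, &describer);
  return describer.out;
}
```

Because the helper is a template, the call to `Visit` is resolved at compile time for each case, so the visitor itself needs no virtual functions.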
Generate Random Data
See example at Generate Random Data for a Given Schema.
Generalize Computations Across Arrow Types
Array visitors can be useful when writing functions that can handle multiple array types. However, implementing a visitor for each type individually can be excessively verbose. Fortunately, Arrow provides type traits that allow you to write templated functions to handle subsets of types. The example below demonstrates a table sum function that can handle any integer or floating point array with only a single visitor implementation by leveraging arrow::enable_if_number.
class TableSummation {
  double partial = 0.0;

 public:
  arrow::Result<double> Compute(std::shared_ptr<arrow::RecordBatch> batch) {
    for (std::shared_ptr<arrow::Array> array : batch->columns()) {
      ARROW_RETURN_NOT_OK(arrow::VisitArrayInline(*array, this));
    }
    return partial;
  }

  // Default implementation
  arrow::Status Visit(const arrow::Array& array) {
    return arrow::Status::NotImplemented("Can not compute sum for array of type ",
                                         array.type()->ToString());
  }

  template <typename ArrayType, typename T = typename ArrayType::TypeClass>
  arrow::enable_if_number<T, arrow::Status> Visit(const ArrayType& array) {
    for (std::optional<typename T::c_type> value : array) {
      if (value.has_value()) {
        partial += static_cast<double>(value.value());
      }
    }
    return arrow::Status::OK();
  }
};  // TableSummation
std::shared_ptr<arrow::Schema> schema = arrow::schema({
    arrow::field("a", arrow::int32()),
    arrow::field("b", arrow::float64()),
});
int32_t num_rows = 3;
std::vector<std::shared_ptr<arrow::Array>> columns;
arrow::Int32Builder a_builder = arrow::Int32Builder();
std::vector<int32_t> a_vals = {1, 2, 3};
ARROW_RETURN_NOT_OK(a_builder.AppendValues(a_vals));
ARROW_ASSIGN_OR_RAISE(auto a_arr, a_builder.Finish());
columns.push_back(a_arr);
arrow::DoubleBuilder b_builder = arrow::DoubleBuilder();
std::vector<double> b_vals = {4.0, 5.0, 6.0};
ARROW_RETURN_NOT_OK(b_builder.AppendValues(b_vals));
ARROW_ASSIGN_OR_RAISE(auto b_arr, b_builder.Finish());
columns.push_back(b_arr);
auto batch = arrow::RecordBatch::Make(schema, num_rows, columns);
// Call
TableSummation summation;
ARROW_ASSIGN_OR_RAISE(auto total, summation.Compute(batch));
rout << "Total is " << total;
Total is 21
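The overload selection in TableSummation relies on a type trait that removes the templated Visit from consideration for non-numeric types. The sketch below shows the same shape using only the standard library, with std::enable_if_t and std::is_arithmetic standing in for arrow::enable_if_number. `Accumulator` and its members are hypothetical names invented for this illustration. Unlike the example above, it uses a second, complementary template as the fallback instead of a base-class overload.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical accumulator: one implementation covers every arithmetic
// element type, and everything else falls through to the fallback.
struct Accumulator {
  double total = 0.0;
  int skipped = 0;

  // Participates in overload resolution only when T is arithmetic.
  template <typename T>
  std::enable_if_t<std::is_arithmetic<T>::value, bool> Add(
      const std::vector<T>& values) {
    for (const T& v : values) {
      total += static_cast<double>(v);
    }
    return true;
  }

  // Fallback for every non-arithmetic element type.
  template <typename T>
  std::enable_if_t<!std::is_arithmetic<T>::value, bool> Add(
      const std::vector<T>&) {
    ++skipped;
    return false;
  }
};
```

Calling `Add` with a `std::vector<int>` or `std::vector<double>` selects the first overload, while a `std::vector<std::string>` selects the fallback, so supporting a new numeric type requires no additional code.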