Debugging code using Arrow#
GDB extension for Arrow C++#
By default, when asked to print the value of a C++ object, GDB displays the contents of its member variables. However, for C++ objects this often does not yield very useful output, as C++ classes tend to hide their implementation details behind methods and accessors.
For example, here is how an arrow::Status instance may be displayed
by GDB:
$3 = {
  <arrow::util::EqualityComparable<arrow::Status>> = {<No data fields>},
  <arrow::util::ToStringOstreamable<arrow::Status>> = {<No data fields>},
  members of arrow::Status:
  state_ = 0x0
}
and here is an arrow::Decimal128Scalar:
$4 = (arrow::Decimal128Scalar) {
  <arrow::DecimalScalar<arrow::Decimal128Type, arrow::Decimal128>> = {
    <arrow::internal::PrimitiveScalarBase> = {
      <arrow::Scalar> = {
        <arrow::util::EqualityComparable<arrow::Scalar>> = {<No data fields>},
        members of arrow::Scalar:
        _vptr.Scalar = 0x7ffff6870e78 <vtable for arrow::Decimal128Scalar+16>,
        type = std::shared_ptr<arrow::DataType> (use count 1, weak count 0) = {
          get() = 0x555555ce58a0
        },
        is_valid = true
      }, <No data fields>},
    members of arrow::DecimalScalar<arrow::Decimal128Type, arrow::Decimal128>:
    value = {
      <arrow::BasicDecimal128> = {
        <arrow::GenericBasicDecimal<arrow::BasicDecimal128, 128, 2>> = {
          static kHighWordIndex = <optimized out>,
          static kBitWidth = 128,
          static kByteWidth = 16,
          static LittleEndianArray = <optimized out>,
          array_ = {
            _M_elems = {[0] = 1234567, [1] = 0}
          }
        },
        members of arrow::BasicDecimal128:
        static kMaxPrecision = 38,
        static kMaxScale = 38
      }, <No data fields>}
  }, <No data fields>}
Fortunately, GDB also allows custom extensions to override the default printing
for specific types.  We provide a
GDB extension
written in Python that enables pretty-printing for common Arrow C++ classes,
so as to enable a more productive debugging experience.  For example,
here is how the aforementioned arrow::Status instance will be
displayed:
$5 = arrow::Status::OK()
and here is the same arrow::Decimal128Scalar instance as above:
$6 = arrow::Decimal128Scalar of value 123.4567 [precision=10, scale=4]
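To give an idea of how such an extension can work, here is a minimal sketch built on GDB's Python pretty-printing API (`gdb.printing`). It is not the code of the Arrow extension itself: the `StatusSketchPrinter` class, the `format_decimal` helper and the `"arrow-sketch"` collection name are hypothetical, and the sketch only handles the OK case shown above (a null `state_` pointer). The helper illustrates how the unscaled integer 1234567 seen in the raw dump, combined with a scale of 4, yields the displayed value 123.4567.

```python
from decimal import Decimal

try:
    import gdb.printing  # only importable inside a GDB session
except ImportError:
    gdb = None  # lets the pure-Python parts run outside GDB


def format_decimal(unscaled, scale):
    """Render an unscaled integer as a decimal string with `scale` fractional digits."""
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Building the Decimal from a (sign, digits, exponent) tuple is exact.
    return format(Decimal((sign, digits, -scale)), "f")


class StatusSketchPrinter:
    """Hypothetical printer: only distinguishes OK from non-OK statuses."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        # In the raw dump above, an OK status has a null state_ pointer.
        if int(self.val["state_"]) == 0:
            return "arrow::Status::OK()"
        return "arrow::Status (non-OK, details omitted in this sketch)"


def build_printers():
    """Collect the sketch printers under a hypothetical collection name."""
    printers = gdb.printing.RegexpCollectionPrettyPrinter("arrow-sketch")
    printers.add_printer("Status", "^arrow::Status$", StatusSketchPrinter)
    return printers


if gdb is not None:
    # Register for the object file being loaded (or globally if there is none).
    gdb.printing.register_pretty_printer(gdb.current_objfile(), build_printers())
```

Outside of GDB, `format_decimal(1234567, 4)` can be called directly to check the arithmetic. Inside GDB, sourcing a file like this one would make `print` use `to_string()` for matching values.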
Manual loading#
To enable the GDB extension for Arrow, you can simply
download it
somewhere on your computer and source it from the GDB prompt:
(gdb) source path/to/gdb_arrow.py
You will have to source it on each new GDB session.  You might want to
make this implicit by adding the source invocation in a
gdbinit file.
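As an illustration, the following sketch appends the source invocation to a gdbinit file if it is not already there. The `ensure_sourced` helper is hypothetical, and the paths in the usage comment are assumptions: `~/.gdbinit` is a common location for the per-user file, but your GDB install may look elsewhere.

```python
from pathlib import Path


def ensure_sourced(gdbinit, script):
    """Add a `source <script>` line to the gdbinit file unless already present.

    Returns True if the file was modified, False otherwise.
    """
    gdbinit = Path(gdbinit).expanduser()
    # Use an absolute path so that the line works from any working directory.
    line = f"source {Path(script).expanduser().resolve()}"
    existing = gdbinit.read_text().splitlines() if gdbinit.exists() else []
    if line in existing:
        return False
    gdbinit.write_text("\n".join(existing + [line]) + "\n")
    return True


# Example (assumed paths, adjust to your setup):
# ensure_sourced("~/.gdbinit", "path/to/gdb_arrow.py")
```

Calling the helper a second time with the same arguments leaves the file untouched, so it is safe to run from a setup script.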
Automatic loading#
GDB provides a facility to automatically load scripts or extensions for each object file or library that is involved in a debugging session. You will need to:
- Find out what the auto-load locations are for your GDB install. This can be determined using show subcommands on the GDB prompt; the answer will depend on the operating system.
  Here is an example on Ubuntu:
  (gdb) show auto-load scripts-directory
  List of directories from which to load auto-loaded scripts is $debugdir:$datadir/auto-load.
  (gdb) show data-directory
  GDB's data directory is "/usr/share/gdb".
  (gdb) show debug-file-directory
  The directory where separate debug symbols are searched for is "/usr/lib/debug".
  This tells you that the directories used for auto-loading are $debugdir and $datadir/auto-load, which expand to /usr/lib/debug/ and /usr/share/gdb/auto-load respectively.
- Find out the full path to the Arrow C++ DLL, with all symlinks resolved. For example, you might have installed Arrow 7.0 in /usr/local and the path to the Arrow C++ DLL could then be /usr/local/lib/libarrow.so.700.0.0.
- Determine the actual auto-load script path. It is computed by a) taking the path of the auto-load directory of your choice, b) appending the full path to the Arrow C++ DLL, c) appending -gdb.py at the tail.
  In the example above, if we choose /usr/share/gdb/auto-load as auto-load directory, the full path to the auto-load script will have to be /usr/share/gdb/auto-load/usr/local/lib/libarrow.so.700.0.0-gdb.py.
- Either copy or symlink the GDB extension to the file path determined in the previous step.
If everything went well, then as soon as GDB encounters the Arrow C++ DLL, it will automatically load the Arrow GDB extension so as to pretty-print Arrow C++ classes on the display prompt.
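The path computation described in the steps above can be sketched as follows. The function names are hypothetical, and the example values are the ones used in this section; substitute the auto-load directory and library path of your own system.

```python
import os


def resolve_library(library_path):
    """Return the library path with all symlinks resolved."""
    return os.path.realpath(library_path)


def autoload_script_path(auto_load_dir, resolved_library_path):
    """Compute where GDB expects the auto-load script for a library.

    The result is the auto-load directory, followed by the full (absolute)
    path of the library, followed by the "-gdb.py" suffix.
    """
    if not os.path.isabs(resolved_library_path):
        raise ValueError("the library path must be absolute")
    return auto_load_dir.rstrip(os.sep) + resolved_library_path + "-gdb.py"


# Example with the values from this section:
# autoload_script_path("/usr/share/gdb/auto-load",
#                      resolve_library("/usr/local/lib/libarrow.so.700.0.0"))
```

The resulting path is where the GDB extension should be copied or symlinked. Writing into a system directory such as /usr/share/gdb/auto-load typically requires administrator privileges.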
Supported classes#
The Arrow GDB extension provides pretty-printing for the core Arrow C++ classes:
- arrow::DataType and subclasses
- arrow::ArrayData, arrow::Array and subclasses
- arrow::Scalar and subclasses
Important utility classes are also covered:
- arrow::Buffer and subclasses
 
    