Debugging code using Arrow

GDB extension for Arrow C++

By default, when asked to print the value of a C++ object, GDB displays the contents of its member variables. However, for C++ objects this does not often yield a very useful output, as C++ classes tend to hide their implementation details behind methods and accessors.

For example, here is how a arrow::Status instance may be displayed by GDB:

$3 = {
  <arrow::util::EqualityComparable<arrow::Status>> = {<No data fields>},
  <arrow::util::ToStringOstreamable<arrow::Status>> = {<No data fields>},
  members of arrow::Status:
  state_ = 0x0
}

and here is a arrow::Decimal128Scalar:

$4 = (arrow::Decimal128Scalar) {
  <arrow::DecimalScalar<arrow::Decimal128Type, arrow::Decimal128>> = {
    <arrow::internal::PrimitiveScalarBase> = {
      <arrow::Scalar> = {
        <arrow::util::EqualityComparable<arrow::Scalar>> = {<No data fields>},
        members of arrow::Scalar:
        _vptr.Scalar = 0x7ffff6870e78 <vtable for arrow::Decimal128Scalar+16>,
        type = std::shared_ptr<arrow::DataType> (use count 1, weak count 0) = {
          get() = 0x555555ce58a0
        },
        is_valid = true
      }, <No data fields>},
    members of arrow::DecimalScalar<arrow::Decimal128Type, arrow::Decimal128>:
    value = {
      <arrow::BasicDecimal128> = {
        <arrow::GenericBasicDecimal<arrow::BasicDecimal128, 128, 2>> = {
          static kHighWordIndex = <optimized out>,
          static kBitWidth = 128,
          static kByteWidth = 16,
          static LittleEndianArray = <optimized out>,
          array_ = {
            _M_elems = {[0] = 1234567, [1] = 0}
          }
        },
        members of arrow::BasicDecimal128:
        static kMaxPrecision = 38,
        static kMaxScale = 38
      }, <No data fields>}
  }, <No data fields>}

Fortunately, GDB also allows custom extensions to override the default printing for specific types. We provide a GDB extension written in Python that enables pretty-printing for common Arrow C++ classes, so as to enable a more productive debugging experience. For example, here is how the aforementioned arrow::Status instance will be displayed:

$5 = arrow::Status::OK()

and here is the same arrow::Decimal128Scalar instance as above:

$6 = arrow::Decimal128Scalar of value 123.4567 [precision=10, scale=4]

Manual loading

To enable the GDB extension for Arrow, you can simply download it somewhere on your computer and source it from the GDB prompt:

(gdb) source path/to/gdb_arrow.py

You will have to source it on each new GDB session. You might want to make this implicit by adding the source invocation in a gdbinit file.

Automatic loading

GDB provides a facility to automatically load scripts or extensions for each object file or library that is involved in a debugging session. You will need to:

  1. Find out what the auto-load locations are for your GDB install. This can be determined using show subcommands on the GDB prompt; the answer will depend on the operating system.

    Here is an example on Ubuntu:

    (gdb) show auto-load scripts-directory
    List of directories from which to load auto-loaded scripts is $debugdir:$datadir/auto-load.
    (gdb) show data-directory
    GDB's data directory is "/usr/share/gdb".
    (gdb) show debug-file-directory
    The directory where separate debug symbols are searched for is "/usr/lib/debug".
    

    This tells you that the directories used for auto-loading are $debugdir and $datadir/auto-load, which expand to /usr/lib/debug/ and /usr/share/gdb/auto-load respectively.

  2. Find out the full path to the Arrow C++ DLL, with all symlinks resolved. For example, you might have installed Arrow 7.0 in /usr/local and the path to the Arrow C++ DLL could then be /usr/local/lib/libarrow.so.700.0.0.

  3. Determine the actual auto-load script path. It is computed by a) taking the path of the auto-load directory of your choice, b) appending the full path to the Arrow C++ DLL, c) appending -gdb.py at the tail.

    In the example above, if we choose /usr/share/gdb/auto-load as auto-load directory, the full path to the auto-load script will have to be /usr/share/gdb/auto-load/usr/local/lib/libarrow.so.700.0.0-gdb.py.

  4. Either copy or symlink the GDB extension to the file path determined in step 3 above.

If everything went well, then as soon as GDB encounters the Arrow C++ DLL, it will automatically load the Arrow GDB extension so as to pretty-print Arrow C++ classes on the display prompt.

Supported classes

The Arrow GDB extension provides pretty-printing for the core Arrow C++ classes:

Important utility classes are also covered: