Debugging code using Arrow#
GDB extension for Arrow C++#
By default, when asked to print the value of a C++ object, GDB displays the contents of its member variables. However, for C++ objects this often does not yield very useful output, as C++ classes tend to hide their implementation details behind methods and accessors.
For example, here is how an arrow::Status instance may be displayed by GDB:
$3 = {
<arrow::util::EqualityComparable<arrow::Status>> = {<No data fields>},
<arrow::util::ToStringOstreamable<arrow::Status>> = {<No data fields>},
members of arrow::Status:
state_ = 0x0
}
and here is an arrow::Decimal128Scalar:
$4 = (arrow::Decimal128Scalar) {
<arrow::DecimalScalar<arrow::Decimal128Type, arrow::Decimal128>> = {
<arrow::internal::PrimitiveScalarBase> = {
<arrow::Scalar> = {
<arrow::util::EqualityComparable<arrow::Scalar>> = {<No data fields>},
members of arrow::Scalar:
_vptr.Scalar = 0x7ffff6870e78 <vtable for arrow::Decimal128Scalar+16>,
type = std::shared_ptr<arrow::DataType> (use count 1, weak count 0) = {
get() = 0x555555ce58a0
},
is_valid = true
}, <No data fields>},
members of arrow::DecimalScalar<arrow::Decimal128Type, arrow::Decimal128>:
value = {
<arrow::BasicDecimal128> = {
<arrow::GenericBasicDecimal<arrow::BasicDecimal128, 128, 2>> = {
static kHighWordIndex = <optimized out>,
static kBitWidth = 128,
static kByteWidth = 16,
static LittleEndianArray = <optimized out>,
array_ = {
_M_elems = {[0] = 1234567, [1] = 0}
}
},
members of arrow::BasicDecimal128:
static kMaxPrecision = 38,
static kMaxScale = 38
}, <No data fields>}
}, <No data fields>}
Fortunately, GDB also allows custom extensions to override the default printing for specific types. We provide a GDB extension written in Python that enables pretty-printing for common Arrow C++ classes, so as to enable a more productive debugging experience. For example, here is how the aforementioned arrow::Status instance will be displayed:
$5 = arrow::Status::OK()
and here is the same arrow::Decimal128Scalar instance as above:
$6 = arrow::Decimal128Scalar of value 123.4567 [precision=10, scale=4]
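To illustrate how the raw member dump relates to the pretty-printed line, here is a minimal Python sketch. It is not the code of the Arrow GDB extension: the function names are hypothetical, and it assumes that the first element of array_ holds the low-order 64-bit word (as on a little-endian machine). The precision and scale are taken from the pretty-printed output above, since they are not visible in the raw dump.

```python
from decimal import Decimal


def decimal128_from_words(low: int, high: int, scale: int) -> Decimal:
    """Combine two unsigned 64-bit words into a signed, scaled decimal.

    Assumption: `low` is the least significant word and `high` the most
    significant one, forming a 128-bit two's complement integer.
    """
    raw = (high << 64) | low
    if raw >= 1 << 127:
        # The top bit is set, so the value is negative in two's complement.
        raw -= 1 << 128
    sign = 1 if raw < 0 else 0
    digits = tuple(int(ch) for ch in str(abs(raw)))
    # Building the Decimal from a (sign, digits, exponent) tuple is exact
    # and does not depend on the current decimal context precision.
    return Decimal((sign, digits, -scale))


def format_decimal128_scalar(low: int, high: int, precision: int, scale: int) -> str:
    """Render a line that mimics the pretty-printed output shown above."""
    value = decimal128_from_words(low, high, scale)
    return (
        f"arrow::Decimal128Scalar of value {value} "
        f"[precision={precision}, scale={scale}]"
    )


# Words taken from the raw dump: _M_elems = {[0] = 1234567, [1] = 0}
print(format_decimal128_scalar(1234567, 0, precision=10, scale=4))
```

With the words from the raw dump, the unscaled integer 1234567 and a scale of 4 give the value 123.4567 shown by the extension.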
Manual loading#
To enable the GDB extension for Arrow, you can simply download it somewhere on your computer and source it from the GDB prompt:
(gdb) source path/to/gdb_arrow.py
You will have to source it on each new GDB session. You might want to make this implicit by adding the source invocation in a gdbinit file.
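As a rough sketch of that last step, the following Python helper appends the source invocation to a gdbinit file only if it is not already there. The function name is hypothetical, and which gdbinit file GDB reads on your system (commonly ~/.gdbinit) is an assumption you should check against your GDB installation.

```python
from pathlib import Path


def ensure_gdbinit_sources(gdbinit: Path, script: Path) -> bool:
    """Append `source <script>` to `gdbinit` unless it is already present.

    Returns True if the file was modified, False otherwise.
    """
    line = f"source {script}"
    text = gdbinit.read_text() if gdbinit.exists() else ""
    if any(existing.strip() == line for existing in text.splitlines()):
        return False
    with gdbinit.open("a") as f:
        # Make sure the new command starts on its own line.
        if text and not text.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True
```

For example, `ensure_gdbinit_sources(Path.home() / ".gdbinit", Path("path/to/gdb_arrow.py"))` would add the same command that was typed at the GDB prompt above. Calling it a second time leaves the file unchanged.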
Automatic loading#
GDB provides a facility to automatically load scripts or extensions for each object file or library that is involved in a debugging session. You will need to:
1. Find out what the auto-load locations are for your GDB install. This can be determined using show subcommands on the GDB prompt; the answer will depend on the operating system.

   Here is an example on Ubuntu:

   (gdb) show auto-load scripts-directory
   List of directories from which to load auto-loaded scripts is $debugdir:$datadir/auto-load.
   (gdb) show data-directory
   GDB's data directory is "/usr/share/gdb".
   (gdb) show debug-file-directory
   The directory where separate debug symbols are searched for is "/usr/lib/debug".

   This tells you that the directories used for auto-loading are $debugdir and $datadir/auto-load, which expand to /usr/lib/debug/ and /usr/share/gdb/auto-load respectively.

2. Find out the full path to the Arrow C++ DLL, with all symlinks resolved. For example, you might have installed Arrow 7.0 in /usr/local and the path to the Arrow C++ DLL could then be /usr/local/lib/libarrow.so.700.0.0.

3. Determine the actual auto-load script path. It is computed by a) taking the path of the auto-load directory of your choice, b) appending the full path to the Arrow C++ DLL, c) appending -gdb.py at the tail.

   In the example above, if we choose /usr/share/gdb/auto-load as auto-load directory, the full path to the auto-load script will have to be /usr/share/gdb/auto-load/usr/local/lib/libarrow.so.700.0.0-gdb.py.

4. Either copy or symlink the GDB extension to the file path determined in step 3 above.
If everything went well, then as soon as GDB encounters the Arrow C++ DLL, it will automatically load the Arrow GDB extension so as to pretty-print Arrow C++ classes on the display prompt.
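The path arithmetic in steps 1 and 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the directory values are the ones from the Ubuntu example, and the library path passed in is assumed to be absolute with symlinks already resolved (for instance with os.path.realpath).

```python
from pathlib import PurePosixPath


def expand_scripts_directory(value: str, debugdir: str, datadir: str) -> list[str]:
    """Expand $debugdir and $datadir in a colon-separated directory list."""
    expanded = []
    for entry in value.split(":"):
        entry = entry.replace("$debugdir", debugdir).replace("$datadir", datadir)
        expanded.append(entry)
    return expanded


def auto_load_script_path(auto_load_dir: str, library_path: str) -> str:
    """Compute <auto-load dir> + <full library path> + "-gdb.py".

    `library_path` must be absolute; symlinks are assumed to be resolved
    by the caller.
    """
    lib = PurePosixPath(library_path)
    if not lib.is_absolute():
        raise ValueError("library_path must be an absolute path")
    # Drop the leading "/" so the library path nests under the directory.
    nested = PurePosixPath(auto_load_dir) / lib.relative_to("/")
    return str(nested) + "-gdb.py"


dirs = expand_scripts_directory(
    "$debugdir:$datadir/auto-load",
    debugdir="/usr/lib/debug",
    datadir="/usr/share/gdb",
)
print(dirs)
print(auto_load_script_path(dirs[1], "/usr/local/lib/libarrow.so.700.0.0"))
```

The second line printed is the script path from step 3, which is where the GDB extension would be copied or symlinked in step 4.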
Supported classes#
The Arrow GDB extension provides pretty-printing for the core Arrow C++ classes:
- arrow::DataType and subclasses
- arrow::ArrayData, arrow::Array and subclasses
- arrow::Scalar and subclasses
Important utility classes are also covered:
- arrow::Buffer and subclasses
- arrow::util::string_view, arrow::util::optional, arrow::util::Variant