If you are a developer working with Arrow code, the package’s use of tidy eval and C++ necessitates a solid debugging strategy. In this article, we recommend a few approaches.
Debugging R code
In general, we have found that using interactive debugging
(e.g. calls to browser()
), where you can inspect objects in
a particular environment, is more efficient than simpler techniques such
as print()
statements.
Getting more descriptive C++ error messages after a segfault
If you are working in the RStudio IDE, your R session will be aborted if there is a segfault. If you re-run your code in a command-line R session, the session isn’t automatically aborted and so it will be possible to copy the error message accompanying the segfault. Here is an example from a bug which existed at time of writing.
> S3FileSystem$create()
*** caught segfault ***
address 0x1a0, cause 'memory not mapped'
Traceback:
1: (function (anonymous, access_key, secret_key, session_token, role_arn, session_name, external_id, load_frequency, region, endpoint_override, scheme, background_writes) { .Call(`_arrow_fs___S3FileSystem__create`, anonymous, access_key, secret_key, session_token, role_arn, session_name, external_id, load_frequency, region, endpoint_override, scheme, background_writes)})(access_key = "", secret_key = "", session_token = "", role_arn = "", session_name = "", external_id = "", load_frequency = 900L, region = "", endpoint_override = "", scheme = "", background_writes = TRUE, anonymous = FALSE)
2: exec(fs___S3FileSystem__create, !!!args)
3: S3FileSystem$create()
This output provides the R traceback; however, it doesn’t provide any information about the exact line of C++ code from which the segfault originated. For this, you will need to run R with the C++ debugger attached.
Running R code with the C++ debugger attached
As Arrow has C++ code at its core, debugging code can sometimes be tricky when errors originate in the C++ rather than the R layer. If you are adding new code which triggers a C++ bug (or find one in existing code), this can result in a segfault. If you are working in RStudio, the session is aborted, and you may not be able to retrieve the error messaging needed to diagnose and/or report the bug. One way around this is to find the code that causes the error, and run R with a C++ debugger.
If you are using macOS and have installed R using the Apple installer, you will not be able to run R with a debugger attached; please see the instructions here for details on causes of this and workarounds.
Firstly, load R with your debugger. The most common debuggers are
gdb
(typically found on Linux, sometimes on macOS, or
Windows via MinGW or Cygwin) and lldb
(the default macOS
debugger).
In my case it’s gdb
, but if you’re using the
lldb
debugger (for example, if you’re on a Mac), just swap
in that command here.
R -d gdb
Next, run R.
run
You should now be in an R session with the C++ debugger attached. This will look similar to a normal R session, but with extra output.
Now, run your code - either directly in the session or by sourcing it from a file. If the code results in a segfault, you will have extra output that you can use to diagnose the problem or attach to an issue as extra information.
Here is debugger output from the segfault shown in the previous example. You can see here that the exact line which triggers the segfault is included in the output.
> S3FileSystem$create()
Thread 1 "R" received signal SIGSEGV, Segmentation fault.
0x00007ffff0128369 in std::__atomic_base<long>::operator++ (this=0x178) at /usr/include/c++/9/bits/atomic_base.h:318
318 operator++() noexcept
Getting debugger output if your session hangs
The instructions above can provide valuable additional context when a segfault occurs. However, there are occasionally circumstances in which a bug could cause your session to hang indefinitely without segfaulting. In this case, it may be diagnostically useful to interrupt the debugger and generate backtraces from all running threads.
To do this, firstly, press Ctrl/Cmd and C to interrupt the debugger, and then run:
thread apply all bt
This will generate a large amount of output, but this information is useful when identifying the cause of the issue.
Further reading
The following resources provide detailed guides to debugging R code:
For an excellent in-depth guide to using the C++ debugger in R, see this blog post by David Vaughan.
You can find a list of equivalent gdb and lldb commands on the LLDB website.