How to use helgrind to debug multithreaded Qt applications

Author: David Faure, KDAB.

Helgrind introduction

You've probably heard of valgrind before: its default tool (memcheck) is a life saver, able to detect memory-related bugs in your code (leaks, double deletions, use of deleted memory, use of uninitialized memory, etc.).

Well, it turns out that valgrind also comes with a tool to detect race conditions between threads, in multithreaded applications. That tool is called helgrind. (There is also another tool called "drd", but I don't know the differences, and I have no experience with drd.)
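To make "race condition" concrete, here is a minimal sketch of the kind of code a race detector is meant to catch. It uses plain std::thread and std::mutex rather than Qt classes so that it stands alone. Counter, incrementUnsafe, incrementSafe and runWorkers are made-up names for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example, not taken from Qt: a counter shared between threads.
struct Counter {
    int value = 0;
    std::mutex guard;

    // Data race if two threads call this at the same time:
    // nothing orders the read-modify-write of `value`.
    void incrementUnsafe() { ++value; }

    // Safe: every access to `value` happens while holding `guard`.
    void incrementSafe()
    {
        std::lock_guard<std::mutex> lock(guard);
        ++value;
    }
};

// Starts `threadCount` threads that each increment the counter `iterations`
// times, waits for all of them, and returns the final value.
int runWorkers(int threadCount, int iterations)
{
    assert(threadCount > 0 && iterations >= 0);
    Counter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i)
                counter.incrementSafe();
        });
    }
    for (std::thread &worker : workers)
        worker.join();
    return counter.value; // safe to read: all workers have been joined
}
```

Switching the lambda to incrementUnsafe() gives several threads unsynchronized access to value, which is the kind of access helgrind is designed to report.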

In theory, provided that you're on a Unix platform, using helgrind is as simple as

valgrind --tool=helgrind myapplication

However, if you do just that on a Qt application, you'll end up digging through lots of false positives, making this a rather painful experience. So let's have a look at what is needed exactly to debug Qt4 and Qt5 applications with helgrind.

Valgrind

To benefit from a large number of fixes in helgrind itself (support for Qt5 QMutex, a fix for the "destruction of unknown cond var" bug), you need valgrind >= 3.9.0. Either use the 3.9.0 release (or later), or check out valgrind from svn and compile it yourself.

svn checkout svn://svn.valgrind.org/valgrind/trunk valgrind
cd valgrind
./autogen.sh
./configure --prefix=/usr/local
make
sudo make install

Qt 5

  • Qt 5.0 is not good enough. Use Qt >= 5.1.
  • Make sure to configure Qt in debug mode, for the atomic suppressions (defined further down) to work correctly, and to be able to patch it.

In case you're curious, the reason why Qt 5.0 is not good enough is that I fixed the following issues in Qt:

  • QThreadDataPrivate: canWait race, fixed in Qt 5.1 (commit bf3a5cc) (backported to Qt 4.8.5 in commit 815d7f0)
  • QThread: race when setting the eventDispatcher, fixed in Qt 5.1 (commits f4609b2 and 85b25fc)
  • QEventDispatcherUNIX: race on the interrupt bool, fixed in Qt 5.1 (commit 49d7e71)
  • QEventLoop::exec()/exit() race, fixed in Qt 5.1 (commit 5a5a092, in stable only for now)

Still TODO: support for recursive mutexes. QRecursiveMutexPrivate::lock() calls QBasicMutex::lock(), not QMutex::lock(), so helgrind's QMutex intercept doesn't trigger and we get false positives.

Qt 4

If you have to use Qt4 instead, you'll get more reports (real races, but in Qt itself rather than in your code). That includes not only the ones fixed above, but also some races that got fixed during rewrites for Qt5.

For instance, a race that I fixed in Qt 5 is the race in QFuture::waitForResult, fixed in commit 7120cf16d.

You should configure and compile Qt4 yourself, in debug mode, because we'll need to patch it (see the next section).

Patching Qt (applies to Qt 4 (< 4.8.6), Qt 5.0 and Qt 5.1)

The function qFlagLocation, called at every QObject::connect() statement, wasn't thread-safe until Qt 5.2 (QTBUG-3680). If you're using Qt 5.2 or later, or Qt 4.8.6 or later in the Qt 4 series, you can skip this section.

In practice the qFlagLocation logic is only useful when there's a runtime error in the connect, which doesn't happen in well-written programs, but helgrind will of course flag every use of qFlagLocation. Fortunately we can easily fix that, using a gcc-specific patch (which is why it's not in Qt upstream).

cd $QTDIR
wget http://www.davidfaure.fr/kde/qflaglocation-fix.diff
git apply qflaglocation-fix.diff
cd src/corelib ; make
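For the curious, here is roughly why a function like this trips a race detector. This is a hypothetical stand-in, not the Qt source and not the patch above: flagShared writes to a static buffer on every call, from whichever thread calls it, while flagLocal shows one common remedy, thread-local storage. All names are invented for the illustration.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in, not the Qt source: remember the last few strings
// passed in, so that a later error message could mention them.
const int kSlots = 2;

// Racy variant: the ring buffer and its index are shared by all threads,
// and nothing synchronizes the writes.
static const char *sharedSlots[kSlots] = {nullptr, nullptr};
static int sharedIndex = 0;

const char *flagShared(const char *text)
{
    sharedSlots[sharedIndex] = text;
    sharedIndex = (sharedIndex + 1) % kSlots;
    return text;
}

// Thread-safe variant: each thread gets its own buffer and index,
// so no memory is shared and there is nothing to race on.
thread_local const char *localSlots[kSlots] = {nullptr, nullptr};
thread_local int localIndex = 0;

const char *flagLocal(const char *text)
{
    assert(text != nullptr);
    localSlots[localIndex] = text;
    localIndex = (localIndex + 1) % kSlots;
    return text;
}

// True if `text` is among the strings recently flagged by the calling thread.
bool wasFlaggedLocally(const char *text)
{
    for (int i = 0; i < kSlots; ++i) {
        if (localSlots[i] == text)
            return true;
    }
    return false;
}
```

Even if nobody ever reads the buffer, the unsynchronized writes in flagShared are a data race by themselves once two threads call it, which is why a call made at every connect() generates so much noise.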

Suppressions

It wasn't possible to make helgrind perfect for Qt. In particular, helgrind has no way to distinguish a raw store to an int from an atomic store to that int, because on x86 both compile to the same instruction. For this reason, I used the poor man's solution: defining suppressions for all uses of the Qt atomic classes. If we can use helgrind to fix all the misuse of normal (non-atomic) variables in multithreaded apps, that's already a huge step forward, even if we (wrongly) tell it that "any use of the atomic API is fine".

cd ~
git clone git://anongit.kde.org/kde-dev-scripts
export VALGRIND_OPTS="--num-callers=50 --suppressions=$HOME/kde-dev-scripts/kde.supp"

The export should probably go into your ~/.zshrc or ~/.bashrc, so you have it set up once and for all.
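To see why the atomics have to be suppressed wholesale, here is a small sketch using std::atomic as a stand-in for the Qt atomic classes (storePlain, storeAtomic and publishAndRead are hypothetical names). On x86 a compiler will typically turn both store functions into the same single mov instruction, so a tool that watches memory accesses at the machine level sees no difference, even though only the second one is free of data races when another thread reads the value at the same time.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Plain store: a data race if another thread accesses *target concurrently.
inline void storePlain(int *target, int value)
{
    *target = value;
}

// Relaxed atomic store: well-defined under concurrent access, yet on x86 it
// usually compiles to the same instruction as the plain store above.
inline void storeAtomic(std::atomic<int> *target, int value)
{
    target->store(value, std::memory_order_relaxed);
}

// A writer thread publishes `value` through an atomic slot while the calling
// thread polls that slot. `value` must be non-zero, because zero means
// "nothing published yet".
int publishAndRead(int value)
{
    assert(value != 0);
    std::atomic<int> slot{0};
    std::thread writer([&slot, value] { storeAtomic(&slot, value); });

    int seen = 0;
    while ((seen = slot.load(std::memory_order_relaxed)) == 0)
        std::this_thread::yield();

    writer.join();
    return seen;
}
```

Replacing the atomic slot with a plain int and storePlain would be a genuine bug, and that is the category of bug the suppressions still let helgrind find.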

Helgrind alias

In addition to detecting race conditions, helgrind also tries to detect potential deadlocks due to inconsistent locking order (A then B in one place, B then A in another). However, the QOrderedMutexLocker trick in Qt confuses helgrind because of its interesting use of tryLock(), so helgrind's lock order feature has to be disabled for now, using --track-lockorders=no. See bug 243232. I discussed a patch with the helgrind developers, but I still need to finish it.
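To illustrate the idea, here is a sketch of address-ordered locking with a tryLock-style shortcut, written with std::mutex. It is not Qt's implementation: OrderedLocker, lockSecond and runOpposingThreads are hypothetical names, and the behaviour is only my reading of the general technique.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical helper: locks two mutexes in a fixed global order (lowest
// address first), whatever order the caller names them in.
class OrderedLocker {
public:
    OrderedLocker(std::mutex &m1, std::mutex &m2)
        : first(std::less<std::mutex *>()(&m1, &m2) ? &m1 : &m2),
          second(first == &m1 ? &m2 : &m1)
    {
        first->lock();
        if (second != first)
            second->lock();
    }
    ~OrderedLocker()
    {
        if (second != first)
            second->unlock();
        first->unlock();
    }
    OrderedLocker(const OrderedLocker &) = delete;
    OrderedLocker &operator=(const OrderedLocker &) = delete;

private:
    std::mutex *first;
    std::mutex *second;
};

// The caller already holds `held` and now also needs `wanted`.
// Returns true if `held` had to be released temporarily, in which case the
// caller must re-check any state it read while holding it.
inline bool lockSecond(std::mutex &held, std::mutex &wanted)
{
    assert(&held != &wanted);
    if (std::less<std::mutex *>()(&held, &wanted)) {
        wanted.lock();     // already in address order
        return false;
    }
    if (wanted.try_lock()) // out of order, but it never blocks, so no deadlock
        return false;
    held.unlock();         // back off, then retake both in address order
    wanted.lock();
    held.lock();
    return true;
}

// Two threads lock the same pair of mutexes, naming them in opposite orders.
int runOpposingThreads(int iterations)
{
    std::mutex a, b;
    int shared = 0;
    std::thread t1([&] {
        for (int i = 0; i < iterations; ++i) { OrderedLocker lock(a, b); ++shared; }
    });
    std::thread t2([&] {
        for (int i = 0; i < iterations; ++i) { OrderedLocker lock(b, a); ++shared; }
    });
    t1.join();
    t2.join();
    return shared;
}
```

The successful try_lock in lockSecond acquires the two mutexes in the "wrong" order without any risk of deadlock. A checker that only records acquisition order would plausibly still see that as an A+B versus B+A violation.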

The default event dispatcher in Qt uses the glib event loop, which has its own races, which we're not really interested in. Easy solution: export QT_NO_GLIB=1

For these two reasons, I recommend adding this line to your ~/.zshrc (or ~/.bashrc for people who haven't tried zsh yet):

alias helgrind="QT_NO_GLIB=1 valgrind --tool=helgrind --track-lockorders=no"

Happy debugging!

David Faure, March 2013 (updated January 2014).