Hotspot: How const Can Improve Performance
Some time ago, I noticed that a unit test was quite slow, using 100% CPU for a number of seconds at one point in the test.
I used perf and KDAB’s Hotspot to record and examine where the CPU cycles were spent in that unit test, and I quickly noticed that a lot of time was spent in QFileSystemEntry::fileName(), an internal method in Qt that’s called when listing directories with QDir.
Here’s what zooming into that method with Hotspot showed:
We can see that QFileSystemEntry::fileName() calls QString::lastIndexOf(), which calls data() on the string. The non-const data() calls detach(), which deep-copies the string data when it is shared, and that copy is expensive! Surely that’s not supposed to happen in a const method. Looking at the Qt code revealed that lastIndexOf() calls a template function, which in turn calls .data() on its string-like template type. In this case that type is QString, but it can also be QStringView or QLatin1String, and those don’t have a constData() method. The author of that function never intended it to be instantiated with QString, but that’s exactly what happened, by accident. The fix was therefore to wrap the QString in a QStringView, whose data() method does not detach.
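To make the mechanism concrete, here is a minimal sketch using a toy copy-on-write string. It is not Qt's implementation: CowString, deepCopies and lastIndexOfChar are hypothetical names invented for this illustration, and std::string_view stands in for QStringView. The template function calls .data() on whatever type it is given, just like the function described above.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Toy copy-on-write string. NOT Qt's QString, only an illustration.
class CowString {
public:
    explicit CowString(std::string s)
        : d_(std::make_shared<std::string>(std::move(s))) {}

    // Non-const access could be used for writing, so it must detach first.
    char *data() { detach(); return d_->data(); }

    // Const access can never modify the buffer, so no copy is needed.
    const char *constData() const { return d_->data(); }

    std::size_t size() const { return d_->size(); }

    // A read-only view of the buffer; creating one never detaches.
    std::string_view view() const { return std::string_view(*d_); }

    // Counts the deep copies triggered by detach() (for demonstration only).
    inline static int deepCopies = 0;

private:
    void detach() {
        if (d_.use_count() > 1) {                      // buffer is shared
            d_ = std::make_shared<std::string>(*d_);   // deep copy
            ++deepCopies;
        }
    }

    std::shared_ptr<std::string> d_;
};

// Generic helper that calls .data() on its string-like argument.
// With StringLike = CowString, this picks the non-const data() and detaches.
// With StringLike = std::string_view, data() is const and nothing is copied.
template <typename StringLike>
std::ptrdiff_t lastIndexOfChar(StringLike haystack, char needle) {
    const char *p = haystack.data();
    for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(haystack.size()) - 1; i >= 0; --i) {
        if (p[i] == needle)
            return i;
    }
    return -1;
}
```

Calling lastIndexOfChar(path, '/') with a CowString copies the whole buffer just to read it, while lastIndexOfChar(path.view(), '/') returns the same index without any copy. That is the same idea as wrapping the QString in a QStringView.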
After making the change, the expensive detach() disappeared:
Don’t be fooled by the fact that the rectangle for QFileSystemEntry::fileName is just as wide as before. I selected it in Hotspot, so it always takes 100% of the width. However, it’s now 100% of a much smaller amount.
I submitted the fix for inclusion into Qt and it was merged into Qt 5.14.1 and later versions. You can see the patch in the Qt merge request.
By the way, when I give training courses on profiling C++ applications, I always insist that profiling should be done on an optimized build, never on a non-optimized (“debug”) build. Otherwise we end up doing the compiler optimizer’s job ourselves. But this is one case where even profiling a debug build showed very interesting results, and the fix had an impact on release builds as well. That’s because, in this case, the compiler cannot optimize away the deep string copy triggered by the call to data(), so the bug existed in optimized builds too.
In conclusion, if something is slow, fire up Hotspot, record the program (ideally a unit test, but the full program is also a possibility), switch to the flamegraph, and hopefully the reason for the slowness will be very visible.