How to use static analysis to improve performance
The usual advice is “only improve performance where a profiler tells you to”. I don’t completely agree, for three reasons:
- Take a big C++ library like Qt: can you profile all classes and all code paths? It would take a couple of years to do that and analyse the results.
- It’s expensive: profiler-driven optimization usually only happens if the expected speed-up is big enough to justify the skilled developer time it needs. Customers don’t pay for a 10% speed gain in a desktop application. Even open source hackers and enthusiasts will only optimize until it’s good enough.
- Last but not least, what if the code is evenly slow? Profilers point you to bottlenecks, but if everything is equally slow or equally fast, nothing stands out and the results are meaningless. Bad performance is often a “death by a thousand cuts”: little inefficiencies so uniformly spread through the codebase that they go unnoticed by most tooling.

Consequently, while profilers are great and have their place, I’d rather talk about complementary techniques for tackling the 5%-10% layer of cruft that goes undetected.
C++ compilers are mostly simple-minded: while they understand the syntax of the language, they won’t complain if you don’t follow, for example, Boost, STL or Qt best practices.
Wouldn’t it be great if compilers operated at a higher semantic level and understood more than C++? They could hint that your STL code would benefit from std::vector::reserve() calls, warn about Qt container detachments, or even automatically rewrite your code to follow best practices regarding QStringLiteral and QLatin1String.
Fortunately the Clang project lets you do just that. Clang is a C/C++ frontend for the LLVM compiler infrastructure. It exposes a nice and modular API that allows you to tap into the build and hook in custom AST visitors which emit your own warnings and errors.
So, motivated by the fact that grep and regular expressions weren’t cutting it for me any more, I decided to see how easy it would be to write a Clang plug-in and make the compiler work for me.
After a couple of days of hacking and very few lines of code, the clazy static checker was born:
    $ clang++ -Xclang -load -Xclang ClangClazy.so -Xclang -add-plugin -Xclang clang-lazy -I /usr/include/qt/ -fPIC -std=c++11 -c test.cpp
    a.cpp:8:1: warning: non-POD static [-Wclazy-non-pod-global-static]
    static QString s_string;
    ^
    a.cpp:24:13: warning: Reserve candidate [-Wclazy-reserve-candidates]
    structs.push_back(TestStruct());
    ^
    a.cpp:37:21: warning: Missing reference on large type sizeof std::vector<TestStruct> is 24 bytes) [-Wclazy-function-args-by-ref]
    void initialize(std::vector<TestStruct> structs)
    ^
    a.cpp:39:9: warning: Use QHash<K,T> instead of QMap<K,T> when K is a pointer [-Wclazy-qmap-with-key-pointer]
    QMap<TestStruct*, int> shouldBeQHash;
    ^
    a.cpp:40:18: warning: Missing reference in foreach with sizeof(T) = 4000 bytes [-Wclazy-foreacher]
    foreach (auto s, structs)
    ^
    a.cpp:63:21: warning: Don't call QVector::first() on temporary [-Wclazy-detaching-temporary]
    TestStruct ts = tc.getVector().first();
    ^
    a.cpp:70:17: warning: QString(QLatin1String) being called [-Wclazy-qstring-uneeded-heap-allocations]
    QString s = QLatin1String("literal");
    6 warnings generated.
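Two of the warnings above, function-args-by-ref and foreacher, are about copies that are easy to write and hard to spot. The sketch below is not clazy code and does not use Qt: it is a plain C++11 illustration with a hypothetical `Tracked` type that counts its own copies, and a range-based for loop standing in for Qt's foreach. The names `sumByValue`, `sumByConstRef` and `copiesDuring` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how often it gets copied.
struct Tracked {
    static std::size_t copies;
    int value = 0;
    Tracked() = default;
    Tracked(const Tracked &other) : value(other.value) { ++copies; }
    Tracked &operator=(const Tracked &other)
    {
        value = other.value;
        ++copies;
        return *this;
    }
};
std::size_t Tracked::copies = 0;

// Container passed by value: the caller's vector is copied element by element.
int sumByValue(std::vector<Tracked> items)
{
    int sum = 0;
    for (auto item : items) // loop variable by value: one more copy per element
        sum += item.value;
    return sum;
}

// Container and loop variable both taken by const reference: no copies.
int sumByConstRef(const std::vector<Tracked> &items)
{
    int sum = 0;
    for (const auto &item : items)
        sum += item.value;
    return sum;
}

// Returns how many element copies a single call triggers for n elements.
std::size_t copiesDuring(bool byValue, std::size_t n)
{
    std::vector<Tracked> items(n); // default-constructed in place, no copies
    Tracked::copies = 0;
    const int sum = byValue ? sumByValue(items) : sumByConstRef(items);
    assert(sum == 0); // all values are default-initialised to 0
    (void)sum;
    return Tracked::copies;
}
```

For n elements the by-value version performs 2n copies (n for the argument, n in the loop), while the const-reference version performs none. No single copy is expensive enough to stand out in a profile, which is why a compile-time hint is useful here.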
The checks related to memory allocations actually make a big difference, and the patterns they flag frequently show up under profilers. Missing reserve() calls and temporary QStrings due to QLatin1String/char* misuse create many small heap allocations. These result in internal fragmentation, and we know most allocators aren’t very keen on returning memory to the kernel.
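To make the cost of a missing reserve() call concrete, here is a minimal sketch in standard C++, independent of clazy and Qt. It counts how often a std::vector has to grow its buffer while elements are appended; each growth means a new heap allocation plus moving or copying every existing element. `TestStruct` is a hypothetical stand-in for the type in the output above, and `countReallocations` is a name invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the TestStruct from the checker output.
struct TestStruct {
    int payload[16];
};

// Appends `count` elements and returns how many times the vector's
// capacity changed while doing so, i.e. how many reallocations the
// push_back() loop itself caused.
std::size_t countReallocations(std::size_t count, bool reserveFirst)
{
    std::vector<TestStruct> structs;
    if (reserveFirst)
        structs.reserve(count); // single allocation up front, before the loop

    std::size_t reallocations = 0;
    std::size_t lastCapacity = structs.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        structs.push_back(TestStruct());
        if (structs.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = structs.capacity();
        }
    }
    assert(structs.size() == count);
    return reallocations;
}
```

With `reserveFirst` set, the loop never reallocates, because the standard guarantees no reallocation until the size exceeds the reserved capacity. Without it, the exact number of reallocations depends on the standard library's growth factor, but it grows with the logarithm of `count`, and the abandoned buffers are the kind of short-lived small allocations described above.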
Can you think of any other interesting checks? Please leave a comment.
I hope to have whetted your appetite and curiosity. It wasn’t so long ago that most static analysers were primitive and based on regular expressions. C++ has since made great strides as a language, and the tooling is advancing just as fast.