How to use static analysis to improve performance

7 September 2015

Tags

It's usually said "only improve performance where a profiler tells you to".
I don't completely agree.

Take a big C++ library like Qt: can you profile all classes and all code paths ? It would take a couple of years to accomplish and analyse the results.
It's expensive: It usually only happens if the speed-up is big enough to justify the needed skilled developer time. Customers don't pay for a 10% speed gain for a desktop application. Even open source hackers and enthusiasts will only optimize until it's good enough.
Last but not least, what if the code is evenly slow ? Profilers point you to bottlenecks, but if everything is as slow or as fast, nothing stands out. The results will be meaningless. Bad performance is usually a matter of a "death by a thousand cuts", little inefficiencies so uniformly spread through the codebase that they go unnoticed by most tooling.Consequently, while profilers are great and have their place, I rather talk about complementary techniques to tackle the problem of the 5%-10% layer of cruft that goes undetected.Enter Clang:

C++ compilers are mostly simple-minded, while they understand the syntax of the language, they won't complain if you don't follow, for example, Boost, STL or Qt best practices.

Wouldn't it be great if compilers operated at an higher semantic level and understood more than C++ ? They could hint your STL algorithm would benefit from std::vector::reserve() calls, warn about Qt container detachments or even automatically rewrite your code to follow best practices regarding QStringLiteral and QLatin1String.

Fortunately the Clang project lets you do just that. Clang is a C/C++ frontend for the LLVM compiler infrastructure. It exposes a nice and modular API that allows you to tap into the build and hook in custom AST visitors which emit your own warnings and errors.

So, motivated by the fact that using grep and regular expressions wasn't cutting it for me any more I decided to see how easy it was to write a clang plug-in and make the compiler work for me.

After a couple days of hacking and very few lines of code the clazy static checker is born:

$ clang++ -Xclang -load -Xclang ClangClazy.so -Xclang -add-plugin -Xclang clang-lazy -I /usr/include/qt/ -fPIC -std=c++11 -c test.cpp a.cpp:8:1: warning: non-POD static [-Wclazy-non-pod-global-static] static QString s_string; ^ a.cpp:24:13: warning: Reserve candidate [-Wclazy-reserve-candidates] structs.push_back(TestStruct()); ^ a.cpp:37:21: warning: Missing reference on large type sizeof std::vector<TestStruct> is 24 bytes) [-Wclazy-function-args-by-ref] void initialize(std::vector<TestStruct> structs) ^ a.cpp:39:9: warning: Use QHash<K,T> instead of QMap<K,T> when K is a pointer [-Wclazy-qmap-with-key-pointer] QMap<TestStruct*, int> shouldBeQHash; ^ a.cpp:40:18: warning: Missing reference in foreach with sizeof(T) = 4000 bytes [-Wclazy-foreacher] foreach (auto s, structs) ^ a.cpp:63:21: warning: Don't call QVector::first() on temporary [-Wclazy-detaching-temporary] TestStruct ts = tc.getVector().first(); ^ a.cpp:70:17: warning: QString(QLatin1String) being called [-Wclazy-qstring-uneeded-heap-allocations] QString s = QLatin1String("literal"); 6 warnings generated.

The checks related to memory allocations actually make a big difference and frequently appear under profilers. Missing reserve() calls and temporary QStrings due to QLatin1String/char* misuse create many small heap allocations resulting in internal fragmentation and we know most allocators aren't very keen about returning resources back to kernel.

Can you think of any interesting check ? Please leave a comment.

I hope to have opened your appetite and curiosity, it wasn't so long ago when most static analysers were primitive and based on regular expressions.
C++ has since made great strides in improving the language, but the tooling is advancing just as fast.

Tags:

c++performance qml tools

31 Comments

Since you are asking for ideas for performance related checkers I thought I'd post a few small ideas.

Check for comparisons of .size() against 0 instead of calling .empty()/.isEmpty()
Check for use of post-increment where pre-increment could have been used just as well.
Check for loops that open-code std::fill() rather than calling the algorithm.
Check for loops that iterate column-major rather than row-major. Since the latter usually has better cache performance characteristics.

It would be great to have a check for passing shared_ptr parameters by value - and better still still to have a tool that converted such calls to passing by const reference!

@Clare,

You might be interested in this CppCon talk on automated refactoring for C++, which covers ideas like automatically fixing common anti-patterns:

https://www.youtube.com/watch?v=ZpvvmvITOrk

Heap variables that could be put on the stack (like a new at the beginning of a block and a delete at the end).
Passing types by const& when it would be faster to pass them by value.
Use of Qt containers in C++11 range-based for loops which implies a deep copy.
Downcasts with static_cast instead of dynamic_ / qobject_
Functions taking pointers in argument if they don't take ownership somewhere (they could take references instead).
Functions taking non-const elements and which does not modify them (a bit tricky because for instance the function could be virtual and the variable could be modified in the same function in another child class)
Elements initialized in the ctor instead of the member init-list

And more broadly-scoped ones : - Long types used in more than one place (e.g. QMap) which should be given a name instead - Code that would benefit from inlining (dunno if it's doable / computable ? maybe if it's very few LLVM instructions ?)

Check for use of 'new' to initialize unique_ptr/shared_ptr rather than using make_unique/make_shared.
Suggest using 'extern template' if the same template is instantiated in multiple compilation units (to improve build performance, not runtime performance).
Suggest using emplace()/emplace_back() rather than insert()/push_back() if the object in question is an rvalue.
Suggest removing virtual if a class has virtual functions but no other classes inherit from it and the virtual functions also do not override any from base classes.
If a class has a move constructor that is not 'noexcept', suggest to make it so. So that it can be used with functions that use std::move_if_noexcept to determine whether to move or copy objects (for example std::vector::resize()).

It seems that your plug-in is made for checking the use of Qt.

It will be great to separate the different kinds of checkings you do.

You can make a plug-in with several kind of checkings

1) Check code before c++11 2) Check c++11 code 3) Check code for specific libraries (like Qt ...)

It will be great to separate the different kinds of checking into several plug-in.

Check if a class is following the "rule of zero" -- if it defines any of the 5 auto-generated functions (dtor, copy ctor and assignment, move ctor and assignment), it should declare all of them.

Even continuing to follow the "rule of three" under C++11 can leave some performance on the table because if one of the auto-generated functions appears, the compiler won't generate move operations implicitly.

Very Nice, I was considering using libclang to compute the "sizeof" of all the class/struct defined in a project. Your checks takes in account object usage which is even better.

Maybe integrate your checks with clang-tidy(http://clang.llvm.org/extra/clang-tidy/index.html). This tool is handy if you just want the analysis without the compilation/code-emission.

A few more ideas:

check for uses of static's in header files. They will get a copy in multiple compilation units, and if the linker is not smart (enough) about removing them they can bloat your libraries and executables (which can have a performance impact).
Check for classes/structs where shuffling the order of member variables can significantly reduce the amount of padding the compiler has to add. Smaller objects are always nice, but when reordering members can bring a class size down from occupying more than 1 cache line to less than one then there can really be significant performance gains.
Look for code that uses "out parameters" that should just use a normal return value. The RVO and NRVO optimizations can kick in for normal returns - not so much for out parameters.
Check for empty user-defined implementations dof destructors and default constructors and suggest that they either be deleted or defaulted (=default) instead. When user-defined they are by dwfinition never trivial, but when they are trivial they can often enable compiler optimizations that are otherwise not available (especially if making those functions trivial makes the entire type POD).

By the way. Would you appreciate outside contributions on the code or would you rather just get ideas and input and implement them yourself? I think a tool like this is a great idea and I'd be happy to contribute (with what little time I have (which may be zero)). But I won't bother if you'd rather just have suggestions and evolve it on your own...?

I would absolutely accept contributions, I'm even moving the repo to KDE playground where everyone can participate.

I wonder however which types of new checks we want. For pure C++ checks we already have clang static analyser and clang-tidy, so maybe we should be collaborating with them.

For Qt specific checks I think clazy is the right place for it.

Clang tidy and analyzer are focused on finding bugs. As I see it, clazy fills a different niche; finding performance issues. So I'd say both Qt and plain C++ checks could go in, as long as they are performance related. For bug checks, submitting those to clang analyzer or tidy would make sense. Just my opinion :)

Hi,

need a bit of help please. I've downloaded clazy from https://quickgit.kde.org/?p=scratch%2Fsmartins%2Fclazy.git and clang from http://llvm.org/releases/download.html. Compilation failing here.

Downloads/scratch-smartins-clazy/Utils.cpp:859:48: error: no matching function for call to ‘std::vector::erase(__gnu_cxx::__normal_iterator<clang::DeclContext* const*, std::vector >&, __gnu_cxx::__normal_iterator<clang::DeclContext* const*, std::vector >)’
                 methodContexts.erase(it, it + 1);
                                                ^
Downloads/scratch-smartins-clazy/Utils.cpp:859:48: note: candidates are:
In file included from /usr/include/c++/4.8/vector:69:0,
                 from /usr/include/c++/4.8/bits/random.h:34,
                 from /usr/include/c++/4.8/random:50,
                 from /usr/include/c++/4.8/bits/stl_algo.h:65,
                 from /usr/include/c++/4.8/algorithm:62,
                 from /usr/include/llvm/ADT/SmallVector.h:22,
                 from /usr/include/llvm/Support/Allocator.h:24,
                 from /usr/include/clang/AST/ASTVector.h:23,
                 from /usr/include/clang/AST/ASTUnresolvedSet.h:18,
                 from /usr/include/clang/AST/DeclCXX.h:19,
                 from Downloads/scratch-smartins-clazy/Utils.h:31,
                 from Downloads/scratch-smartins-clazy/Utils.cpp:28:
/usr/include/c++/4.8/bits/vector.tcc:134:5: note: std::vector::iterator std::vector::erase(std::vector::iterator) [with _Tp = clang::DeclContext*; _Alloc = std::allocator; std::vector::iterator = __gnu_cxx::__normal_iterator<clang::DeclContext**, std::vector >; typename std::_Vector_base::pointer = clang::DeclContext**]
     vector::
     ^
/usr/include/c++/4.8/bits/vector.tcc:134:5: note:   candidate expects 1 argument, 2 provided
/usr/include/c++/4.8/bits/vector.tcc:146:5: note: std::vector::iterator std::vector::erase(std::vector::iterator, std::vector::iterator) [with _Tp = clang::DeclContext*; _Alloc = std::allocator; std::vector::iterator = __gnu_cxx::__normal_iterator<clang::DeclContext**, std::vector >; typename std::_Vector_base::pointer = clang::DeclContext**]
     vector::
     ^
/usr/include/c++/4.8/bits/vector.tcc:146:5: note:   no known conversion for argument 1 from ‘__gnu_cxx::__normal_iterator<clang::DeclContext* const*, std::vector >’ to ‘std::vector::iterator {aka __gnu_cxx::__normal_iterator<clang::DeclContext**, std::vector >}’
make[2]: *** [CMakeFiles/ClangLazy.dir/Utils.cpp.o] Error 1
make[1]: *** [CMakeFiles/ClangLazy.dir/all] Error 2
make: *** [all] Error 2

Thanks in advance.

Which compiler are you using to compile clazy itself ? While it should also compile fine with gcc, please try clang first (export CXX=clang++)

Fantastic stuff - I'm loving it!

One small issue: with the function-args-by-ref check, I see a warnings generated whenever the Q_DECLARE_OPERATORS_FOR_FLAGS macro is used. Could this macro be ignored for this test?

will have a look, thanks

I can't reproduce this problem. Please send me a minimal testcase (a compilable test.cpp).

I suspect it may be because I'm building my project using Qt 4.8. I'll investigate further.

In the meantime, would you consider adding this check? https://github.com/nyalldawson/clazy/commit/e1d08e988ee66bc16abed6655fb86024dc7fd75f

It's based heavily off the existing detach temporary check, but is designed to warn when iterator functions like begin() and end() are called on temporary containers. I got hit by this bug yesterday and thought it would make a good clazy check! The code may need some cleaning, but it works well for me.

Thanks, seems very useful!

Filed as https://bugs.kde.org/show_bug.cgi?id=353732

[  1%] Building CXX object src/thirdparty/qtlockedfile/CMakeFiles/qtlockedfile.dir/qtlockedfile.cpp.o
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
0  clang-3.7       0x00000000013e06b8 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40
1  clang-3.7       0x00000000013e1a1b
2  libpthread.so.0 0x00007ff6acdf1d10
3  libc.so.6       0x00007ff6ac1ab267 gsignal + 55
4  libc.so.6       0x00007ff6ac1aceca abort + 362
5  libstdc++.so.6  0x00007ff6ac7e6b7d __gnu_cxx::__verbose_terminate_handler() + 365
6  libstdc++.so.6  0x00007ff6ac7e49c6
7  libstdc++.so.6  0x00007ff6ac7e4a11
8  libstdc++.so.6  0x00007ff6ac7e4c29
9  libstdc++.so.6  0x00007ff6ac7e51cc
10 ClangLazy.so    0x00007ff6ab3a3553
11 ClangLazy.so    0x00007ff6ab39fc4b OldStyleConnect::OldStyleConnect(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 107
12 ClangLazy.so    0x00007ff6ab3a3233
13 ClangLazy.so    0x00007ff6ab39154f CheckManager::createCheck(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 319
14 ClangLazy.so    0x00007ff6ab3924d4 CheckManager::createChecks(std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >) + 516
15 ClangLazy.so    0x00007ff6ab3c9ce1
16 clang-3.7       0x000000000173b4ea clang::FrontendAction::CreateWrappedASTConsumer(clang::CompilerInstance&, llvm::StringRef) + 330
17 clang-3.7       0x000000000173c092 clang::FrontendAction::BeginSourceFile(clang::CompilerInstance&, clang::FrontendInputFile const&) + 2578
18 clang-3.7       0x000000000170d4c7 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 695
19 clang-3.7       0x00000000017a9de3 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 3123
20 clang-3.7       0x00000000006f6634 cc1_main(llvm::ArrayRef, char const*, void*) + 1156
21 clang-3.7       0x00000000006f586f main + 11983
22 libc.so.6       0x00007ff6ac196a40 __libc_start_main + 240
23 clang-3.7       0x00000000006f2893

using prebuild clang from llvm website, ubuntu 15.10. with cmake command:

cmake .. -DCMAKE_INSTALL_PREFIX=/opt/dev_qt5 -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS_DEBUG="-Xclang -load -Xclang /opt/clang37/lib/ClangLazy.so -Xclang -add-plugin -Xclang clang-lazy" -DCMAKE_PREFIX_PATH=/opt/clang37

Be sure do not mix different ABIs. gcc 5 is now using the new ABI (where std::string isn't COW), and the pre-built binaries of clang might still be using old ABI. So try clang from the ubuntu repos: apt-get install g++ cmake clang llvm git-core libclang-3.6-dev

problem with clang from official repos:

/usr/lib/llvm-3.6/bin/clang: symbol lookup error: /usr/lib/ClangLazy.so: undefined symbol: _ZNK5clang15DeclarationName11getAsStringEv

Ok. it works now. Requires libc++-dev libc++abi-dev packages. And -stdlib=libc++ flag.

Clazy builded with command: cmake .. -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS_RELEASE="-std=c++11 -stdlib=libc++"

And program with command: qmake ../vacuum.pro -r INSTALL_PREFIX=/dev/qt5 CONFIG-=release CONFIG+=debug -spec linux-clang-libc++ QMAKE_CXXFLAGS+="-std=c++11 -stdlib=libc++ -Xclang -load -Xclang ClangLazy.so -Xclang -add-plugin -Xclang clang-lazy"

Oops, nvm, works without CLAZY_CHECKS env.

And "symbol lookup error" with export CLAZY_CHECKS=all_checks

Looks like problem with inefficient-qlist check.

Thanks for this great tool ! I am getting a lot of "c++11 range-loop might detach Qt container", especially when iterating on member variables in non const member function. What would be the best way to make sure it does not detach ? The only I can come up is "for(const auto &e: static_cast(this)->member)", but it feels ugly

Use foreach instead: foreach (const auto &e, member) {}

Really ? I thought foreach was slower than range-based for loop (if there is not detach) : https://bugreports.qt.io/browse/QTBUG-41636

Note the difference in orders of magnitude:

range-for with detachment: RESULT : Rangefor_Test::rangeFor_Vector(): 7.1 msecs per iteration (total: 57, iterations: 8)

range-for without detachment: RESULT : Rangefor_Test::rangeFor_ConstVector(): 0.000023 msecs per iteration (total: 99, iterations: 4194304)

foreach: RESULT : Rangefor_Test::forEach_Vector(): 0.000046 msecs per iteration (total: 97, iterations: 2097152)

Yes the foreach takes 2x as long as a range-loop that carefully doesn't detach. But a range-loop that detaches takes 15000x times more.

the clazy foreach/range-loop check was made to tackle the former.

These numbers actually do not have any meaning, because the loops were totally optimized away. Please refer to Torgeir Lilleskog's comment : https://bugreports.qt.io/browse/QTBUG-41636?focusedCommentId=258507&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-258507

According to him switching from detaching rangeFor to foreach is a 4x speedup, but switching from foreach to non detaching rangefor if a 2x speedup, which is quite important

A lot of suggested checks are already implemented in Clang-tidy(http://clang.llvm.org/extra/clang-tidy/). C++11 migration checks were moved in there recently.

By the word, it may make sense to port Clazy checks to Clang-tidy.

Sérgio Martins

Senior Software Engineer

Sérgio has been with KDAB for 13 years. He loves performance profiling, debugging, static-analysis, and tooling. You won’t see him writing nice-looking GUIs or 3D. He’d rather enjoy tackling a hard problem, frequently embracing the pain of bugs nobody wants, such as HDPI, fonts, QDockWidget, and even working on Windows when needed.

How to use static analysis to improve performance

31 Comments

Related Content

Sign up for the KDAB Newsletter

How to use static analysis to improve performance

31 Comments

Related Content

10 Tips to Make Your QML Code Faster and More Maintainable

Introducing the ConnectionEvaluator in KDBindings

Fast Duplicate Tracking

Sign up for the KDAB Newsletter