Can’t the Compiler Do That? Customized C/C++ Tooling with Clang LibTooling

12.05.2021

Anton Kreuzkamp and Kevin Funk


Note: This is the English translation of the article first published in German.

Intro

The ESE Congress is one of the leading events for embedded software engineering in Germany.

This year it was held digitally for the first time, so participation was possible via video. Over five days, there were 3 keynotes and 96 technical presentations covering all areas of embedded software development.

Anton Kreuzkamp from KDAB talked about custom code refactoring with Clang tooling. Read on for his contribution to the ESE conference proceedings.

Good static analysis can save a lot of effort and time. With customized static code analysis, the project code can be checked not only for general programming errors but also for project-specific conventions and best practices. The Clang Compiler Framework provides the ideal basis for this.

The programming language C++ manages the balancing act between maximum performance, which is essential in the embedded sector, on the one hand, and maximum code correctness through a high level of abstraction on the other. It achieves this balance by focusing on compile-time checks and opportunities for optimization. Where possible, calculations that low-level code could carry out more efficiently should be rewritten by the compiler, not by the developer. Likewise, errors should already be ruled out during compilation instead of consuming valuable computing time with checks at run time.

Clang has become very popular in recent years and has long since established itself as one of the most important C and C++ compilers. This success is due not least to Clang's architecture: Clang is not just another compiler, but a compiler framework. The essential parts of the compiler are implemented as a carefully designed library, which has enabled the diverse landscape of analysis and refactoring tools that has emerged around this framework from the LLVM project.

The command-line tool clang-tidy offers static code analysis and checks compliance with coding conventions, among other things, and can also refactor code on its own. The clang-format tool automatically standardizes the coding style. The Clazy tool, developed by the author’s company, supplements the compiler with a variety of warnings specific to the Qt software framework and flags common anti-patterns in its use. Many other useful tools exist in the Clang universe as well. Even integrated development environments such as Qt Creator or CLion rely on the Clang Compiler Framework for syntax highlighting, code navigation, auto-completion, and refactoring.

Anyone who knows the tools of the Clang world in their entirety is well positioned as a C or C++ developer. But if you want to get everything out of the technology, that is not the end of the story. The LibTooling library, on which most Clang tools are based, also allows you to create your own customized code analysis and refactoring tools, with little effort.

Consider an example. A small but recurring piece of the puzzle in embedded software is raising real numbers to a power, mostly with static, natural-number exponents. Normally, the std::pow function would be used for this, had extensive profiling not shown that on the target architecture std::pow(x, 4) is many times slower than x*x*x*x and forms a bottleneck in particularly performance-critical code. The project’s senior developer has therefore created a template function, usable as utils::pow<4>(x), which, thanks to compiler optimizations, is just as nimble as the manual variant[1]. Nevertheless, the usual std::pow variant has since crept back into various places in the code, and the several hundred thousand lines of existing code have not been ported consistently either.
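The article does not show how utils::pow is implemented. A minimal sketch, assuming a non-negative integer exponent passed as a template argument, might look like the following; only the name utils::pow comes from the text, the body is an assumption. Because N is a compile-time constant, the loop has a fixed trip count that an optimizer can typically unroll into plain multiplications.

```cpp
namespace utils {

// Hypothetical sketch of utils::pow<N>(x); the article does not show the real
// implementation. N is a template argument, so the number of loop iterations
// is a compile-time constant that an optimizer can unroll into plain
// multiplications. Negative exponents are deliberately not supported.
template <unsigned N, typename T>
constexpr T pow(T base) {
    T result = T(1);
    for (unsigned remaining = N; remaining > 0; --remaining) {
        result *= base;  // one multiplication per unit of the exponent
    }
    return result;
}

}  // namespace utils
```

A call such as utils::pow<4>(x) then multiplies four factors of x. The exponent has to be a constant expression, which is exactly the property the checker developed below tests for.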

The first attempt to automate the refactoring is, of course, a search and replace with a regular expression. std::pow\((.*), (\d+)\) already finds the simplest cases. But what about the cases where the “std::” is omitted or the second parameter is more complicated than an integer literal?
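To see where the text-based approach falls short, the regular expression above can be tried out with std::regex. The helpers naiveMatch and naiveBase are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The naive search pattern from the text, std::pow\((.*), (\d+)\),
// compiled with the default ECMAScript grammar.
const std::regex naivePowPattern(R"re(std::pow\((.*), (\d+)\))re");

// True if the naive pattern finds a "std::pow" call anywhere in `line`.
bool naiveMatch(const std::string& line) {
    return std::regex_search(line, naivePowPattern);
}

// Returns what the pattern captures as the base (group 1), or "" if no match.
std::string naiveBase(const std::string& line) {
    std::smatch match;
    if (!std::regex_search(line, match, naivePowPattern)) {
        return "";
    }
    return match[1].str();
}
```

naiveMatch finds std::pow(x, 4), but misses pow(x, 4) after using namespace std, StdLib::pow(x, 4) via a namespace alias, std::pow(x, DIM) with a macro exponent, and std::pow(x, getExp(pi)). Worse, because .* is greedy, naiveBase("std::pow(a, b) + std::pow(c, 2)") captures "a, b) + std::pow(c" as the base, so a blind replacement would corrupt the code.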

[1] Note: On many common platforms the same optimization can be achieved by using the compiler flag -ffast-math. The compiler will then independently replace the std::pow call with appropriate CPU instructions.

Installing LLVM and Clang

If you cannot install Clang and LLVM via your trusted package manager, you can get the framework from GitHub. Prerequisites for a successful build are Git, CMake, Ninja and an existing C++ compiler.

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build; cd build
cmake -G Ninja ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DLLVM_BUILD_TESTS=ON  # Enable tests; default is off.
ninja
ninja check       # Test LLVM only.
ninja clang-test  # Test Clang only.
ninja install

The first steps

As a basis for our own Clang tool, we use a code example from the Clang documentation [1], reduced here to the essentials.

#include "clang/Frontend/FrontendActions.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
#include "llvm/Support/CommandLine.h"

using namespace clang::tooling;
using namespace llvm;

int main(int argc, const char **argv) {
  CommonOptionsParser optionsParser(argc, argv,
    llvm::cl::GeneralCategory
  );
  ClangTool tool(optionsParser.getCompilations(),
                 optionsParser.getSourcePathList());

  return tool.run(
    newFrontendActionFactory<clang::SyntaxOnlyAction>().get()
  );
}

With this, we already have a first executable program. CMake makes it easy to create the necessary build scripts: all we have to do is find the Clang package and link our program against the imported targets clang-cpp, LLVMCore and LLVMSupport:

cmake_minimum_required(VERSION 3.11)
project(kdab-supercede-stdpow-checker)

find_package(Clang REQUIRED)

add_executable(kdab-supercede-stdpow-checker
    kdab-supercede-stdpow-checker.cpp)
target_link_libraries(kdab-supercede-stdpow-checker
    clang-cpp LLVMCore LLVMSupport)

install(TARGETS kdab-supercede-stdpow-checker
    RUNTIME DESTINATION bin)

Using our development environment or the command line, we can now compile our program and run it against our code.

Before testing the newly created tool, we recommend installing it into the same directory as the clang compiler (e.g. /usr/bin). Clang-based tools need some built-in headers, which they look for relative to their own installation path, e.g. in ../lib/clang/10.0.1/include (the exact path depends on the version). Otherwise, headers such as stddef.h will be missing in the analyzed code. The command clang -print-resource-dir shows which directory the installed compiler uses. If you get errors when starting the program, you have most likely fallen into this trap.

$ cd /path/to/kdab-supercede-stdpow-checker
$ mkdir build; cd build
$ cmake -G Ninja -DCMAKE_INSTALL_PREFIX=/usr ..
$ ninja
$ sudo ninja install
$ cd /path/to/our/embedded/project
$ kdab-supercede-stdpow-checker -p build/dir fileToCheck.cpp

So far, our tool only checks the syntax of the C++ file and reports errors if, for example, non-existent functions are called. Next, we want to find the code locations that cause our problem.

Finding relevant code locations with AST matchers

The AST, the Abstract Syntax Tree, is a data structure consisting of a multitude of interlinked classes that represents the structure of the code being analyzed. For example, an IfStmt links to an Expr object that represents the condition of an if statement and to Stmt objects that represent the “then” and “else” branches. An AST matcher can be thought of as a regular expression on the AST: it is a data structure that describes a particular pattern in the AST and finds it. For Clang’s LibTooling, AST matchers are written in a special syntax. For each type of language construct, i.e. each kind of node in the AST, there is a function that returns a matcher of the corresponding type. These functions, in turn, take other matchers as parameters that impose additional conditions on the code; multiple parameters are combined with a logical AND. For instance, the following code snippet creates a matcher that matches function declarations that are named “draw” and have void as the return type.

auto drawMatcher
    = functionDecl(hasName("draw"), returns(voidType()));

This fits, for example, the following two declarations:

void draw();

void MyWidget::draw(WidgetPainter *p) {
    p->drawRectangle(…);
}

In order to access the individual parts of the matched code later, sub-matchers can be given names with a bind() call; these names can then be used to retrieve the AST node that the sub-matcher matched. For example, if we want to find function calls whose second argument is an integer literal and access this literal later, we can prepare this with the following matcher:

auto m =
  callExpr(hasArgument(1, integerLiteral().bind("secondArg")));
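The idea of matchers as a “regular expression on the AST” can be illustrated with a small toy model in standard C++. This is not how Clang implements its matchers, and nodeOfKind, hasText, hasChild and bindAs are hypothetical names; the sketch only mirrors the properties described above: matchers nest, all inner matchers of a node matcher must match (AND), and a bound node can be retrieved by name afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy AST node (NOT Clang's API): a kind, a spelling such as a callee name or
// literal text, and child nodes such as call arguments.
struct Node {
    std::string kind;
    std::string text;
    std::vector<Node> children;
};

// Nodes captured by name, similar in spirit to the results of .bind("...").
using Bindings = std::map<std::string, const Node*>;
using Matcher = std::function<bool(const Node&, Bindings&)>;

// Matches a node of the given kind for which ALL inner matchers match.
// Bindings are only committed if the whole match succeeds.
Matcher nodeOfKind(const std::string& kind, std::vector<Matcher> inner = {}) {
    return [kind, inner](const Node& n, Bindings& out) {
        if (n.kind != kind) return false;
        Bindings local = out;
        for (const Matcher& m : inner) {
            if (!m(n, local)) return false;
        }
        out = local;
        return true;
    };
}

// Matches a node whose spelling equals `text`.
Matcher hasText(const std::string& text) {
    return [text](const Node& n, Bindings&) { return n.text == text; };
}

// Applies `inner` to the child at position `index` (e.g. a call argument).
Matcher hasChild(std::size_t index, Matcher inner) {
    return [index, inner](const Node& n, Bindings& out) {
        return index < n.children.size() && inner(n.children[index], out);
    };
}

// Records the matched node under `name` if `inner` matches it.
Matcher bindAs(const std::string& name, Matcher inner) {
    return [name, inner](const Node& n, Bindings& out) {
        if (!inner(n, out)) return false;
        out[name] = &n;
        return true;
    };
}
```

In this model, nodeOfKind("CallExpr", {hasText("pow"), hasChild(1, bindAs("secondArg", nodeOfKind("IntegerLiteral")))}) plays the role of the matcher above, with an extra condition on the callee name; after a successful match, the Bindings map holds a pointer to the literal node under “secondArg”.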

A complete list of all available matchers can be found at [2].

To speed up the creation of AST matchers, Clang comes with the command-line tool clang-query, which can be used to test matchers interactively and to inspect the matched AST sections. The command enable output detailed-ast turns on printing of the AST section found by the matcher, and the match command creates and runs an AST matcher. The matcher syntax used in clang-query is similar to the C++ syntax.

$ clang-query -p build main.cpp
clang-query> enable output detailed-ast
clang-query> match callExpr(callee(functionDecl(hasName("pow"), isInStdNamespace())), hasArgument(1, expr()))

Match #1:

/path/to/ClangToolingTestApp/main.cpp:27:14: note: "root" binds here
    auto x = std::pow(pi, getExp(pi));
             ^~~~~~~~~~~~~~~~~~~~~~~
Binding for "root":
CallExpr
|-ImplicitCastExpr <FunctionToPointerDecay>
| `-DeclRefExpr Function 'pow'  (FunctionTemplate 'pow')
|-ImplicitCastExpr 'double' <LValueToRValue>
| `-DeclRefExpr 'const double' lvalue Var 'pi'
`-CallExpr 'int'
  |-ImplicitCastExpr <FunctionToPointerDecay>
  | `-DeclRefExpr 'int (double)' lvalue Function 'getExp'
  `-ImplicitCastExpr 'double' <LValueToRValue>
    `-DeclRefExpr 'const double' lvalue Var 'pi'

1 match.
clang-query> quit

The matcher can thus be refined interactively, piece by piece. For our goal of finding calls to std::pow that can be replaced by a call to the function template utils::pow, the following matcher does the job:

const StatementMatcher stdPowMatcher = callExpr(
    callee(
        functionDecl(hasName("pow"), isInStdNamespace())
          .bind("callee")
    ),
    hasArgument(0, expr().bind("base")),
    hasArgument(1, expr().bind("exponent"))
).bind("funcCall");

This matcher finds calls to std::pow that have a second argument (index 1), which may be an arbitrary expression. The called function must be named “pow” and be declared in namespace std. We name the first argument “base,” the second argument “exponent,” the called function “callee,” and the function call itself “funcCall.”

Analysis, diagnosis and automatic code correction

In order to do something with the matched code, we still have to register a MatchCallback with the matcher. The callback is a class we implement ourselves; it derives from MatchFinder::MatchCallback and implements the method run(const MatchFinder::MatchResult &Result). This is where our analysis of the matched code snippets takes place. In addition, we define a SupercedeStdPowAction class, which derives from the FixItAction class (so that we can apply our code fixes later) and contains both our MatchCallback and a MatchFinder, through which we start the search of the AST. Finally, we replace clang::SyntaxOnlyAction in the main function with our SupercedeStdPowAction.

#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/Rewrite/Frontend/FrontendActions.h"  // FixItAction

using namespace clang;
using namespace clang::ast_matchers;

class StdPowChecker : public MatchFinder::MatchCallback {
public:
    StdPowChecker() = default;

    // Called once for every match of stdPowMatcher.
    void run(const MatchFinder::MatchResult &result) override {}
};

class SupercedeStdPowAction : public FixItAction {
public:
    SupercedeStdPowAction() {
        m_finder.addMatcher(stdPowMatcher, &m_stdPowChecker);
    }

    // Hands the AST of each translation unit to our MatchFinder.
    std::unique_ptr<ASTConsumer>
    CreateASTConsumer(CompilerInstance &, StringRef) override {
        return m_finder.newASTConsumer();
    }

private:
    MatchFinder m_finder;
    StdPowChecker m_stdPowChecker;
};

int main(int argc, const char **argv) {
  // [...]

  return tool.run(
      newFrontendActionFactory<SupercedeStdPowAction>().get()
  );
}

We now fill the function StdPowChecker::run with our actual check code. First, we can get the AST nodes as pointers using the names assigned to the sub matchers:

const CallExpr *callExpr
    = result.Nodes.getNodeAs<CallExpr>("funcCall");
const FunctionDecl *callee
    = result.Nodes.getNodeAs<FunctionDecl>("callee");
const Expr *base = result.Nodes.getNodeAs<Expr>("base");
const Expr *exponent = result.Nodes.getNodeAs<Expr>("exponent");

The objects obtained this way provide extensive information about the entities they represent: for example, the number, names and types of a function’s parameters; the type and value category (lvalue/rvalue) of an expression; or the value of an integer literal. Beyond literals, the value of any expression can also be queried if it is known at compile time. In our case, we are interested in whether the second argument could also be used as a template argument. For this, the expression must be a constant expression, and exponent->isCXX11ConstantExpr(*result.Context) gives us the answer. If it returns true, and the exponent has an integer type (which exponent->getType()->isIntegerType() confirms), we know that utils::pow is applicable and is the more performant alternative.
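What “could be used as a template argument” means can be checked with plain C++. The following sketch mirrors the two exponents from the practical test further below; exponentAsTemplateArgument is a hypothetical stand-in for utils::pow, and pi is an illustrative value.

```cpp
// Hypothetical stand-in for a template whose exponent must be known at
// compile time, such as utils::pow<N>(x).
template <int N>
constexpr int exponentAsTemplateArgument() {
    return N;
}

// The two kinds of exponents used in the practical test.
#define DIM 2
constexpr int getExp(double x) {
    return static_cast<int>(x);
}
constexpr double pi = 3.14159265;

// Both are integer constant expressions, so both are valid template arguments.
constexpr int fromMacro = exponentAsTemplateArgument<DIM>();
constexpr int fromCall = exponentAsTemplateArgument<getExp(pi)>();

// By contrast, neither of these would compile:
//   double x = readValue();                   // hypothetical run-time value
//   exponentAsTemplateArgument<getExp(x)>();  // not a constant expression
//   exponentAsTemplateArgument<2.5>();        // constant, but not an integer
```

This is why the checker looks for constant expressions of integer type rather than for integer literals only: a macro or a constexpr function call qualifies just as well.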

In order to issue a warning, as we know it from compiler warnings, we use the so-called DiagnosticsEngine, which we can access via the AST context:

auto &diagEngine = result.Context->getDiagnostics();
unsigned ID = diagEngine.getDiagnosticIDs()->getCustomDiagID(
    DiagnosticIDs::Warning,
    "std::pow is called with integer constant expression. "
    "Use utils::pow instead.");
diagEngine.Report(exponent->getBeginLoc(), ID);

If we want not only to warn but also to improve the code directly, we can attach a so-called FixItHint to the report. In our case, the arguments of the function call must be rearranged: the exponent becomes the template argument, while the base remains the function argument. To do this, we need the source code of the arguments as strings, which we can obtain as follows:

auto &sm = result.Context->getSourceManager();
auto &lo = result.Context->getLangOpts();
auto baseRng
    = Lexer::getAsCharRange(base->getSourceRange(), sm, lo);
auto expRng
    = Lexer::getAsCharRange(exponent->getSourceRange(), sm, lo);
auto callRng
    = Lexer::getAsCharRange(callExpr->getSourceRange(), sm, lo);

auto baseStr = Lexer::getSourceText(baseRng, sm, lo);
auto expStr = Lexer::getSourceText(expRng, sm, lo);

From this, we build a FixItHint that takes the character range of the whole function call and uses the argument code to assemble the replacement. We pass the FixItHint via the stream operator to the diagnostic object created by the diagEngine.Report() call, so the Report() call shown earlier is extended as follows. llvm::Twine helps to assemble the strings efficiently.

diagEngine.Report(exponent->getBeginLoc(), ID)
  << FixItHint::CreateReplacement(callRng,
     (llvm::Twine("utils::pow<") + expStr + ">(" + baseStr + ")"
     ).str());
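Stripped of the Clang types, the rewrite performed by the FixItHint boils down to string surgery on character ranges. The following sketch uses plain offsets into a single buffer instead of SourceLocations, an assumption that ignores multiple files and macro expansions; CharRange, sourceText and rewritePowCall are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-in for clang::CharSourceRange: a half-open range
// [begin, end) of character offsets into one source buffer.
struct CharRange {
    std::size_t begin;
    std::size_t end;
};

// Counterpart of Lexer::getSourceText(): the text spelled inside a range.
std::string sourceText(const std::string& buffer, CharRange range) {
    return buffer.substr(range.begin, range.end - range.begin);
}

// Counterpart of applying FixItHint::CreateReplacement(callRng, ...): the
// whole call is replaced by utils::pow<exponent>(base), i.e. the exponent
// moves into the template argument list.
std::string rewritePowCall(const std::string& buffer, CharRange call,
                           CharRange base, CharRange exponent) {
    const std::string replacement = "utils::pow<" + sourceText(buffer, exponent) +
                                    ">(" + sourceText(buffer, base) + ")";
    std::string result = buffer;
    result.replace(call.begin, call.end - call.begin, replacement);
    return result;
}
```

Applied to std::cout << std::pow(2*pi, DIM) << endl; with the ranges of the call, of 2*pi and of DIM, rewritePowCall yields std::cout << utils::pow<DIM>(2*pi) << endl;, the same replacement the tool prints in the practical test below.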

The Practical Test

After putting all the parts together and compiling the code, we would like to test the result on actual code. To avoid making it too easy for Clang, we pass std::pow a macro and a call to a constexpr function as exponents, each of which evaluates to an integer constant at compile time. In addition, we alias the std namespace and call std::pow through that alias.

#include <iostream>
#include <cmath>
#include "utils.h"

#define DIM 2
constexpr int getExp(double x) {
    return static_cast<int>(x);
}

namespace StdLib = std;
using namespace std;

int main() {
    constexpr double pi = 3.14159265;

    std::cout << "(2Pi)^2 = " << std::pow(2*pi, DIM) << endl;
    std::cout << "Pi^3 = " << StdLib::pow(pi, getExp(pi)) << endl;
}

If the software we’re analyzing also uses CMake as its build system, we can have CMake create a so-called compilation database by passing -DCMAKE_EXPORT_COMPILE_COMMANDS=ON. Our Clang tool uses this database to obtain the necessary include paths and compiler flags; we point the tool to it by passing the build directory in which we ran CMake via the -p option. If no compilation database is available, we can pass the compiler flags to the tool manually by appending a double hyphen (--) followed by the flags after the source files to be analyzed.

$ cd /path/to/ClangToolingTestApp/build
$ cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ..
$ cd ..
$ kdab-supercede-stdpow-checker -p build main.cpp -- -std=c++17

main.cpp:16:49: warning: std::pow is called with integer constant expression. Use utils::pow instead.
    std::cout << "(2Pi)^2 = " << std::pow(2*pi, DIM) << endl;
                                 ~~~~~~~~~~~~~~~^~~~
                                 utils::pow<DIM>(2*pi)
main.cpp:16:49: note: FIX-IT applied suggested code changes

main.cpp:17:47: warning: std::pow is called with integer constant expression. Use utils::pow instead.
    std::cout << "Pi^3 = " << StdLib::pow(pi, getExp(pi)) << endl;
                              ~~~~~~~~~~~~~~~~^~~~~~~~~~
                              utils::pow<getExp(pi)>(pi)
main.cpp:17:47: note: FIX-IT applied suggested code changes
2 warnings generated.

Conclusion

Putting all the pieces together, we have created a refactoring tool tailored to our project-specific needs in just under 100 lines of code. Unlike a purely text-based refactoring approach, our implementation understands macros, namespace aliases, and constexpr expressions. With Clang’s LibTooling as the foundation, the whole world of static code analysis and full code understanding is at our disposal. Through the ASTContext, we have access to symbol tables, and a single call to the CFG::buildCFG function generates a control flow graph from the AST. The Preprocessor class allows us to inspect macro expansions and includes. In the other direction, clang::EmitLLVMOnlyAction gives us access to the LLVM Intermediate Representation, a language- and machine-independent abstraction of the generated machine code.

To get an overview of the capabilities of Clang’s internal libraries, the “Internals Manual” of the Clang documentation [3] is recommended. The complete code of the refactoring tool created in this article can be found at [4].

Bibliography

Author

Anton Kreuzkamp is a software developer at KDAB, where he develops, among other things, tooling for the analysis of C++ and Qt-based software and works as a trainer and technical consultant. KDAB is one of the leading software consulting companies for architecture, development and design of Qt, C++ and OpenGL applications on desktop, embedded and mobile platforms. KDAB is also one of the largest independent contributors to Qt. KDAB’s tools and extensive experience in building, debugging, profiling and porting complex applications help developers worldwide to realise successful projects.
