Full Stack Tracing, Part 3

Interpreting traces

16 December 2020

Text output

The first option is to use the babeltrace utility to make the CTF files human readable. Babeltrace converts between trace formats and it has a ton of options. However, the default options and implicit sinks and sources make it trivial to get a text version of a trace:

$ babeltrace2 ./mytracefile.ctf

Nevertheless, the output is quite dense and requires a bit of wading through to make sense of it:

[16:21:53.959371489] (+0.001033737) irq_handler_entry: { cpu_id = 4 }, { irq = 26, name = "ahci[0000:00:1f.2]" }
[16:21:53.959376537] (+0.000005048) block_rq_complete: { cpu_id = 4 }, { dev = 8388640, sector = 412553920, nr_sector = 192, error = 0, rwbs = 12 }
[16:21:53.959383667] (+0.000007130) kmem_cache_free: { cpu_id = 4 }, { call_site = 0xFFFFFFFF997A02D6, ptr = 0xFFFF8E8D764FB000 }
[16:21:53.959384148] (+0.000000481) kmem_cache_free: { cpu_id = 4 }, { call_site = 0xFFFFFFFF997A9023, ptr = 0xFFFF8E8D78778D00 }
[16:21:53.959384386] (+0.000000238) sched_switch: { cpu_id = 6 }, { prev_comm = "swapper/6", prev_tid = 0, prev_prio = 20, prev_state = 0, next_comm = "samegame", next_tid = 12 7718, next_prio = 20 }
[16:21:53.959385565] (+0.000001179) irq_handler_exit: { cpu_id = 4 }, { irq = 26, ret = 1 }
[16:21:53.959413189] (+0.000027624) qtcore:QObject_ctor: { cpu_id = 6 }, { object = 0x7FFD1B520660 }
[16:21:53.959419236] (+0.000006047) qtgui:QGuiApplicationPrivate_init_entry: { cpu_id = 6 }, { }
[16:21:53.959423690] (+0.000004454) qtcore:QCoreApplicationPrivate_init_entry: { cpu_id = 6 }, { }

...

[16:21:53.962944916] (+0.000023918) sched_switch: { cpu_id = 6 }, { prev_comm = "swapper/6", prev_tid = 0, prev_prio = 20, prev_state = 0, next_comm = "samegame", next_tid = 127718, next_prio = 20 }
[16:21:53.962968468] (+0.000023552) syscall_entry_openat: { cpu_id = 6 }, { dfd = -100, filename = "/usr/lib/locale/locale-archive", flags = 524288, mode = 0 }
[16:21:53.962971371] (+0.000002903) kmem_cache_alloc: { cpu_id = 6 }, { call_site = 0xFFFFFFFF9967EC4B, ptr = 0xFFFF8E87FDFBA000, bytes_req = 4096, bytes_alloc = 4096, gfp_flags = 6291648 }
[16:21:53.962976139] (+0.000004768) kmem_cache_alloc: { cpu_id = 6 }, { call_site = 0xFFFFFFFF9966FA13, ptr = 0xFFFF8E8874118D00, bytes_req = 256, bytes_alloc = 256, gfp_flags = 6324416 }
[16:21:53.962986535] (+0.000010396) kmem_cache_free: { cpu_id = 6 }, { call_site = 0xFFFFFFFF9966B8EB, ptr = 0xFFFF8E87FDFBA000 }
[16:21:53.962988213] (+0.000001678) syscall_exit_openat: { cpu_id = 6 }, { ret = 41 }
[16:21:53.962990485] (+0.000002272) syscall_entry_newfstat: { cpu_id = 6 }, { fd = 41
[16:21:53.962993755] (+0.000003270) syscall_exit_newfstat: { cpu_id = 6 }, { ret = 0, statbuf = 140443163546912 }
[16:21:53.962995661] (+0.000001906) syscall_entry_mmap: { cpu_id = 6 }, { addr = 0x0, len = 3035216, prot = 1, flags = 2, fd = 41, offset = 0 }
[16:21:53.963002217] (+0.000006556) kmem_cache_alloc: { cpu_id = 6 }, { call_site = 0xFFFFFFFF99480ADA, ptr = 0xFFFF8E88743F8480, bytes_req = 192, bytes_alloc = 192, gfp_flags = 6291648 }
[16:21:53.963007890] (+0.000005673) syscall_exit_mmap: { cpu_id = 6 }, { ret = 0x7FBB6E688000 }

It's not like you couldn’t figure this out, but a bit of visualization and computer-aided assistance could help immensely. This leads us to the next option.

Trace Compass

Provided by the Eclipse foundation, Trace Compass provides a much better way to see what’s in your trace files.

This tool has a lot of options for analyzing kernel and syscall events, as well as sporting a great user interface. While it’s possible to visualize custom trace points in the timeline, it’s very unwieldy to do non-trivial custom analyses within its XML framework. You would need to write a Java plugin for Trace Compass, which I personally don’t feel comfortable doing. (However, if someone else wants to volunteer to add the features that we describe below in ctf2ctf to Trace Compass, I’m sure the community would appreciate having more tools at their disposal!)

Chrome traces

The Chrome browser (using chrome://tracing) can also display traces in Chrome Trace Format, a JSON file format defined by Google. Although Google created Chrome Trace Format for Android and browser tracing, it’s extensible enough to handle other tracing applications. And of course, rather confusingly (but understandably), Google abbreviates this as CTF even though it has nothing to do with the other CTF we’ve been discussing.

How do you get your CTF (Common Trace Format) into CTF (Chrome Trace Format)? Simple! Use ctf2ctf, which is a KDAB-created utility that uses babeltrace to convert from LTTng files to a Chrome-compatible format. It also adds a number of metadata annotations to make the interpretation of your trace files even easier, like adding file names to file descriptors and page faults, showing device names with interrupts, converting string formats as needed, and counting memory page allocations.

After using ctf2ctf and opening the resulting file in Chrome, we get something like this:

Great! The biggest problem with Chrome trace viewing is that JavaScript is limited to 4GB of running memory. That means you are limited in the size of trace files you can view, something to be aware of if you load large files (100MB or larger) and the browser crashes or starts behaving abnormally.

Qt Creator

KDAB also started a project to add trace viewing capability to Qt Creator, which is available in the latest version of Qt. Like the Chrome viewer, this also imports Chrome JSON trace files from our ctf2ctf tool.

Although this feature is available now in the main Qt distribution, we’re still adding features and fixing; if you want to contribute, we would most welcome it.

Kernel areas to focus on

There are several key events that you need to look at to gain essential insights for interpreting a full stack trace. These let you understand the overall execution flow of the system and help you focus on an area of particular interest.

Scheduler trace points
(sched_process_fork, sched_switch, sched_process_exec, sched_process_free)
These are critical categories of trace points as they delineate application task switches: which process is running, when processes start and stop, and the currently running thread.
Syscalls
(openat, close, mmap, read, write, nanosleep, futex, poll, ioctl)
These are extremely important as they show when applications make OS calls. Not only do they help you correlate the trace to execution points within your code, they also give you a rough idea of the system interaction and explain various scheduler task switches. (And if you’ve been using ctf2ctf, the file descriptors will be annotated with file names, which is super handy for understanding what those file calls are actually doing.)
Block devices and interrupts
Traces on block devices and interrupts are useful to see when the system is I/O bound and to ensure that hardware is operating as expected.

System load

One of the more important uses of a full stack trace is to estimate system load. An unexpectedly heavy load can be very detrimental to the system and it can be quite difficult to identify in any other way.

Examining sched_switch events lets you check the total number of concurrently running threads and processes at any point. CPU frequency can be tracked across the CPU cores, telling you if each core is running flat out (and consuming lots of battery). CPU utilization gives you an estimate as to the overall cycle consumption of the system.

What about RAM? Tracking the kernel memory allocations (kmem_cache_alloc_allocated, kmem_cache_alloc_requested, and mm_page_alloc) shows you how much RAM is in use. If you’re using an x86 processor, watch the page faults. Paging can invalidate all kinds of timing assumptions if not taken into account. (You’ll want to also trace mmap and openat so you can see which file the addresses belong to.)

Interpreting full stack traces with Qt

When it comes to understanding Qt behavior, there are a large number of places that we instrumented useful to interpreting the system’s execution. QCoreApplication initialization, QLibrary loading, QDebug output, event handling, and signal/slot are all traced, as well as QImageReader reading, and QFontDatabase loading.

Note that, as you might expect, all QStrings written into the trace are UTF-16 encoded. That’s not a problem if you’re using ctf2ctf as it converts strings to UTF-8. It is something to be aware of, however, if you’re trying to use babeltrace or TraceCompass as they don’t understand UTF-16 format strings.

For QML applications, we’ve inserted tracepoints for creating QML object instances, QML V4 functions calls, binding execution, QML compilation, and signal handling. This covers a gamut of areas where QML code might need an extra bit of profiling.

And in QtQuick, you have the ability to monitor scene graphs, window and scene rendering, animations, texture uploads, and glyph caching.

With all of these areas traceable, hopefully you’ll be able to extract enough clues from your traces to find out where critical time is going and how to make your application perform better.

What’s next?

Is this the end for visualizing and interpreting full stack traces? There’s lots more we’d like to see. Enhancements we’re considering include better annotation of signals/slots, making the trace APIs public so you can use them in your code, overall improvements to the trace point coverage, and better analysis and visualization tools in Qt Creator. We’d also like to bring this capability to other Linux-like operating systems like Android, macOS, and iOS. Did we mention that if you’re interested in helping, we’re looking for contributors?

We want to help you too! We offer hands-on workshops, as well as a Debugging and Profiling for Linux training class if you’re looking for some help with interpreting your full stack trace project. Leave a comment on this blog if you need a recommendation on what might work best for your team.

Tags:

c++performance qml qt tools

About KDAB

The KDAB Group is a globally recognized provider for software consulting, development and training, specializing in embedded devices and complex cross-platform desktop applications. In addition to being leading experts in Qt, C++ and 3D technologies for over two decades, KDAB provides deep expertise across the stack, including Linux, Rust and modern UI frameworks. With 100+ employees from 20 countries and offices in Sweden, Germany, USA, France and UK, we serve clients around the world.

Milian Wolff

Senior Software Engineer

Milian Wolff has a long history of creating tools for C++ developers. He’s the main author of Massif-Visualizer, heaptrack, hotspot and ctf2ctf tools now used widely to improve C++ applications performance. He’s a Senior Software Engineer at KDAB where he enjoys solving hard performance problems and teaching developers about debugging and profiling tools. Milian has a Masters Degree in Physics.

Full Stack Tracing, Part 3

Interpreting traces

Text output

Trace Compass

Chrome traces

Qt Creator

Kernel areas to focus on

System load

Interpreting full stack traces with Qt

What’s next?

Related Content

Sign up for the KDAB Newsletter

Qt 5 to Qt 6 Migration Services

Full Stack Tracing, Part 3

Interpreting traces

Text output

Trace Compass

Chrome traces

Qt Creator

Kernel areas to focus on

System load

Interpreting full stack traces with Qt

What’s next?

Related Content

10 Tips to Make Your QML Code Faster and More Maintainable

Introducing the ConnectionEvaluator in KDBindings

Fast Duplicate Tracking

Sign up for the KDAB Newsletter

Qt 5 to Qt 6 Migration Services