Fixing bugs via lateral thinking

Lessons learned by digging into obscure QML bugs

21 December 2016

Pop quiz

Don't think about this too much, just guess: what does this QML snippet print?

import QtQuick 2.0

QtObject {
    Component.onCompleted: {
        console.log("240000000000" == "3776798720");
    }
}

There are no JavaScript semantic tricks involved; and using either == or === does not change the result in any way.

Did you get it?

Pop quiz (2)

Here's another one, this time in pure C++: what does this print?

QJSEngine engine;
     QJSValue value("foo");

     QJSValue obj = engine.newObject();
     obj.setProperty("240000000000", value);

     QJSValueIterator it(obj);
     while (it.hasNext()) {
         it.next();
         qDebug() << it.name() << it.value().toString();
     }

Right now you should be suspicious of that strange value...

Ready for it?

Solution

In Qt 5.6.2 and Qt 5.7.0 the QML test prints "true"! According to QML the two strings "240000000000" and "3776798720" are totally identical.

Moreover, the C++ test prints "3776798720" "foo", not "240000000000" "foo" as one could've imagined.

So, what is going on? Who told me to use JavaScript again?

The bug

This bug against the JavaScript engine of QML (internally called V4) was reported on Qt's bugtracker a few days ago, and quickly gained a lot of votes (probably because it was also cross-posted to Stack Overflow).

So, what do we have here? A very popular bug report, with no assigner set, and myself who has been wanting for a while to peek into the internals of the V4 JavaScript engine... well, let's go!

Analysis

Where do you start looking around in a codebase you're totally unfamiliar with? The first step I took was trying to understand what values made the engine misbehave. Note that the reporter was very clever at noticing something strange with the values:

240000000000 == 0x37E11D6000
3776798720   ==   0xE11D6000

By the way, "240000000000" isn't the only value failing -- "240000000001" misbehaves too (returning "3776798721").

So maybe I needed to look for some bit pattern. "Maybe Qt is storing these strings as 64 bit integers, and using the top bits to store some additional metadata, and the integer gets corrupted somehow", I thought.

Then I wondered: is any big number (represented as a string) problematic? No, "300000000000" works, and "200000000000" works too.

Maybe it's a range somewhere between those two extremes?

With a very quick binary search (using Perl to generate and run QML testcases), in a few steps I was able to determine the exact range that make everything fail: any string between "238609294222" and "249108103166" inclusive. Values before and after those two worked perfectly. Do these numbers share some bit pattern?

238609294221 == 0x378E38E38D   // biggest working before the problematic range
238609294222 == 0x378E38E38E   // smallest not working
249108103166 == 0x39FFFFFFFE   // biggest not working
249108103166 == 0x39FFFFFFFF   // smallest working after the problematic range

Uhm, I can't spot any bit pattern in there that could cause this. Indeed, this turned out to be a complete red herring.

Debugging through V4's code

Of course, the next step is to try to step into the JavaScript engine and see if something is wrong. That turned out to be ... problematic.

A JavaScript engine is not a trivial piece of code; there's a lot of dark magic in there to give JavaScript semantics to values, and keep them allocated in efficient ways.

Most of the time, the debugger was not able to print any local whatsoever (even in a -O0 -g build), and when it was, the locals were totally meaningless, as I was probably in memory that was getting happily reinterpret_cast()ed across different internal datatypes. So, I simply was not able to understand at all which paths the code was going through (and why) in order to determine which one(s) could be problematic.

I realized that I needed a more structured approach at trying to find where the code paths diverged. That is (light bulb moment), I needed a way to trace the function calls and diff them. One way could've been: add printfs everywhere. Or: launch multiple debugging sessions, over different testdata, and advanced them in parallel to check where executions diverge.

But, in both cases, step by step would be a huge task, given the hundreds if not thousands of nested calls into the engine for something as trivial as setting a string property. We can do better by instrumenting the binaries and getting the traces generated for us.

Function tracing

There's a ridiculously simple way to get traces for every function call: recompile your libraries/applications with -pg. If you also pass -O0, -g and other options that prevent the compiler from being too smart, you can gather extra useful data for each function call, such as the arguments passed to the functions, and the values returned.

It turns out that Qt's configure script natively supports the -pg option via the -profile switch, so I simply recompiled both qtbase and qtdeclarative with it.

And now, the C++ testcase above generates accurate tracing data. How could I use them? I went for the simplest approach possible: let's do a visual comparison.

Thanks to the awesome uftrace tool one can dump the C++ testcase's trace straight into Chromium's tracing flamegraph. There it is:

Visualization of C++ function call traces into Chromium.

The trace in the top half is a working testcase, the one below a broken case.

Now the difference in behavior is quite evident!

In the broken case, QV4::Object::putIndexes gets called, instead of QV4::Object::put. Great! That was the hint I needed. From there on, it was kind of downhill.

Narrowing down the analysis

The point of divergence looked like this:

ScopedString s(scope, engine->newString(name));
     uint idx = s->asArrayIndex();
     if (idx < UINT_MAX) {
         setProperty(idx, value);
         return;
     }
     s->makeIdentifier(scope.engine);
     QV4::ScopedValue v(scope, QJSValuePrivate::convertedToValue(engine, value));
     o->put(s, v);

The idx < UINT_MAX check passes in the broken case, resulting in the indexed setProperty call (which in turn calls putIndexed), rather than the call to o->put at the end.

So what is that s->asArrayIndex() call anyhow? Well, my suspicion is that it's an (important) optimization for accessing an object, using strings representing base-10 integers as keys. Arrays in JavaScript are objects that are indeed accessed this way (i.e. all arrays are actually associative arrays). So the engine tries to see if the string "looks like an integer", and if so, uses some specialized storage (in a sparse array structure rather than a map).

But "240000000000" is an integer indeed, so what's going on here?

asArrayIndex does this:

uint asArrayIndex() const {
         if (subtype() == Heap::String::StringType_Unknown)
             d()->createHashValue();
         Q_ASSERT(!d()->largestSubLength);
         if (subtype() == Heap::String::StringType_ArrayIndex)
             return d()->stringHash;
         return UINT_MAX;
      }

I still have no clue about what those subtypes are, but I could spot the call to createHashValue() from the trace, therefore I knew it was getting called, therefore the string type was Unknown.

So, time to go back to a debugger, and check what value was calculated there -- and that's indeed the broken 3776798720 value.

What does createHashValue() do? It calls toArrayIndex, which calculates a "hash" for the string, and possibly changes its subtype (if it detects that it is a valid index).

And that's the culprit. The function has the bug just sitting there:

uint n = i*10 + x;
         if (n < i)
             // overflow
             return UINT_MAX;

That's not a valid overflow check for multiplication. (It's actually the check for detecting unsigned integer overflow for an addition.) That failing check was making the engine think it could treat such a big number as a 32-bit unsigned integer index into a sparse array. So, fix that, and job done!

Conclusions

It's always rewarding to come up with the right idea at the right time to solve a very complex problem :).

In the meanwhile, the fix for this very strange bug has been submitted to Qt here, and it's already merged in qtdeclarative (commit). The fix will be released in some future version of Qt; until then, talk to our consulting services if you need this issue fixed earlier.

In the meanwhile, happy hacking!

Tags:

c++performance qml qt

About KDAB

The KDAB Group is a globally recognized provider for software consulting, development and training, specializing in embedded devices and complex cross-platform desktop applications. In addition to being leading experts in Qt, C++ and 3D technologies for over two decades, KDAB provides deep expertise across the stack, including Linux, Rust and modern UI frameworks. With 100+ employees from 20 countries and offices in Sweden, Germany, USA, France and UK, we serve clients around the world.

Giuseppe D’Angelo

Senior Software Engineer

Senior Software Engineer at KDAB. Giuseppe is a long-time contributor to Qt, having used Qt and C++ since 2000, and is an Approver in the Qt Project. His contributions in Qt range from containers and regular expressions to GUI, Widgets, and OpenGL. A free software passionate and UNIX specialist, before joining KDAB, he organized conferences on opensource around Italy. He holds a BSc in Computer Science.

Fixing bugs via lateral thinking

Lessons learned by digging into obscure QML bugs

Pop quiz

Pop quiz (2)

Solution

The bug

Analysis

Debugging through V4's code

Function tracing

Narrowing down the analysis

Conclusions

Related Content

Sign up for the KDAB Newsletter

Qt 5 to Qt 6 Migration Services

Fixing bugs via lateral thinking

Lessons learned by digging into obscure QML bugs

Pop quiz

Pop quiz (2)

Solution

The bug

Analysis

Debugging through V4's code

Function tracing

Narrowing down the analysis

Conclusions

Related Content

10 Tips to Make Your QML Code Faster and More Maintainable

Introducing the ConnectionEvaluator in KDBindings

Supercharging VS Code with C++ Extensions

Sign up for the KDAB Newsletter

Qt 5 to Qt 6 Migration Services