Skip to content

Fixing bugs via lateral thinking

For today’s blog I would like to share with you the little adventure I had when fixing a very strange bug in Qt.

Pop quiz

Don’t think about this too much, just guess: what does this QML snippet print?

import QtQuick 2.0

QtObject {
    Component.onCompleted: {
        console.log("240000000000" == "3776798720");
    }
}

There are no JavaScript semantic tricks involved; and using either == or === does not change the result in any way.

Did you get it?

Pop quiz (2)

Here’s another one, this time in pure C++: what does this print?

     QJSEngine engine;
     QJSValue value("foo");

     QJSValue obj = engine.newObject();
     obj.setProperty("240000000000", value);

     QJSValueIterator it(obj);
     while (it.hasNext()) {
         it.next();
         qDebug() << it.name() << it.value().toString();
     }

Right now you should be suspicious of that strange value…

Ready for it?

Solution

In Qt 5.6.2 and Qt 5.7.0 the QML test prints "true"! According to QML the two strings "240000000000" and "3776798720" are totally identical.

Moreover, the C++ test prints "3776798720" "foo", not "240000000000" "foo" as one could’ve imagined.

So, what is going on? Who told me to use JavaScript again?

The bug

This bug against the JavaScript engine of QML (internally called V4) was reported on Qt’s bugtracker a few days ago, and quickly gained a lot of votes (probably because it was also cross-posted to Stack Overflow).

So, what do we have here? A very popular bug report, with no assigner set, and myself who has been wanting for a while to peek into the internals of the V4 JavaScript engine… well, let’s go!

Analysis

Where do you start looking around in a codebase you’re totally unfamiliar with? The first step I took was trying to understand what values made the engine misbehave. Note that the reporter was very clever at noticing something strange with the values:

240000000000 == 0x37E11D6000
3776798720   ==   0xE11D6000

By the way, “240000000000” isn’t the only value failing — “240000000001” misbehaves too (returning “3776798721”).

So maybe I needed to look for some bit pattern. “Maybe Qt is storing these strings as 64 bit integers, and using the top bits to store some additional metadata, and the integer gets corrupted somehow”, I thought.

Then I wondered: is any big number (represented as a string) problematic? No, “300000000000” works, and “200000000000” works too.

Maybe it’s a range somewhere between those two extremes?

With a very quick binary search (using Perl to generate and run QML testcases), in a few steps I was able to determine the exact range that make everything fail: any string between “238609294222” and “249108103166” inclusive. Values before and after those two worked perfectly. Do these numbers share some bit pattern?

238609294221 == 0x378E38E38D   // biggest working before the problematic range
238609294222 == 0x378E38E38E   // smallest not working
249108103166 == 0x39FFFFFFFE   // biggest not working
249108103166 == 0x39FFFFFFFF   // smallest working after the problematic range 

Uhm, I can’t spot any bit pattern in there that could cause this. Indeed, this turned out to be a complete red herring.

Debugging through V4’s code

Of course, the next step is to try to step into the JavaScript engine and see if something is wrong. That turned out to be … problematic.

A JavaScript engine is not a trivial piece of code; there’s a lot of dark magic in there to give JavaScript semantics to values, and keep them allocated in efficient ways.

Most of the time, the debugger was not able to print any local whatsoever (even in a -O0 -g build), and when it was, the locals were totally meaningless, as I was probably in memory that was getting happily reinterpret_cast()ed across different internal datatypes. So, I simply was not able to understand at all which paths the code was going through (and why) in order to determine which one(s) could be problematic.

I realized that I needed a more structured approach at trying to find where the code paths diverged. That is (light bulb moment), I needed a way to trace the function calls and diff them. One way could’ve been: add printfs everywhere. Or: launch multiple debugging sessions, over different testdata, and advanced them in parallel to check where executions diverge.

But, in both cases, step by step would be a huge task, given the hundreds if not thousands of nested calls into the engine for something as trivial as setting a string property. We can do better by instrumenting the binaries and getting the traces generated for us.

Function tracing

There’s a ridiculously simple way to get traces for every function call: recompile your libraries/applications with -pg. If you also pass -O0, -g and other options that prevent the compiler from being too smart, you can gather extra useful data for each function call, such as the arguments passed to the functions, and the values returned.

It turns out that Qt’s configure script natively supports the -pg option via the -profile switch, so I simply recompiled both qtbase and qtdeclarative with it.

And now, the C++ testcase above generates accurate tracing data. How could I use them? I went for the simplest approach possible: let’s do a visual comparison.

Thanks to the awesome uftrace tool one can dump the C++ testcase’s trace straight into Chromium’s tracing flamegraph. There it is:

Visualization of C++ fucntion call traces into chromium.

Visualization of C++ function call traces into Chromium.

The trace in the top half is a working testcase, the one below a broken case.

Now the difference in behavior is quite evident!

In the broken case, QV4::Object::putIndexes gets called, instead of QV4::Object::put. Great! That was the hint I needed. From there on, it was kind of downhill.

Narrowing down the analysis

The point of divergence looked like this:

     ScopedString s(scope, engine->newString(name));
     uint idx = s->asArrayIndex();
     if (idx < UINT_MAX) {
         setProperty(idx, value); 
         return;
     }
     s->makeIdentifier(scope.engine);
     QV4::ScopedValue v(scope, QJSValuePrivate::convertedToValue(engine, value));
     o->put(s, v);

The idx < UINT_MAX check passes in the broken case, resulting in the indexed setProperty call (which in turn calls putIndexed), rather than the call to o->put at the end.

So what is that s->asArrayIndex() call anyhow? Well, my suspicion is that it’s an (important) optimization for accessing an object, using strings representing base-10 integers as keys. Arrays in JavaScript are objects that are indeed accessed this way (i.e. all arrays are actually associative arrays). So the engine tries to see if the string “looks like an integer”, and if so, uses some specialized storage (in a sparse array structure rather than a map).

But “240000000000” is an integer indeed, so what’s going on here?

asArrayIndex does this:

     uint asArrayIndex() const {
         if (subtype() == Heap::String::StringType_Unknown)
             d()->createHashValue();
         Q_ASSERT(!d()->largestSubLength);
         if (subtype() == Heap::String::StringType_ArrayIndex)
             return d()->stringHash;
         return UINT_MAX;
      }

I still have no clue about what those subtypes are, but I could spot the call to createHashValue() from the trace, therefore I knew it was getting called, therefore the string type was Unknown.

So, time to go back to a debugger, and check what value was calculated there — and that’s indeed the broken 3776798720 value.

What does createHashValue() do? It calls toArrayIndex, which calculates a “hash” for the string, and possibly changes its subtype (if it detects that it is a valid index).

And that’s the culprit. The function has the bug just sitting there:

         uint n = i*10 + x;
         if (n < i)
             // overflow
             return UINT_MAX;

That’s not a valid overflow check for multiplication. (It’s actually the check for detecting unsigned integer overflow for an addition.) That failing check was making the engine think it could treat such a big number as a 32-bit unsigned integer index into a sparse array. So, fix that, and job done!

Conclusions

It’s always rewarding to come up with the right idea at the right time to solve a very complex problem :).

In the meanwhile, the fix for this very strange bug has been submitted to Qt here, and it’s already merged in qtdeclarative (commit). The fix will be released in some future version of Qt; until then, talk to our consulting services if you need this issue fixed earlier.

In the meanwhile, happy hacking!

About KDAB

KDAB is a consulting company offering a wide variety of expert services in Qt, C++ and 3D/OpenGL and providing training courses in:

KDAB believes that it is critical for our business to contribute to the Qt framework and C++ thinking, to keep pushing these technologies forward to ensure they remain competitive.

FacebookTwitterLinkedInEmail

Categories: C++ / KDAB Blogs / KDAB on Qt / QML / Qt / QtDevelopment

Leave a Reply

Your email address will not be published. Required fields are marked *

By continuing to use the site, you agree to the use of cookies. More information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close