QML Engine Internals, Part 3: Binding Types

4 February 2013

Recap

To recap, let's quickly look at a simple binding:

text: "Window Area: " + (parent.width * parent.height)

Each binding like this one is actually a JavaScript function which is evaluated at runtime by the v8 engine. The result of the evaluation is the return value of the function, which is then assigned to the text property. v8 doesn't know about Qt's objects and properties, so when it encounters objects like parent or properties like width, it asks the context wrapper and the object wrapper in QML to resolve them. These wrappers remember which properties were accessed while a binding was evaluated, and can therefore automatically connect the changed signal (e.g. widthChanged()) of each property to a slot that re-evaluates the binding.
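
The capture-and-reconnect idea can be modelled in a few lines. The following is a minimal sketch in Python, not the engine's C++ code: Item, Binding and the read callback are hypothetical names standing in for QObject properties, the binding object and the context/object wrappers.

```python
class Item:
    """Toy stand-in for a QObject: named properties with change listeners."""

    def __init__(self, **props):
        self._props = dict(props)
        self._listeners = {}  # property name -> list of callbacks

    def get(self, name, capture=None):
        # Record the access if a binding is currently being evaluated.
        if capture is not None:
            capture.add((self, name))
        return self._props[name]

    def set(self, name, value):
        if self._props.get(name) == value:
            return  # no change, no notification
        self._props[name] = value
        for callback in list(self._listeners.get(name, [])):
            callback()

    def connect(self, name, callback):
        self._listeners.setdefault(name, []).append(callback)


class Binding:
    """Evaluates func, remembers which properties it read, and re-evaluates on change."""

    def __init__(self, target, prop, func):
        self.target, self.prop, self.func = target, prop, func
        self.subscribed = set()
        self.update()

    def update(self):
        captured = set()
        value = self.func(lambda obj, name: obj.get(name, captured))
        # Connect only to properties we have not subscribed to yet.
        for obj, name in captured - self.subscribed:
            obj.connect(name, self.update)
        self.subscribed |= captured
        self.target.set(self.prop, value)


parent = Item(width=360, height=360)
text = Item(text="")
Binding(text, "text",
        lambda read: "Window Area: " + str(read(parent, "width") * read(parent, "height")))
print(text.get("text"))  # Window Area: 129600
parent.set("width", 100)
print(text.get("text"))  # Window Area: 36000
```

Note that the binding function never names its dependencies explicitly; they fall out of the evaluation itself, which is the point of the wrapper approach.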

With the way bindings work now freshly in mind again, let's move on and have a look at the different binding types.

Binding Types

In the last post, I stated that each binding is represented by an instance of the QQmlBinding class. That was actually a lie-to-children. Having a full-blown QQmlBinding instance for each binding would be much too costly - there are hundreds if not thousands of bindings in a typical QML application, so a binding needs to be lightweight. In addition, each such binding is compiled separately when loading a QML file, so invoking the v8 compiler many times during loading adds a lot of overhead.

QV8Bindings

To resolve the large overhead of QQmlBinding, there is another binding class, confusingly named QV8Bindings. QV8Bindings is a collection of all bindings in a QML file, using an array of the much more lightweight QV8Bindings::Binding structure. The QML devs have gone to great lengths to minimize the memory usage of this structure - they even exploit the fact that the last 2 bits of a pointer are unused because of alignment, and use that unused space to store flags (a common enough pattern that QML has a special-purpose class QFlagPointer for this). As a result, a QV8Bindings::Binding is only 64 bytes large.
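
To make the pointer trick concrete, here is a minimal sketch in Python that treats an address as a plain integer. It only illustrates the bit arithmetic; pack and unpack are hypothetical helper names, the address is made up, and this is not the QFlagPointer API.

```python
FLAG_MASK = 0b11  # the two low bits, always zero for a 4-byte-aligned address


def pack(address, flag1=False, flag2=False):
    """Store two boolean flags in the unused low bits of an aligned address."""
    if address & FLAG_MASK:
        raise ValueError("address is not 4-byte aligned")
    return address | int(flag1) | (int(flag2) << 1)


def unpack(word):
    """Return (address, flag1, flag2) from a packed word."""
    return word & ~FLAG_MASK, bool(word & 1), bool(word & 2)


word = pack(0x1000, flag1=True)  # 0x1000 is a made-up, aligned address
print(hex(word))                 # 0x1001
address, flag1, flag2 = unpack(word)
print(hex(address), flag1, flag2)  # 0x1000 True False
```

The flags cost no extra storage; the price is a mask operation on every pointer access.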

The big advantage of QV8Bindings compared to QQmlBinding is that it compiles all bindings together, so only one v8 compiler invocation is needed. In QQmlCompiler::completeComponentBuild(), you can see that when compiling a QML file, all binding functions are concatenated together into one big JavaScript program, and stored in QQmlCompiledData (a structure that contains all kinds of compiled data for each QML file). When the QML file is first instantiated, the v8 program is compiled, which happens in QV8Bindings::QV8Bindings(). The compiled program is then also stored in QQmlCompiledData and the original source is discarded. When instantiating the same QML file another time, the QML engine re-uses the QQmlCompiledData from before and does not need to compile the bindings program again. This is not the case with QQmlBinding, which needs to be compiled each time a QML file is instantiated.
To sum up: Since QV8Bindings packs together all bindings of the same QML file, it uses much less memory for each individual binding and can compile all bindings together in one go.
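
The concatenate, compile once and re-use scheme can be sketched as follows. This is a toy model in Python, with Python's built-in compile() standing in for the v8 compiler; CompiledData, build_program and the Parent class are hypothetical names, only loosely modelled on the role of QQmlCompiledData described above.

```python
def build_program(expressions):
    """Concatenate all binding expressions into one program, one function each."""
    lines = []
    for index, expression in enumerate(expressions):
        lines.append(f"def binding_{index}(parent):")
        lines.append(f"    return {expression}")
    return "\n".join(lines) + "\n"


class CompiledData:
    """Per-file data: holds the program source until it has been compiled once."""

    def __init__(self, expressions):
        self.count = len(expressions)
        self.source = build_program(expressions)
        self.code = None
        self.compile_count = 0

    def bindings(self):
        """Called on every instantiation; compiles only the first time."""
        if self.code is None:
            self.code = compile(self.source, "<bindings>", "exec")
            self.source = None  # the source is no longer needed
            self.compile_count += 1
        namespace = {}
        exec(self.code, namespace)
        return [namespace[f"binding_{i}"] for i in range(self.count)]


class Parent:
    width = 360
    height = 360


data = CompiledData(["parent.width * parent.height", "parent.width / 2"])
first = data.bindings()   # first instantiation: compiles the whole program
second = data.bindings()  # second instantiation: re-uses the compiled program
print(first[0](Parent()), second[1](Parent()), data.compile_count)  # 129600 180.0 1
```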

So, where does that leave QQmlBinding, and why does this class even exist? In some cases, bindings are non-shareable, for example because they use closures or eval(). In this case, each binding function requires a different context, and can therefore not be compiled together with the other bindings that share the same context. Therefore, in these special and rare cases, a binding will get its own QQmlBinding instance instead. The decision on what binding type is used happens when compiling the QML file, in QQmlCompiler::completeComponentBuild(). There, a SharedBindingTester is used to check which bindings will be part of QV8Bindings and which will become their own QQmlBinding. SharedBindingTester is a visitor for the JS AST. If you look at the code, you'll see that the SharedBindingTester also tests whether a binding is safe. This is used to avoid evaluating bindings multiple times when instantiating a QML file; the optimization is best described in its commit message.
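
As an illustration of what such an AST visitor looks like, here is a minimal sketch using Python's own ast module on Python expressions. It is an analogy only: the real SharedBindingTester walks the JavaScript AST in C++, and the exact set of constructs it rejects is not reproduced here. SharedTester and is_shareable are hypothetical names.

```python
import ast


class SharedTester(ast.NodeVisitor):
    """Marks an expression as non-shareable if it uses eval() or creates a closure."""

    def __init__(self):
        self.shareable = True

    def visit_Lambda(self, node):
        # A nested function needs its own context.
        self.shareable = False

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id == "eval":
            self.shareable = False
        self.generic_visit(node)  # keep looking inside the call


def is_shareable(expression):
    tester = SharedTester()
    tester.visit(ast.parse(expression, mode="eval"))
    return tester.shareable


print(is_shareable("parent.width * parent.height"))  # True
print(is_shareable("eval('14')"))                    # False
print(is_shareable("(lambda: parent.width)()"))      # False
```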

To keep things in the QML code simple, both QQmlBinding and QV8Bindings::Binding inherit from QQmlAbstractBinding.

QV4Bindings

If you have looked at the QML engine code a bit, you will probably have noticed the class QV4Bindings, which is also a subclass of QQmlAbstractBinding. Yet another binding type? What is this one about? Like QV8Bindings, this is a collection of bindings of a QML file. Unlike QV8Bindings, QV4Bindings stores only so-called optimized bindings, also wrongly and confusingly called compiled bindings. Some bindings can be optimized, in which case they will be part of QV4Bindings; others cannot, and will be part of QV8Bindings.
So what is this optimization? v4 bindings are not evaluated by the v8 engine. Instead, v4 bindings are compiled to bytecode, and run through a bytecode interpreter. This bytecode compiler and interpreter cannot deal with all JavaScript expressions, simply because ahead-of-time compilation of JavaScript is not possible in all cases.

But why bytecode? After all, the v8 engine compiles to machine code, isn't that faster than a bytecode interpreter? Turns out it isn't: the v8 engine has quite a bit of overhead when it is invoked and when it needs to call out to the QML engine to resolve objects and properties. In addition, the v8 engine sometimes recompiles a function on the fly, with more optimizations, when it is called multiple times. All of this is too much overhead for the QML use case, which typically consists of a lot of one-line binding functions. For a benchmark I did for my DevDays talk, I basically let the QML engine evaluate a binding a few hundred times. The binding was a simple one which the v4 compiler could deal with. To compare that to using the v8 engine, I used the environment variable QML_DISABLE_OPTIMIZER=1 to disable v4 bindings altogether.

In that benchmark, the v4 bytecode engine was indeed faster than v8 for this particular use case.

Internally, v4 is a register machine. Much like a CPU, it has registers to store temporary values. Unlike a CPU, it does not load and store values from memory - instead it loads and stores values from QObject properties. Using the environment variable QML_BINDINGS_DUMP=1, let's have a look at a simple binding:

text: parent.width * parent.height

The output will be:

Program.bindings: 2
Program.dataLength: 92
Program.subscriptions: 4
     [SNIP of other, unrelated bindings]
     160        14:15:
     176                Block                   Mask(1)
     192                LoadScope               -> Output_Reg(0)
     208                FetchAndSubscribe       Object_Reg(0) Fast_Accessor(0x7f05f6e51060) -> Output_Reg(0) Subscription_Slot(1)
     272                FetchAndSubscribe       Object_Reg(0) Fast_Accessor(0x7f05f6e51090) -> Output_Reg(0) Subscription_Slot(2)
     336                LoadScope               -> Output_Reg(1)
     352                FetchAndSubscribe       Object_Reg(1) Fast_Accessor(0x7f05f6e51060) -> Output_Reg(1) Subscription_Slot(1)
     416                FetchAndSubscribe       Object_Reg(1) Fast_Accessor(0x7f05f6e510a0) -> Output_Reg(1) Subscription_Slot(3)
     480                MulNumber               Input_Reg(0) Input_Reg(1) -> Output_Reg(0)
     496                ConvertNumberToString   Input_Reg(0) -> Output_Reg(1)
     512                Store                   Input_Reg(1) -> Object_Reg(0) Property_Index(42)

As you can see, the properties width and height are loaded into registers 0 and 1, then these registers are multiplied together, and the result is stored in the text property (which happens to be property number 42 in the QQuickText class). The instruction FetchAndSubscribe not only loads a property, but also subscribes to its changed signal, which is needed for automatic binding updates to work. In the above "assembler" code, you can also see another advantage: the v4 compiler resolves objects and properties at compile-time, and stores the property index in the bytecode. Thus, at runtime, no name lookup is needed; the properties can be accessed directly by index. Contrast that with the v8 engine, which calls out to QML object and context wrappers to resolve object and property names, which of course is much more overhead. The disadvantage is that the v4 engine cannot deal with dynamic objects, for example those exported from C++ via setContextProperty(). A binding containing such a dynamic object will be part of QV8Bindings.
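
To show how little machinery such an interpreter needs, here is a toy register machine in Python that runs a program shaped like the dump above. It is a sketch under simplifying assumptions, not the QV4Bindings implementation: the instruction names are borrowed from the dump, but the tuple encoding, the Obj class and the property indices are hypothetical, and Store always writes to a single target object.

```python
class Obj:
    """Toy object with properties addressed by index and per-property subscribers."""

    def __init__(self, values):
        self.values = list(values)
        self.subscribers = {}  # property index -> set of subscription slots


def run(program, scope, target):
    regs = [None, None]  # two registers are enough for this program
    for op, *args in program:
        if op == "LoadScope":
            (out,) = args
            regs[out] = scope
        elif op == "FetchAndSubscribe":
            obj_reg, prop_index, out, slot = args
            obj = regs[obj_reg]
            # Subscribe first, then load the property by index (no name lookup).
            obj.subscribers.setdefault(prop_index, set()).add(slot)
            regs[out] = obj.values[prop_index]
        elif op == "MulNumber":
            a, b, out = args
            regs[out] = regs[a] * regs[b]
        elif op == "ConvertNumberToString":
            src, out = args
            regs[out] = str(regs[src])
        elif op == "Store":
            src, prop_index = args
            target.values[prop_index] = regs[src]
        else:
            raise ValueError(f"unknown instruction {op}")


# Hypothetical property indices, fixed at "compile time".
PARENT, TEXT = 0, 1   # on the text item
WIDTH, HEIGHT = 0, 1  # on the parent

parent = Obj([360, 360])
text_item = Obj([parent, ""])

program = [
    ("LoadScope", 0),
    ("FetchAndSubscribe", 0, PARENT, 0, 1),
    ("FetchAndSubscribe", 0, WIDTH, 0, 2),
    ("LoadScope", 1),
    ("FetchAndSubscribe", 1, PARENT, 1, 1),
    ("FetchAndSubscribe", 1, HEIGHT, 1, 3),
    ("MulNumber", 0, 1, 0),
    ("ConvertNumberToString", 0, 1),
    ("Store", 1, TEXT),
]
run(program, scope=text_item, target=text_item)
print(text_item.values[TEXT])  # 129600
```

After the run, parent.subscribers records slots 2 and 3 for width and height, which is the information a real engine would use to know which bindings to re-run when a property changes.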

Summary of Binding Types

To sum up, there are 3 binding types, all inheriting from QQmlAbstractBinding:

QV4Bindings::Binding
QV8Bindings::Binding
QQmlBinding

v4 bindings are fastest as they use a custom bytecode engine. Both QV8Bindings and QQmlBinding use the v8 JS engine for evaluation. However, QV8Bindings packs together all bindings to compile them in one go, whereas each QQmlBinding is compiled individually and on each QML component instantiation.

Here is a (nonsensical) example with all binding types:

import QtQuick 2.0

Rectangle {
    width: 360
    height: 360

    Text {
        anchors.centerIn: parent
        text: parent.width * parent.height
        font.pointSize: eval("14")
        font.wordSpacing: parent.width > 10 ? 90 : ~parent.width
    }
}

Using QML_COMPILER_DUMP=1, you'll see the QML compiler uses STORE_COMPILED_BINDING two times, STORE_V8_BINDING once and STORE_BINDING once as well.

STORE_BINDING is for QQmlBinding, it is used for font.pointSize, as that binding uses eval() and can therefore not be shared.

The bindings for anchors.centerIn and text are both v4 bindings (STORE_COMPILED_BINDING instruction, QV4Bindings::Binding class).

Finally, font.wordSpacing is an ordinary QV8Bindings::Binding (STORE_V8_BINDING instruction). The v4 bytecode compiler and interpreter are smart enough to deal with the ternary operator, but the complement operator is not yet implemented, therefore the QML compiler chose to use v8 bindings instead.
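
The three-way decision can be modelled as a whitelist check. The sketch below is in Python and works on Python expressions via the ast module, so it is only an analogy for what the QML compiler does with JavaScript. The whitelist, the order of the checks and the function names are assumptions made for illustration; the ternary and complement operators appear in their Python spelling.

```python
import ast

# Hypothetical whitelist of node types the toy "optimizing" compiler supports.
# Unary operators such as ~ are deliberately missing.
SUPPORTED = (ast.Expression, ast.BinOp, ast.Compare, ast.IfExp, ast.Attribute,
             ast.Name, ast.Constant, ast.Load, ast.operator, ast.cmpop)


def uses_eval_or_closure(tree):
    for node in ast.walk(tree):
        if isinstance(node, ast.Lambda):
            return True
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            return True
    return False


def choose_binding_type(expression):
    tree = ast.parse(expression, mode="eval")
    if uses_eval_or_closure(tree):
        return "QQmlBinding"            # not shareable, compiled on its own
    if all(isinstance(node, SUPPORTED) for node in ast.walk(tree)):
        return "QV4Bindings::Binding"   # every construct is supported
    return "QV8Bindings::Binding"       # shareable, but needs the full JS engine


print(choose_binding_type("parent.width * parent.height"))
# QV4Bindings::Binding
print(choose_binding_type("eval('14')"))
# QQmlBinding
print(choose_binding_type("90 if parent.width > 10 else ~parent.width"))
# QV8Bindings::Binding
```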

Update: Changed the links to the source from Gitorious to Woboq's (now KDAB's) excellent code browser for easier browsing

Read Part 4...

8 Comments

The QML Profiler in Qt Creator actually shows which bindings are evaluated by v4, and which by v8: Check out the 'Type' section in the Events pane.

Thanks Thomas for the informative posts. It is very useful stuff for some bindings related stuff I'm working on at the moment.

I do have to say that a lot of this optimisation stuff in QML strikes me as being massive overkill and "micro-optimisations" at best. I don't see how shaving a few bytes off here and there in the QML binding classes and then adding more complexity really makes any sense. For example, the client company's logo in their real world QML app is likely to consume way more RAM than any micro-optimisations in the QML bindings are saving. And no one thinks twice about adding a logo to their app if they need one. Sometimes C++ developers lose sight of the bigger picture when they are worrying about a few bytes all the time.

I'd presume that those optimizations are done as a result of benchmarking on real devices. What you imply is that the developers are doing all this work with no purpose - I'd better have some strong arguments to back that up, if I was implying it myself. Remember that sometimes reducing memory consumption is not about memory consumption at all, but about cache utilization. It can make things faster even if the memory savings are immaterial.

What I see a lot in the QML implementation in Qt is that every optimization taken in isolation yields a small improvement, "not worth it" you might say. Yet QML, initially, was suffering from performance death by a thousand cuts. Nothing by itself was too big of a performance killer. It wasn't possible to fix it without addressing each and every one of those thousand cuts individually.

Getting rid of V8 and multiple copies of the same data will be significant. I'm sure it will reduce not only the data memory footprint, but also the executable memory footprint.

Simon, that statement is really a great oversimplification. If you can save a few bytes per object it doesn't sound much. But when you keep the law of large numbers in mind, then this will sum up to considerable savings. Especially on embedded hardware that might be the difference - especially if you want to use that memory for shiny graphic images.

Furthermore, making items smaller means more items can be stored in the CPU cache thus leading to immense performance benefits in many cases.

Thank you for writing this. It is really good to understand what happens behind the scene sometimes.

Milian, in a general sense what you say is correct, but we are talking about this specific case of QML bindings, and saving a few bytes for objects in the order of maybe a thousand instances at most adds up to KBs, not MBs or GBs. It is tiny, just minuscule in 2013 where people are running around with smartphones in their hands which now pack GBs of RAM by default.

What I want to see is more empirical work, i.e. measurements and testing, on real applications which put into proper context just how big these perceived problems are. There is no point in optimising memory or speed when it only accounts for <1% of the application's usage.

Where are the numbers? and what is the real context?

Hi Thomas, thanks for your post - it's a joy reading the QML Engine Internals series!

I had a similar discussion on the qt-project forums about QML binding performance; it might be good additional information to this post for everybody interested: https://qt-project.org/forums/viewthread/22664/

Cheers, Chris