Qt Quick without a GPU: i.MX6 ULL

on small embedded devices

20 March 2018

Performance improvements

The main focus lay in analyzing and optimizing the CPU usage of text rendering and full-screen Qt Quick animations. To that end, a few noteworthy patches were developed:

Unneeded blending operations in the Qt Quick software renderer were removed in two places. Previously, images without an alpha channel were always blended; this patch fixes that. Layers were also blended unconditionally; now it is first checked whether a layer covers its area completely.
Font drawing was improved. This includes removing unnecessary temporary allocations as well as simplifying the computation for glyph drawing in the special case when no gamma correction is used.
Finally, a prototype was written for a significant special-case optimization of the linuxfb platform plugin. The plugin implements window compositing which, in the general case, requires a temporary compositing buffer. However, in the rather common special case of having only one visible window, this buffer can be skipped and the window copied directly to the frame buffer. For now, the prototype simply assumes this special case; it obviously still needs proper integration.
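To illustrate why skipping the blend pays off for opaque content, here is a minimal standalone sketch. It is not the code from the actual Qt patches; the pixel format (premultiplied ARGB32) and all function names are assumptions chosen for this example. The general path has to read, scale and rewrite every destination pixel, while the opaque path collapses to a plain memory copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed pixel format for this sketch: premultiplied ARGB32 (0xAARRGGBB,
// color channels already multiplied by alpha).
using Pixel = std::uint32_t;

// Multiply all four 8-bit channels of p by a/255, with rounding.
static Pixel scaleChannels(Pixel p, std::uint32_t a)
{
    Pixel out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        std::uint32_t c = (p >> shift) & 0xffu;
        c = (c * a + 127u) / 255u;
        out |= c << shift;
    }
    return out;
}

// General case: source-over blending, one read-modify-write per pixel.
void blendSourceOver(std::vector<Pixel> &dst, const std::vector<Pixel> &src)
{
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i) {
        const std::uint32_t srcAlpha = src[i] >> 24;
        dst[i] = src[i] + scaleChannels(dst[i], 255u - srcAlpha);
    }
}

// Special case: the source is known to be opaque, so the destination never
// contributes and a plain copy produces the same pixels.
void copyOpaque(std::vector<Pixel> &dst, const std::vector<Pixel> &src)
{
    assert(dst.size() == src.size());
    if (!src.empty())
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(Pixel));
}

// Hypothetical dispatch: take the cheap path when there is no alpha channel.
void drawImage(std::vector<Pixel> &dst, const std::vector<Pixel> &src, bool hasAlpha)
{
    if (hasAlpha)
        blendSourceOver(dst, src);
    else
        copyOpaque(dst, src);
}
```

For a fully opaque source both paths yield identical output, so the only difference is the per-pixel arithmetic that the copy avoids.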

My involvement in these was primarily performance measurement on the device using perf and Hotspot. Besides this, I focused on reducing the Qt library size and on investigating the possibility of playing back videos.

Minimizing Qt library size

Qt is quite modular and offers a wide range of options that you can pass to the configure script of the base modules to adapt it to your needs. The options that influence the Qt library size are, unsurprisingly, those that enable or disable optional features. By default, most features are either enabled unconditionally or enabled when the required third-party library is available. My goal was to find a sensible collection of options for a fairly minimal Qt build that still supports real-world embedded Qt Quick applications and does not change depending on which libraries happen to be available in the build environment.

Apart from the relatively coarse-grained, "traditional" feature options that are mostly described directly in the configure script's help text, a more fine-grained feature control mechanism named Qt Lite was introduced not that long ago. I mostly chose not to use the Qt Lite options: with them it is much harder to know in advance which features are really needed, randomly missing API is confusing, and I was not aiming to go as far as possible. I see Qt Lite more as a special-purpose tool for when every byte really counts.

Without using Qt Lite, the only hard third-party library dependencies of Qt 5.9.1 (the version used throughout this blog post) are zlib and pcre. But as a reasonable base for UI development, at least text and image rendering should be supported, so I also enabled freetype, harfbuzz, jpeg and png support.

Here is the list of options I came up with:

-system-zlib -qt-pcre -qt-freetype -qt-libjpeg -qt-libpng -qt-harfbuzz \
-no-cups -no-iconv -no-sm -no-feature-vnc -no-widgets -no-ico -no-gif -no-glib -no-gtk \
-no-sql-mysql -no-sql-psql -no-sql-sqlite -no-sql-sqlite2 \
-no-icu -no-openssl -no-fontconfig -no-dbus -no-qml-debug

Of course, in many cases you would want to re-enable one option or another; this list is intended more as a starting point. Also, the options regarding hardware integration are omitted, as those are very project specific.

Finally, to reduce file size it is advisable to also use the options that control the relevant compiler settings:

-optimize-size -ltcg

With all of these, the combined file size of all installed Qt shared libraries (including the QML and Quick modules) comes out at about 15 MiB. Compiling Qt statically and linking it into a basic Hello-World Qt Quick application produces a stripped binary of about 8.3 MiB. One interesting observation is that link time optimization does indeed help noticeably: without it, the dynamic libraries are around 800 KiB larger and the static binary around 1.5 MiB larger.

PXP

Although the i.MX6 ULL does not have a GPU, it comes with the so-called Pixel Pipeline, or PXP for short. It can do some basic operations, like color space conversion, blitting, scaling and blending. Since probably all popular video formats require a YCbCr to RGB color conversion as the final decoding step for playback, and this would be a pretty CPU-intensive task, the PXP is worth using for that alone. Luckily, since the imx GStreamer plugin supports PXP, coding up a proof of concept was quite easy.
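To give a feeling for the work the PXP takes off the CPU, here is a rough sketch of what a software YCbCr to RGB conversion has to do for every single pixel. It assumes full-range BT.601 coefficients and full-resolution (4:4:4) planes purely for simplicity; real video streams typically use limited-range values and subsampled chroma, and the type and function names are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

// Round to nearest and clamp into the 0..255 range.
static std::uint8_t clampToByte(double v)
{
    return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, v + 0.5)));
}

// Convert one pixel, assuming full-range BT.601 coefficients.
Rgb ycbcrToRgb(std::uint8_t y, std::uint8_t cb, std::uint8_t cr)
{
    const double Y = y;
    const double Cb = cb - 128.0;
    const double Cr = cr - 128.0;
    return Rgb{
        clampToByte(Y + 1.402 * Cr),
        clampToByte(Y - 0.344136 * Cb - 0.714136 * Cr),
        clampToByte(Y + 1.772 * Cb)
    };
}

// Convert a whole frame given as three equally sized planes.
std::vector<Rgb> convertFrame(const std::vector<std::uint8_t> &y,
                              const std::vector<std::uint8_t> &cb,
                              const std::vector<std::uint8_t> &cr)
{
    assert(y.size() == cb.size() && y.size() == cr.size());
    std::vector<Rgb> out(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        out[i] = ycbcrToRgb(y[i], cb[i], cr[i]);
    return out;
}
```

Several multiplications, additions and clamps per pixel, repeated for every pixel of every frame, add up quickly on a small single-core CPU, which is why offloading this step to dedicated hardware is attractive.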

For those who know the basics of the GStreamer API and its concepts: I integrated GStreamer video decoding into a Qt Quick application by creating a simple video decoding pipeline. At the end of that pipeline, the imx PXP plugin converts the video frames to raw RGB images, which I then extract from the pipeline with an appsink element and draw using a simple custom QQuickPaintedItem.

Surprisingly, with this setup, smooth 25 fps H.264 full-screen playback is possible at around 40% CPU usage. (Of course, full screen on the device is slightly less than half VGA resolution.)
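One practical detail of such a setup is handing frames from the decoding side to the painting side. The following is only a sketch of one possible approach, not the code used in the proof of concept. It assumes that frames arrive on a different thread than the one that paints, and it deliberately contains no GStreamer or Qt calls; `Frame` and `LatestFrameSlot` are hypothetical names. The producer (for example, an appsink callback) publishes the newest frame, and the consumer (for example, the item's paint function) picks it up only if something new has arrived, dropping stale frames in between.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical container for one decoded RGB image.
struct Frame {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> rgb; // raw pixel data as delivered by the pipeline
};

// Holds only the most recent frame; older unconsumed frames are overwritten.
class LatestFrameSlot {
public:
    // Producer side: store the newest frame.
    void publish(Frame frame)
    {
        assert(frame.width > 0 && frame.height > 0);
        std::lock_guard<std::mutex> lock(m_mutex);
        m_frame = std::move(frame);
        m_fresh = true;
    }

    // Consumer side: returns true and fills 'out' only if a new frame
    // was published since the previous successful call.
    bool takeIfFresh(Frame &out)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_fresh)
            return false;
        out = std::move(m_frame);
        m_fresh = false;
        return true;
    }

private:
    std::mutex m_mutex;
    Frame m_frame;
    bool m_fresh = false;
};
```

The painting code would keep the last frame it took and redraw that one whenever `takeIfFresh` returns false, so a repaint without new video data still shows a valid image.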

Besides video playback, another idea for using the PXP came up during performance analysis. It turns out that, for regular application UI rendering, significant CPU time is spent copying the rendering result to the frame buffer. I hacked up the Qt Linux frame buffer platform plugin to make the PXP do this blitting, resulting in a few percent lower CPU load for rendering. In the future, this idea could be taken even further when combined with video playback, by letting the PXP also do the compositing of the video frame with the application UI.
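For reference, the CPU-side copy that such a blit replaces looks roughly like the sketch below. This is not the linuxfb plugin's code; the function name and parameters are assumptions for illustration. The image is copied row by row because the frame buffer's stride (bytes per row) can be larger than the visible row width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy 'height' rows of 'widthBytes' bytes each from src to dst.
// Strides are given in bytes and may include padding at the end of each row.
void blitRows(std::uint8_t *dst, std::size_t dstStride,
              const std::uint8_t *src, std::size_t srcStride,
              std::size_t widthBytes, std::size_t height)
{
    assert(widthBytes <= dstStride && widthBytes <= srcStride);
    for (std::size_t row = 0; row < height; ++row)
        std::memcpy(dst + row * dstStride, src + row * srcStride, widthBytes);
}
```

Every byte of every rendered frame passes through the CPU this way, which is why handing the copy to a dedicated unit can free up a measurable share of CPU time.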



4 Comments

So would it be possible to run Qt Quick apps on Replicant? Hardware acceleration is not supported in this system.

Sorry for late response, I was on vacation.

I'm no expert on the Android platform integration, so I don't know if it directly works with Replicant. Maybe I can get a colleague who knows more about it to answer, but it seems no one has tried Replicant here yet. Otherwise I'd suggest just trying it out.

Any chance you can post source code for linuxfb plugin mods?

Any chance for source code to above changes?

Thanks!

Zeno Endemann

Former KDAB employee

