Qt Quick without a GPU: i.MX6 ULL on small embedded devices
With the introduction of the Qt Quick software renderer it became possible to use Qt Quick on devices without a GPU. We investigated how viable this option is on a lower end device, particularly the NXP i.MX6 ULL. It turns out that with some (partially not yet integrated) patches developed by KDAB and The Qt Company, the performance is very competitive. Even smooth video playback (with at least half-size VGA resolution) can be done by using the PXP engine on the i.MX6 ULL.
Together with The Qt Company we have been looking at the performance of Qt Quick on an i.MX6 ULL, the low end of the i.MX6 application processor family. It comes with a single Cortex-A7 core, clocked at 528 MHz on the development board, and has no graphics accelerator apart from the so-called PXP (more on that later). The display bundled with the development board has a resolution of 480×272 pixels. Video playback hasn’t been tried at larger resolutions, but regular Qt Quick UIs also work well on a larger 800×480 screen.
Before the introduction of the Qt Quick software renderer, running Qt Quick on systems without a GPU would have been infeasible, as back then Qt Quick required OpenGL. But nowadays one can use, for example, the non-accelerated Linux frame buffer as well, which is what we (mostly) did.
The main focus lay in analyzing and optimizing the CPU usage of text rendering and full screen Qt Quick animations. For this a few noteworthy patches were developed:
- Unneeded blending operations in the Qt Quick software renderer were removed in two places. Previously, images without an alpha channel were always blended; now they are simply copied. Layers were also blended unconditionally; now the renderer checks whether a layer covers its area completely and skips blending if it does.
- Font drawing was improved. This includes removing unnecessary temporary allocations as well as simplifying the computation for glyph drawing in the special case when no gamma correction is used.
- Finally, a prototype was written for a significant special-case optimization of the linuxfb platform plugin. The plugin implements window compositing which, in the general case, requires a temporary compositing buffer. However, in the rather common special case of having only one visible window, this buffer can be skipped and the window copied directly to the frame buffer. For now, the prototype simply assumes this special case; it obviously still needs proper integration.
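The idea behind the first patch can be illustrated with a small sketch. This is not the actual Qt code: compositeRow is a hypothetical helper, and the premultiplied ARGB32 pixel layout is an assumption made for the example. It only shows why knowing that a source is opaque lets the renderer replace per-pixel blending with a plain copy.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not Qt API: composite one row of premultiplied
// ARGB32 source pixels onto a destination row.
inline void compositeRow(std::uint32_t *dst, const std::uint32_t *src,
                         std::size_t count, bool sourceIsOpaque)
{
    if (sourceIsOpaque) {
        // No alpha channel: source-over blending degenerates to a copy.
        std::memcpy(dst, src, count * sizeof(std::uint32_t));
        return;
    }
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint32_t s = src[i];
        const std::uint32_t d = dst[i];
        const std::uint32_t invAlpha = 255u - (s >> 24);
        // Premultiplied source-over, per channel: out = s + d * (1 - alpha_s).
        std::uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            const std::uint32_t sc = (s >> shift) & 0xffu;
            const std::uint32_t dc = (d >> shift) & 0xffu;
            std::uint32_t c = sc + (dc * invAlpha + 127u) / 255u;
            if (c > 255u)
                c = 255u;
            out |= c << shift;
        }
        dst[i] = out;
    }
}
```

Both paths give the same result for fully opaque pixels, but the copy path avoids several multiplications and divisions per pixel, which adds up quickly on a single 528 MHz core.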
My involvement in these was primarily performance measurement on the device using perf and Hotspot. Besides this, I focused on reducing the Qt library size and on investigating the possibility of playing back videos.
Minimizing Qt library size
Qt is quite modular and offers a wide range of options you can pass to the configure script of the base modules to adapt it to your needs. The options that influence the Qt library size are, unsurprisingly, those that enable or disable optional features. By default, most features are either enabled unconditionally or enabled when the required third party library is available. My goal was to find a sensible collection of options for a fairly minimal Qt build that still supports real world embedded Qt Quick applications and whose result does not depend on which libraries happen to be available in the build environment.
Apart from the relatively coarse-grained, “traditional” feature options that are mostly described directly in the configuration script help text, a more fine-grained feature control mechanism named Qt Lite was introduced not that long ago. I chose not to use it for the most part: with Qt Lite it is much harder to know in advance which features are really needed, randomly missing API is confusing, and I was not concerned about going as far as possible. I see Qt Lite more as a special purpose tool for when every byte really counts.
Without using Qt Lite, the only hard third party library dependencies of Qt 5.9.1 (the version used throughout this blog post) are zlib and pcre. But as a reasonable base for UI development, at least text and image rendering should be supported, so I also enabled freetype, harfbuzz, jpeg and png support.
Here is the list of options I came up with:
-system-zlib -qt-pcre -qt-freetype -qt-libjpeg -qt-libpng -qt-harfbuzz \
-no-cups -no-iconv -no-sm -no-feature-vnc -no-widgets -no-ico -no-gif -no-glib -no-gtk \
-no-sql-mysql -no-sql-psql -no-sql-sqlite -no-sql-sqlite2 \
-no-icu -no-openssl -no-fontconfig -no-dbus -no-qml-debug
Of course, in many cases you would want to re-enable one or the other; this list is intended as a starting point. Also, the options regarding hardware integration are omitted, as that is very project specific.
Finally, for reducing file size it is advisable to use the configure options that control the relevant compiler settings, such as link time optimization.
With all these, the total file size of all installed Qt shared libraries (including the QML and Quick modules) comes out at about 15 MiB. Compiling Qt statically and linking it into a basic Hello-World Qt Quick application produces a binary of about 8.3 MiB stripped. One interesting observation is that link time optimization does indeed help noticeably: without it, the dynamic libraries are around 800 KiB larger and the static binary 1.5 MiB larger.
Although the i.MX6 ULL does not have a GPU, it comes with the so-called Pixel Pipeline, or PXP for short. It can do some basic operations, like color space conversion, blitting, scaling and blending. Probably all popular video formats require a YCbCr to RGB color conversion as a final decoding step for playback, and doing this on the CPU would be pretty expensive, so the PXP is worth using for that alone. Luckily, since the imx GStreamer plugin supports PXP, coding up a proof of concept was quite easy.
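To give a feeling for the work the PXP takes off the CPU, here is a minimal sketch of a per-pixel YCbCr to RGB conversion. It is an illustration only, not code from the project: the function names are hypothetical, and the coefficients assume the common limited-range BT.601 matrix in 8-bit fixed point, which may differ from what a particular video stream uses.

```cpp
#include <algorithm>
#include <cstdint>

// Clamp an intermediate result to the valid 8-bit channel range.
inline std::uint32_t clampToByte(int v)
{
    return static_cast<std::uint32_t>(std::min(255, std::max(0, v)));
}

// Hypothetical helper: convert one limited-range BT.601 YCbCr sample
// (an assumption, see above) to an opaque 0xAARRGGBB pixel.
inline std::uint32_t ycbcrToRgb32(std::uint8_t y, std::uint8_t cb, std::uint8_t cr)
{
    const int c = 298 * (static_cast<int>(y) - 16);
    const int d = static_cast<int>(cb) - 128;
    const int e = static_cast<int>(cr) - 128;
    const std::uint32_t r = clampToByte((c + 409 * e + 128) / 256);
    const std::uint32_t g = clampToByte((c - 100 * d - 208 * e + 128) / 256);
    const std::uint32_t b = clampToByte((c + 516 * d + 128) / 256);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}
```

On a 480×272 display this function would have to run 130,560 times per frame, or more than three million times per second at 25 fps, on top of the actual video decoding.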
For those who know the basics of the GStreamer API and its concepts: I integrated GStreamer video decoding into a Qt Quick application by creating a simple video decoding pipeline. At the end of the pipeline, the imx PXP plugin converts the video frames to raw RGB images. I then extract these from the pipeline with an appsink element and draw them using a simple custom QQuickPaintedItem.
Surprisingly, with this setup, smooth 25 fps H.264 full screen playback is possible at around 40% CPU usage. (Of course, full screen on the device is slightly less than half VGA resolution.)
Besides video playback, another idea for using the PXP came up during performance analysis. It turns out that for regular application UI rendering, significant CPU time is spent copying the rendering result to the frame buffer. I hacked up the Qt Linux frame buffer platform plugin to make the PXP do this blitting, resulting in a few percent lower CPU load for rendering. In the future this idea could be taken even further when combining it with video playback, by letting the PXP also do the compositing of the video frame with the application UI.
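The numbers behind that remark can be checked with a few compile-time constants. This is a back-of-the-envelope sketch; the 4 bytes per pixel are an assumption (a 32-bit RGB format), and the constant names are made up for the example.

```cpp
#include <cstdint>

// Display of the development board versus VGA.
constexpr std::uint64_t kDisplayPixels = 480ull * 272ull;  // 130,560 pixels
constexpr std::uint64_t kVgaPixels = 640ull * 480ull;      // 307,200 pixels
constexpr std::uint64_t kHalfVgaPixels = kVgaPixels / 2;   // 153,600 pixels

// Assumption: frames are handed over as 32-bit RGB, i.e. 4 bytes per pixel.
constexpr std::uint64_t kBytesPerPixel = 4;
constexpr std::uint64_t kFramesPerSecond = 25;

// Raw RGB data that has to be moved per second for full screen playback.
constexpr std::uint64_t kBytesPerSecond =
    kDisplayPixels * kBytesPerPixel * kFramesPerSecond;    // 13,056,000 bytes

static_assert(kDisplayPixels < kHalfVgaPixels,
              "480x272 is slightly less than half VGA");
```

Under these assumptions, full screen playback means moving roughly 13 MB of raw pixel data per second, which also hints at why copying frames around is a noticeable cost on this class of hardware.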