QStringView Diaries: Advances in QStringLiteral
How QStringView Development Also Improves its "Competition"
This is the first in a series of blog posts on QStringView, the std::u16string_view equivalent for Qt. You can read about QStringView in my original post to the Qt development mailing-list, follow its status by tracking the “qstringview” topic on Gerrit, and learn about string views in general in Marshall Clow’s CppCon 2015 talk, aptly named “string_view”.
What’s wrong with QStringLiteral?
First, it may fall back to QString::fromUtf8(). That makes it all but impossible to recommend it as a fast way of creating a QString: where the fall-back is in effect, creating the QString from a QLatin1String would be faster.
Second, each use produces a new UTF-16 array that contains the string data. This duplicates the data as many times as you “call” QStringLiteral with the same argument. It does so even within a single translation unit. Common C string literals, on the other hand, are allowed to share a single memory location.
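The duplication can be sketched in plain C++, without Qt. The macro below is a hypothetical stand-in for the general technique (a lambda holding a static UTF-16 array), not Qt's actual implementation; all names are made up for illustration.

```cpp
// Hypothetical stand-in for the QStringLiteral machinery: every expansion
// creates its own lambda and, with it, its own static UTF-16 array.
#define HYP_U16_LITERAL(str) \
    ([]() -> const char16_t * { \
        static const char16_t data[] = u"" str; \
        return data; \
    }())

// Two "calls" with the same argument in the same translation unit.
inline const char16_t *hypFirstUse()  { return HYP_U16_LITERAL("hello"); }
inline const char16_t *hypSecondUse() { return HYP_U16_LITERAL("hello"); }

// Distinct static objects have distinct addresses, so the string data
// exists twice in the binary.
inline bool hypDataIsDuplicated() { return hypFirstUse() != hypSecondUse(); }
```

Each expansion site gets its own array, whereas a compiler is free to merge two plain "hello" literals into one.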
Third, since it returns an actual QString, its use clutters the executable with calls to the QString destructor. The destructor will be a no-op in all executions of the program. But the dead code still sits there and costs you in binary size and reduced effective i-cache size.
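As a rough analogy in standard C++ (not Qt code), an owning string type has a non-trivial destructor that the compiler must call at every use site, while a view type typically does not. The sketch below assumes a C++17 standard library; the trivial destructibility of std::u16string_view holds on the common implementations, but treat it as an assumption rather than a guarantee for every toolchain.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// An owning string needs a destructor call wherever a temporary dies,
// even if that call ends up doing nothing at run time.
constexpr bool hypOwnerNeedsDestructor =
    !std::is_trivially_destructible<std::u16string>::value;

// A view is just a pointer plus a length; on common implementations
// there is no destructor code to emit at all.
constexpr bool hypViewNeedsDestructor =
    !std::is_trivially_destructible<std::u16string_view>::value;
```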
What can we do about it?
At face value, not much, in Qt 5.
We can’t change QStringLiteral to return anything other than a full QString. That would break existing code that relies on getting a QString back.
For reasons that would go beyond the scope of this post, we also can’t enable string data sharing between QStringLiteral instances before Qt 6. The key point here is: QString::fromRawData() mustn’t allocate memory, which is not possible with the Qt 5 QString design.
But we increased the minimum compiler requirements in Qt 5.7. That means we can do something about the unfortunate QString::fromUtf8() fall-back: remove it.
Towards a noexcept QStringLiteral
For my work on QStringView, I recently carefully analysed the #ifdef jungle in qcompilerdetection.h. This revealed that only one supported platform, QNX 6.x, still uses the fromUtf8() fall-back. More importantly, I found that it shouldn’t.
To give you the gist of it: The compiler shipped with QNX 6 supports Unicode string literals: const char16_t. But it ships with a standard library that lacks support for char16_t. That means that, say, std::u16string is not available. Qt C++ feature macros imply a certain level of standard library support as well as the core language feature. So we did not enable the macro for Unicode strings on that platform.
Now observe that, crucially, the QStringLiteral implementation only needs the core language feature: it needs to be able to prefix u to the C string literal you pass to QStringLiteral. That turns the C string literal into a UTF-16 sequence that it then stores in a static object. The implementation does not need std::u16string, or any other library support.
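To illustrate that the u prefix is a pure core-language feature, here is a minimal sketch that uses no string library types at all. The helper name hypCodeUnits is made up for this example.

```cpp
#include <cstddef>

// Number of UTF-16 code units in a char16_t literal, excluding the
// terminating null. Needs only the core language, not <string>.
template <std::size_t N>
constexpr std::size_t hypCodeUnits(const char16_t (&)[N]) { return N - 1; }

// The u prefix turns the literal into a UTF-16 sequence at compile time.
static_assert(hypCodeUnits(u"Qt") == 2, "two ASCII characters");
static_assert(hypCodeUnits(u"\u00E4") == 1, "U+00E4 fits in one code unit");
static_assert(hypCodeUnits(u"\U0001F600") == 2, "U+1F600 needs a surrogate pair");
static_assert(u"\U0001F600"[0] == 0xD83D, "high surrogate");
static_assert(u"\U0001F600"[1] == 0xDE00, "low surrogate");

// The data can live in a plain static array, again without std::u16string.
static const char16_t hypStaticData[] = u"Qt";
```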
That leaves one supported platform without support for Unicode string literals: MSVC 2013. That, however, has an existing fall-back in place: it uses wchar_t, which, on Windows, happens to be the same size as char16_t.
So I prepared a patch that removes the check for Unicode string literals and uses wchar_t on Windows and char16_t everywhere else. It removes the QString::fromUtf8() fall-back for good. I’m happy to report that it will be in Qt 5.9. With a bit more attention paid to performance, it could have been in 5.7 already…
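The platform split described above might look roughly like the following sketch. The macro and type names are hypothetical and are not the ones used in Qt's headers; the _WIN32 check is likewise only an assumption about how "on Windows" could be detected.

```cpp
// Hypothetical names for illustration; not Qt's actual macros.
#if defined(_WIN32)
    // On Windows, wchar_t is 16 bits wide, so an L"" literal will do.
    typedef wchar_t hyp_unicode_char;
#   define HYP_UNICODE_LITERAL(str) L"" str
#else
    // Everywhere else, rely on C++11 Unicode string literals.
    typedef char16_t hyp_unicode_char;
#   define HYP_UNICODE_LITERAL(str) u"" str
#endif

// Either way, the code units must be 16 bits wide to serve as UTF-16 data.
static_assert(sizeof(hyp_unicode_char) == 2, "expects 16-bit code units");

// No run-time conversion and no allocation: the data is laid out at compile time.
static const hyp_unicode_char hypVersionData[] = HYP_UNICODE_LITERAL("Qt 5.9");
```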
Remember that the patch effectively only changes a single platform: QNX 6. But it means that programmers can now safely assume that QStringLiteral never allocates memory.
That said, if you find that the change breaks your platform, please file a bug so I can do something about it before 5.9.0 gets released.
Towards string sharing and less code bloat in QStringLiteral
The above does not address the problem of QStringLiteral data duplication (point two in the introduction). As I hinted above, that needs a different QString design, which can’t happen before Qt 6.
But now that QStringLiteral no longer allocates memory, references into a QStringLiteral never expire, either. We can therefore lift the machinery for QStringLiteral and use it to create a QStringViewLiteral. That simply prefixes either L or u to the string, depending on the platform. In any case, the result is implicitly convertible to QStringView, which will stay valid for as long as the program runs.
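The same idea can be sketched with std::u16string_view, the standard type that QStringView mirrors. This is an analogy under the assumption of a C++17 compiler, not the QStringViewLiteral implementation; hypGreeting is a made-up name.

```cpp
#include <string_view>

// The array has static storage duration, so a view onto it never dangles
// while the program runs, and creating the view allocates nothing.
inline std::u16string_view hypGreeting()
{
    static const char16_t data[] = u"hello";
    // Exclude the terminating null from the view's length.
    return std::u16string_view(data, sizeof(data) / sizeof(data[0]) - 1);
}
```

Returning the view by value is cheap: it is only a pointer and a length, and every call refers to the same static data.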
There is still the problem with DLL unloading that plagues QStringLiteral, too. But while the problem potentially affects all QString uses when QStringLiteral is the source, no sane programmer would keep a QStringView around for longer without storing it in a QString to make a copy.
Advances in compiler support for C++11 enabled us to tighten the guarantees of QStringLiteral: From Qt 5.9 on, it never allocates memory, and references into it never expire.
We cannot do anything about QStringLiteral's other drawbacks until Qt 6 allows us to change the QString layout. But the introduction of QStringView, hopefully in Qt 5.10, allowed me to implement a QStringViewLiteral which has none of the drawbacks of QStringLiteral. However, it “only” returns a QStringView instead of a QString.