QStringView Diaries: The Eagle Has Landed

QStringView merged for Qt 5.10

After two months of intensive reviews, discussions, fixes, and stripping down the initial commit, feature by feature, to make it acceptable, I am happy to announce that the first QStringView commits have landed in what will eventually become Qt 5.10. Even the docs are already on-line.

This is a good time to briefly recapitulate what QStringView is all about.

QStringView: A std::string_view for QString

If you have never heard of std::string_view, you may want to learn about it in Marshall Clow’s CppCon 2015 presentation.

TL;DR: String-views reduce temporary allocations.
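
To make the TL;DR concrete, here is a minimal sketch in standard C++17. It uses std::u16string_view as a stand-in for QStringView (an assumption, so that the example builds without Qt), and firstWord() and firstWordCopy() are hypothetical helpers, not Qt API.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns the text before the first space (or everything if there is none).
// The result is a view into the caller's buffer: no characters are copied.
std::u16string_view firstWord(std::u16string_view text)
{
    const std::size_t space = text.find(u' ');
    return text.substr(0, space); // substr() on a view never allocates
}

// The owning counterpart: substr() on std::u16string builds a new string,
// which generally means a heap allocation for anything but short strings.
std::u16string firstWordCopy(const std::u16string &text)
{
    return text.substr(0, text.find(u' '));
}
```

Calling firstWord() on a std::u16string returns a view whose data() points into the original string, while firstWordCopy() hands back a separate string with its own storage.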

Yours truly is not generally known to support reimplementing std facilities in Qt. So you might legitimately ask: “Why QStringView? Why not just use std::basic_string_view<QChar>?”. The answer is the same as for QString itself. QString simply has a lot going for it that std::string is lacking. First and foremost, it has excellent Unicode support. So reimplementing std::string_view for QString/QChar is really a no-brainer.

QStringView tries to solve the problem that functions outside the very core of QString only take QString. There are usually not even QLatin1String overloads, even though most users pass just US-ASCII string literals to these functions. Sure, if you compile without QT_NO_CAST_FROM_ASCII, then just passing "foo" to a function taking QString works just fine.

But the use of QString has a cost: it allocates dynamic memory, and that is comparatively slow. For a string class, it has also fallen a bit behind the state of the art. It uses copy-on-write/implicit sharing, which developers outside Qt no longer consider an optimisation. It also does not use the small-string optimisation, which stores small strings in the object itself instead of in dynamic memory. That makes QString("OK") or QString("Cancel") much more expensive than it should be.

Enter QStringView

This is where string-views come in. QStringView is designed as a read-only view on QStrings and QString-like objects. QString-like objects include QStringRef, std::u16string, and char16_t string literals (u"Hello"). This is useful, since a lot of functions that take QString do not need an actual QString. That is, they do not need an owning reference to the characters. They only need a weak reference: a non-owning pointer and a size, say. Or a pointer pair acting as iterators. And indeed, a lot of low-level functions take (const QChar* data, int length). In doing so, they do not require the construction of a QString just to iterate over its characters.

bool isValidIdentifier(const QChar *data, int len) {
    if (!data || len <= 0)
        return false;
    if (!data->isLetter())
        return false;
    --len;
    ++data;
    while (len) {
        if (!data->isLetterOrNumber())
            return false;
        ++data;
        --len;
    }
    return true;
}

Using pointer-and-length APIs has a cost, too, though.

Towards wide contracts in low-level string APIs

Such functions have preconditions. We say they have a narrow contract. Only certain combinations of the two parameters are allowed: The length must be non-negative, and the pointer mustn’t be nullptr unless the length is zero, too.
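
As a sketch of what such a narrow contract looks like in code, the following hypothetical countSpaces() spells out its preconditions as assertions. It uses plain char16_t in place of QChar (an assumption, so that it compiles without Qt).

```cpp
#include <cassert>

// Counts the spaces in [data, data + len).
// Narrow contract: the caller must guarantee
//   - len >= 0
//   - data != nullptr, unless len == 0
// Violating either precondition is undefined behaviour; the assertions only
// catch it in debug builds.
int countSpaces(const char16_t *data, int len)
{
    assert(len >= 0);
    assert(data || len == 0);
    int n = 0;
    for (int i = 0; i < len; ++i) {
        if (data[i] == u' ')
            ++n;
    }
    return n;
}
```

Note that countSpaces(nullptr, 0) is a valid call under this contract, while countSpaces(nullptr, 1) and countSpaces(text, -1) are not.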

If a function takes a QString instead, it has no preconditions. We say it has a wide contract: any QString is generally acceptable, and valid.

QStringView combines the efficiency and QString-independence of pointer-and-length APIs with the conceptual clarity of QString APIs. By passing an object of class type, we can (and do) enforce invariants between the pointer and the length. Constructing a string-view with a negative length is undefined behaviour, and that is caught at string-view construction time (with an assertion in debug mode), before the function is entered. This way, we put the onus of checking for valid parameters on the caller. So far, nothing has changed compared to the pointer-and-size case. But the function can now assume that its QStringView argument references valid data.
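
The idea of checking the invariant once, in the constructor, can be sketched with a stripped-down view class. U16View and countSpaces() are made up for illustration, use char16_t instead of QChar, and are not how QStringView is actually implemented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, stripped-down view type; not QStringView's real implementation.
class U16View
{
public:
    constexpr U16View() noexcept = default;

    // The invariants are checked once, here, before any function that takes
    // a U16View is entered.
    U16View(const char16_t *data, std::ptrdiff_t size)
        : m_data(data), m_size(size)
    {
        assert(size >= 0);
        assert(data || size == 0);
    }

    const char16_t *begin() const noexcept { return m_data; }
    const char16_t *end() const noexcept { return m_data + m_size; }
    std::ptrdiff_t size() const noexcept { return m_size; }
    bool isEmpty() const noexcept { return m_size == 0; }

private:
    const char16_t *m_data = nullptr;
    std::ptrdiff_t m_size = 0;
};

// Wide contract: every U16View that exists is valid, so there is nothing left
// to check inside the function.
std::ptrdiff_t countSpaces(U16View text) noexcept
{
    std::ptrdiff_t n = 0;
    for (char16_t ch : text) {
        if (ch == u' ')
            ++n;
    }
    return n;
}
```

The function body no longer mentions nullptr or negative lengths at all. Whoever builds the U16View has already answered for those.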

Practically speaking, this means that functions taking QStringView can be marked as noexcept while functions that take pointer-and-size cannot. At least if you buy into the rule that narrow-contract functions mustn’t be noexcept (which both the standard and Qt libraries do).

bool isValidIdentifier(QStringView id) noexcept {
    if (id.isEmpty())
        return false;
    if (!id.front().isLetter())
        return false;
    for (QChar ch : id.mid(1)) {
        if (!ch.isLetterOrNumber())
            return false;
    }
    return true;
}

A (nearly) universal string-data sink

The most thrilling property of QStringView, however, is the wide variety of arguments with which you can construct one. Not only does it abstract away the container used to hold the character data: Whether your string data is stored in a QString, a QStringRef, a std::u16string or a std::u16string_view, QStringView won’t care. It also abstracts away the plethora of character types Qt uses. It does not distinguish between QChar, ushort, char16_t or (on platforms, such as Windows, where it is a 2-byte type) wchar_t. It swallows any of those without a cast:

bool isValidIdentifier(QStringView id);
isValidIdentifier(u"QString");                // OK
isValidIdentifier(L"QString");                // OK (on Windows only)
isValidIdentifier(QStringLiteral("QString")); // OK
QString fun = "QString::left()";
isValidIdentifier(fun.leftRef(7));            // OK
using namespace std::string_literals;         // for the ""s suffix below
isValidIdentifier(u"QString"s);               // OK
isValidIdentifier(L"QString"s);               // OK (on Windows only)

QStringView does not completely replace QString as an argument type, however. There are some (expensive-to-convert) argument types QString allows, but QStringView doesn’t. Your QString function will happily accept a QChar or a QLatin1String, too. QStringView doesn’t. If you use QStringBuilder (as you should), then your QString function can be called with a QStringBuilder expression. QStringView only accepts this with a manual cast to QString: f(QString(expr)).

Future

By Qt 5.10, we’d like a QStringView which has most if not all of the const QString API. There are some notable exceptions we already know about: we will not add a split() method. One of the reasons to use a string-view is to enable zero-allocation parsing. The split() function, however, returns a dynamically-sized container of substrings. We intend to replace this functionality with a QStringTokenizer class. Taking the same arguments as QString::split(), it will have a container interface that allows you to plug it into a range-based for loop:

QString s = ...;
for (QStringView part : QStringTokenizer(s, u'\n'))
    use(part);
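
To give an idea of how such a tokenizer can avoid allocations, here is a rough sketch over std::u16string_view. The Tokenizer class below is hypothetical and is not the API of the planned QStringTokenizer; it only handles a single-character separator and keeps empty parts.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical lazy tokenizer; NOT the API of the planned QStringTokenizer.
// It yields views into the source text, so iterating allocates nothing.
class Tokenizer
{
public:
    class iterator
    {
    public:
        iterator(std::u16string_view rest, char16_t sep, bool atEnd) noexcept
            : m_rest(rest), m_sep(sep), m_atEnd(atEnd) {}

        // Current token: everything up to the next separator.
        std::u16string_view operator*() const noexcept
        {
            return m_rest.substr(0, m_rest.find(m_sep));
        }

        iterator &operator++() noexcept
        {
            const std::size_t pos = m_rest.find(m_sep);
            if (pos == std::u16string_view::npos)
                m_atEnd = true;                // that was the last token
            else
                m_rest.remove_prefix(pos + 1); // skip token and separator
            return *this;
        }

        bool operator!=(const iterator &other) const noexcept
        {
            return m_atEnd != other.m_atEnd
                || (!m_atEnd && m_rest.data() != other.m_rest.data());
        }

    private:
        std::u16string_view m_rest;
        char16_t m_sep;
        bool m_atEnd;
    };

    Tokenizer(std::u16string_view text, char16_t sep) noexcept
        : m_text(text), m_sep(sep) {}

    iterator begin() const noexcept { return iterator(m_text, m_sep, false); }
    iterator end() const noexcept { return iterator({}, m_sep, true); }

private:
    std::u16string_view m_text;
    char16_t m_sep;
};
```

Each part handed to the loop body is a view into the original text, so the source string has to outlive the loop.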

We will also co-evolve QLatin1String together with QStringView, making QLatin1String as full-blown a view type for chars as QStringView is for QChars.

You can follow QStringView development on this blog and on Gerrit.

Stay tuned!

10 thoughts on “QStringView Diaries: The Eagle Has Landed”

  1. What’s the reason that makes passing a const QStringView& worse than passing it by value? Indirection?

  2. Interesting. I thought that Qt’s equivalent to std::string_view is the QStringRef class. It would be nice if you’d explain the key differences between QStringRef and QStringView.

    1. Indeed, thanks for the suggestion.

      In all brevity: QStringRef cannot reference non-QString-backed data, because it holds a const QString*, a position and a length inside that string. QStringView, otoh, is just a pointer to the character data and a size, and thus agnostic to the owning container. It may, but does not have to be a QString.

      1. So extending QStringRef instead of introducing a new type would keep the Qt API cleaner. Have you considered it?

        1. QStringRef has certain guarantees (it’s stable under reallocations of its string()) that were specifically designed into it. If I were to re-use QStringRef for what QStringView is designed to solve, I would have to do the whole work as an almost-atomic operation between Qt 5 and Qt 6. And I’d still break existing out-of-tree users in the process. I wanted something that was possible to implement here and now, and less disruptive.

  3. Another question: can QStringView work with a kind of string where the data is not contiguous? Say that one needs to implement a text editor, and considers storing the edited text as a gap buffer, rope, sequence of lines, or whatever.

    I also had the question of whether this could work with QStringIterator, but I saw one commit that made use of it, so it seems that yes.

    Thanks.

    1. QStringView, like std::string_view, expects characters to be contiguous. It cannot represent a rope.

      QStringIterator is already ported to QStringView, yes, but since it already sported a (QChar*, QChar*) constructor, you could’ve passed (and can still pass) begin() and end() of a QStringView even if it wasn’t.

  4. Maybe it’s a stupid question, but… As far as I understand, the QStringView fixes performance issues with the QString. Then why just not fix the QString itself?

    1. A string-view is conceptually similar, if not identical, to the STL design of separating algorithms from containers by having containers provide, and algorithms work with, iterators. A function taking a string-view is an algorithm on characters. The string-view is the iterator pair, and which container the algorithm works on is abstracted. Only, because we’re working with a rather restricted set of value types and only contiguous memory, we don’t need to write our algorithms as template functions. A normal function taking QStringView will do, because const QChar* is always the iterator.

      As for fixing QString: There are many things that I’d like to see fixed in QString, and I’ve mentioned them in the article. But a string class needs to hold strings of arbitrary size. So it must (eventually) allocate memory, and own it. That makes QString a container and fundamentally different from a string-view.
