Skip to content

On the Removal of toSet(), toList() and Others or "How Do I Convert a QList to QSet in Qt 6?"

(Apologies for the clickbait in the post title! But I’d really like people who are searching for solutions to read this.)

Between Qt 5.14 and Qt 5.15, my colleague, Marc Mutz, and I submitted a series of patches to Qt that added “range constructors” to the Qt containers. This brought Qt containers one step closer to…C++98 feature parity! 🙂

What’s a range constructor? It’s a constructor that takes an iterator pair (begin, end) and constructs a container by reading the corresponding range of elements:

// Creates a QList from the [begin, end) range
QList<int> intList( begin, end ); 

Usually the range is obtained from another container of a different type. This allows for converting a container to another one, for instance, converting a QList to a QSet:

QList<int> intList = ~~~;

// Creates a QSet from a QList
QSet<int> intSet( intList.begin(), intList.end() ); 

This approach works no matter what is the source container: it can be a QList, a QVector, a std::deque, and what not.

Some Qt containers also provided a similar functionality. For instance, in Qt 5, QList had the toSet() conversion function that returns a QSet with the same elements:

QList<int> intList = ~~~;

// Creates a QSet from a QList (in Qt 5)
QSet<int> intSet = intList.toSet();

The new constructors are a generalization of these conversion functions. They work with any source container, including the ones from the Standard Library, but also those from Qt that were simply lacking the conversion (did you ever notice that there isn’t a QVector::toSet()?).

Therefore, in a related set of changes, the conversion functions have been deprecated in Qt 5.15 and then removed in Qt 6.0.

This has made a lot of people very angry and been widely regarded as a bad move.

Wait, that’s another thing…

Anyway, on the web, you may find quite a bit of unhappiness about the change. There’s a Qt bug report that tries to collect this information, as well as some threads on the Qt mailing lists.

The common factor to the complaints was that the deprecation and subsequent removal looked entirely gratuitous. In other words, is it only because we don’t want “duplication” in the APIs that we are asking users to replace perfectly working code with code that is:

  • functionally identical (does the same thing),
  • behaviorally identical (it does the same thing in the same way),
  • and much more verbose and awkward to use (list.toSet() vs QSet(list.begin(), list.end()))?

That sounds silly at best; and certainly against Qt policies of offering nice-to-use APIs.

That’s why we…actually have nice things!

Qt has always had convenience methods on its containers. This is perfectly natural in Qt:

QList<int> intList = getAnswers();

if (intList.contains(42))
   std::cout << "The Answer has been found!\n";

Only C++23 will perhaps add something as convenient to the Standard Library:

std::vector<int> intVector = getAnswers();

if (std::ranges::contains(intVector, 42)) // if P2302 gets adopted
   std::cout << "The Answer has been found!\n";

In the meanwhile, this is the “common” way we do it today:

std::vector<int> intVector = ~~~;

if (std::find(intVector.begin(), intVector.end(), 42) != intVector.end())
   std::cout << "The Answer has been found!\n";

Quite a mouthful when compared to Qt’s way, right?

So if we have standard algorithms (like std::find), why do Qt containers still offer contains, indexOf, and so on? Should we get rid of them, too?

The answer, generally speaking, is no. Qt does not strive for minimality, it strives for convenience. And having those functions is convenient for end users.

The struggle between “easy to use,” and “hard to misuse”

Convenience must never be a trade-off for correctness. This is a given.

But convenience should also not let people misuse the APIs for the intended usage. Unfortunately, this is much harder to enforce when simply providing a correct API.

Case in point: toSet() has been introduced as a conversion function from QList to QSet. It does what it says, in the best way possible.

But how and where is toSet() actually used? Do people really need to convert lists to sets all the time?

Turns out that no, they don’t! Most of the time, the conversion has been an API abuse. For instance, the conversion has been used to remove duplicates from a QList:

QList<int> list = ~~~;

// Remove duplicates, and get back to the original list
// For the love of kittens, DON'T DO THIS!
list = list.toSet().toList();

Or similarly, it has been used to process elements uniquely:

QList<int> list = ~~~;

// Process elements, skipping duplicates. 
// This is OK-ish, but there are better solutions!
const QSet<int> set = list.toSet();
for (int i : set)
  process(i);

Or to generate an list of unique elements when joining multiple lists:

QSet<int> temp;
for (int i = 0; i < N; ++i) {
  temp += getList(i).toSet();
}
QList<int> result = temp.toList();

All of these usages have a common denominator: they abuse the container API instead of using an algorithm. There’s a better, algorithmic solution for most of these. To put in another way, very few of these really require the “convenience” provided by toSet(), and the few ones that do need it are easy to port away to the range constructors.

So why do people immediately jump to using the converting functions between containers?

As a C++ teacher, it’s a hard truth for me to digest. Here, I mostly see a failure at educating C++ developers.

Developers using toSet() have no faults. We teach them about containers. We tell them that QList is sequential, random access; that QSet has a random order and doesn’t allow for duplicates; and so on. We don’t teach them about algorithms, or at least not together with containers; we don’t teach about their importance, or we just don’t teach about them at all.

So, when faced with a very concrete problem, such as how do I remove duplicates from a QList?, developers immediately jump to QSet. Hey! It’s the thing that doesn’t allow duplicates in the first place! “So how do I go from a QList to a QSet? Oh, there’s converting function, toSet(). Job done; move on.”

The correct mental approach of getting rid of duplicates should be different. That is, the default go-to for an algorithmic problem should be an algorithm. Just searching on Google would reveal the correct algorithm to use.

Please note, the fundamental reason why I’d love to spread more knowledge about algorithms isn’t just about performance. Yes, removing duplicates via QSet will cost you an awful lot more when compared with a std::sort+std::unique approach:

  • CPU: hashing, allocating, copying elements around, destroying everything/deallocating
  • Memory: at least #(number of unique elements) memory allocations (remember, in Qt 5, QSet is implemented as a bucket of chains)

But I don’t want to go there. I’m sure that someone will come up with some specific numbers and datatypes for which the QSet approach will actually be faster. In the bigger scale, it’s the wrong approach, and we shouldn’t let users choose the wrong approach. They deserve better. We’re not doing them a favor by supporting APIs such as toSet() and not teaching them the alternatives.

That’s the real reason for deprecating toSet(). It’s an API which is too easy to misuse, and correctly serves just very few users.

Summary (or: TL;DR: give me the solutions!)

Here are a few quick recipes for going forward.

“I need to eliminate duplicates from a linear, sequential container (QList, QVector, std::vector, etc.).”

If you can reorder the elements in the container itself, then use an algorithm (technically, two: sort and unique; plus a call to the container’s erase), it is the fastest option available, at zero extra memory cost.

You can package this as a utility function:

std::sort(container.begin(), container.end());
container.erase(std::unique(container.begin(), container.end()),
                container.end()); 

If you cannot reorder the elements in the container, then use a different algorithm!

KDToolBox::DuplicateTracker<T> tracker;
const auto isDuplicate = [&tracker](const T &t) { return tracker.hasSeen(t); };
container.erase(std::remove_if(container.begin(), container.end(), isDuplicate),
                container.end());

KDToolBox::DuplicateTracker is a small utility class from KDToolBox, KDAB’s collection of miscellaneous useful C++ classes and stuff. It wraps a set and also uses monotonic_buffer_resource to improve the memory usage for small containers, avoiding allocations in those scenarios.

In either case, you may want to also call container.shrink_to_fit(); at the end, if you want to shed extra capacity.

“I need to process unique elements, skipping duplicates.”

Use KDToolBox::DuplicateTracker again:

KDToolBox::DuplicateTracker<T> tracker;
for (const T &t : container) {
  if (!tracker.hasSeen(t))
    process(t);
}

Of course, bear in mind that this may cost you the same size of container (if it just contains unique elements).

“I really, really need to convert a QList (or another container) to a QSet.”

You can use the ranged constructor:

QSet<int> set(list.begin(), list.end());

If you obtain the list from a function call, however, be sure NOT to call the function twice!

QList<int> getList();

QSet<int> set(getList().begin(), getList().end()); // !!! WRONG !!!

Instead, do this:

QList<int> getList();

const QList<int> list = getList();
QSet<int> set(list.begin(), list.end()); // OK

The double call is expensive and, also, wrong (you’re getting potentially two distinct containers back).

“It’s still me. I really, really need to convert a QList (or another container) to a QSet. AND I HATE THIS SYNTAX!”

If you can, add range-v3 to your project, and you will be able to do even better:

QSet<int> set1 = list | ranges::to<QSet>();
QSet<int> set2 = getList() | ranges::to<QSet>();

This is pretty sweet to type.

“It’s still me. I really, really need to convert a QList (or another container) to a QSet. range-v3 sounds sweet, but I can’t use it yet — slows down my compilation times too much, etc.”

Then use KDToolBox again. It features a minimal implementation of ranges::to:

QSet<int> set1 = list | kdToContainer<QSet>();
QSet<int> set2 = getList() | kdToContainer<QSet>();

FAQs

Could Qt have left these methods in Qt 6, although deprecated?

That is a very good question. I personally don’t have a good answer. The point of deprecated functionality is that, sooner or later, it will need to be removed, and Qt 6 looked like the perfect time to do so.

Why not add some generic Container::to<OtherContainer>() functions?

As I said, converting between containers is, in and of itself, a sketchy operation. There’s a way to do so; why should we support them with extra convenience? Does the potential for good usage outweigh the potential for misusage? The data so far shows that no, it doesn’t. But please, don’t let me discourage you from contributing these functions to Qt!

Thank you for reading, and I’ll see you next time.

About KDAB

If you like this article and want to read similar material, consider subscribing via our RSS feed.

Subscribe to KDAB TV for similar informative short video content.

KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.

FacebookTwitterLinkedInEmail

Categories: C++ / KDAB Blogs / KDAB on Qt / Qt / Technical

Tags: / / /

17 thoughts on “On the Removal of toSet(), toList() and Others”

  1. Hi, while some of the code snippets look interesting, all of this is still much more complicated to read and maintain than code using the simple “toSet()” function. For most code written in Qt, it is more important that the code CORRECT, than it is that the code is PERFORMANT. Forcing users to think about algorithms in code paths that are not performance critical IMO means premature optimization, which, as some people claim, is the root of all evil.

    1. Giuseppe D'Angelo

      Hi,

      let me try to counter-argument a bit, sorry for nitpicking on your answer. 🙂

      Hi, while some of the code snippets look interesting, all of this is still much more complicated to read and maintain than code using the simple “toSet()” function.

      I’d say that’s partially true. However,

      1. a solution like ranges::to or kdToContainer isn’t that much more complicated or verbose to use (and please bear with me, if the documentation and Google hits hinted at those instead of toSet, you’d just be using those without wondering about “why there isn’t a toSet() function?” — I mean, is anyone asking why there isn’t a “toStdDeque()“? (And, very ironically, there is a QList::toStdList()…)
      2. If this is truly supposed to be a one-off in the code, then one can accept a tiny bit of extra verbosity.

      If usages of toSet() aren’t a one-off in the code, then this is no longer a matter of premature optimization; this is really optimization.

      For most code written in Qt, it is more important that the code CORRECT, than it is that the code is PERFORMANT. Forcing users to think about algorithms in code paths that are not performance critical IMO means premature optimization

      As I’ve tried to argue, this is a double edged sword, about the danger of offering an easy-to-misuse API.

      First and foremost, offering the convenience is, well, convenient; the downside is that such convenience gets used even in code that is supposed to be performant.

      Second, this risks bringing in a “death by thousand cuts”. There isn’t just premature optimization; there’s also premature pessimization, caused by making a poor choice whereas a better one would’ve costed maybe 5 more seconds of typing. The danger of premature pessimization is that if one has slightly inefficient code all over the place, there is absolutely no easy way for someone to improve the overall performances of an application; a profiler will not be able to pinpoint a few easy problems to fix. That’s because the inefficiency, eventually, becomes so thinly spread that finding the causes and fixing them becomes a super-massive investment (e.g. go around and change all of your container classes, string classes, and so on). I’d rather take the extra second before making a “worse” decision every time, rather than having to revisit the code N years down the line in a desperate search on how we can make the application boot in 1 second instead of 5.

      In a nutshell: the ultimate goal would be that the default easy-to-go choice should also be the most efficient one; that’s very hard to do with a “blatant”, possibly non-efficient choice available immediately.

      (Eventually, if one does down this path, this leads me to question whether Qt users would be better served by using Python or other languages where these conversion facilities are “first-class”, rather than C++? It’s an important question to ask, for Qt’s own survival.)

      Thanks for the comment!

      1. Thanks for the reply.

        It’s interesting that you bring in python.

        IMO, one of the great advantages of Qt (especially it’s container classes) is/was, that they are much easier to use than STL counterparts (at the price of less flexibility), and they can be programmed with a similar convenience as when programming in python. We’re operating on a continuum here, from best to worst performance, and worst to best convenience:
        * A custom container class optimized with custom algorithms for your specific usecase (most performant, least convenient)
        * STL containers using STL algorithm and all the template specialization possibilities (performant, more convenient)
        * Qt containers (less performant, even more convenient)
        * python (worst performance, highest convenience, bad long-term maintainability)

        It should be everybody’s own choice which level of performance and convenience they want to use, but an advantage of CONVENIENT Qt containers is, that you can get the best from both worlds: the convenience of python, but the performance of C++, if you can find the performance critical paths and optimize them at a later point.

        1. Giuseppe D'Angelo

          Hi,

          Well, in the blog post itself I specifically argue that yes, Qt containers do offer some extra convenience, and I’m completely fine with it. What I’m less fine with is when this extra convenience starts falling on the verge of “easy to use, easy to misuse”. Something like indexOf() doesn’t fall there, something like toSet() does (IMNSHO). The very fact that there’s a lot of reactions to its deprecation is also somehow a warning sign to me: does it mean that, in most codebases, it wasn’t a one-off usage that can be quickly fixed — and move on? Does it mean that it was extensively used? Doesn’t that warrant that the code should be at least checked for its algorithmic characteristics?

          Aside: the “bad long-term maintainability” of Python is a case of [citation needed]. And, “if you can find the performance critical paths and optimize them at a later point” is a tremendous effort. As I said in my comment before: if most of your code is ever so slightly “slow”, you won’t find them easily. This has historically led to quite some drastic decisions, like rewriting everything in another framework or another programming language. Do you want a data point? Qt for MCU, in order to cope with the small CPU/memory requirements, has a complete reimplementation of the Qt string/container classes, to be more efficient. That’s because it’s not about ONE specific container instance or string instance, it’s about the fact that ALL of them are slightly slow (because their implementation in “mainstream” Qt is). I fail too see why non-MCU users should not get the same benefits. (Ok, now the discussion is completely derailed, apologies. :-))

          Thanks again,

  2. Some say that “premature optimization is the root of all evil”, and by forcing people to think about algorithms and use more complex code, you’re forcing them to do exactly that.

    1. Why no list.uniqu() to ease the migration from all the toSet().toList() ? That would be very convenient as it allows for bulk search & replace

    2. Others say that algorithms express intent much better than all the ways people invent to avoid them. More expressive code is easier to read and easier to maintain, contains less bugs and generally is faster.

  3. Giuseppe, you said that the Qt policy is to have nice-to-use APIs, and yet you are here defending that people should use a non-discoverable (no auto-completion), much larger and error prone to use API. See the huge contradiction?

    The only way in which this would not be contradictory is because, as you said, you are a C++ teacher. You know what? Many people are not with Qt because C++, but in spite of C++.

    You will not drive people from C++ to Qt, but you will drive Qt people away from Qt with this kind of changes.

    1. Giuseppe D'Angelo

      Hi!

      Giuseppe, you said that the Qt policy is to have nice-to-use APIs, and yet you are here defending that people should use a non-discoverable (no auto-completion), much larger and error prone to use API. See the huge contradiction?

      Well, I also said that nice-to-use shouldn’t clash with easy-to-misuse. I do see the struggle, and find hard to draw the line. And I don’t claim I have all the answers. Thus, this blog post. 🙂

      The only way in which this would not be contradictory is because, as you said, you are a C++ teacher. You know what? Many people are not with Qt because C++, but in spite of C++.

      Well, this opens an interesting conversion. For instance, would users prefer to use Qt combined with another language (Python)?

  4. Have i read this correctly? Are you suggesting I replace toSet with

    std::sort(container.begin(), container.end());
    container.erase(std::unique(container.begin(), container.end()), container.end());

    1. I dont want c++ garbage in my beautiful qt code

    2. You cant be serious

    1. Giuseppe D'Angelo

      Hi!

      This sounds like an over-simplification of my argument, given the code you pasted doesn’t even do the same thing; so I’m not sure what you meant there…?

      PS: I am serious, and don’t call me Shirley.

  5. Thanks for making these kinds of performance improving choices! We’re working with an older code base that needs performance improvements and is suffering from “death by a thousand cuts” issues.

    These kinds of changes, along with articles like this, help give us a better idea of where to head!

  6. You could use these functions to replace the Qt functions

    template<typename T> QSet<T> fromList(QList<T> &list)
    {
        QSet<T> set(list.begin(), list.end());
        return set;
    }
    
    template<typename T> QSet<T> toSet(QList<T> &list)
    {
        QSet<T> set(list.begin(), list.end());
        return set;
    }
    
    1. Giuseppe D'Angelo

      Hi. Took the liberty of fixing formatting in the above, as the blog ate some code parts.

      Sure, the above “works”. But:

      1) fromList is such a generic term that shouldn’t be “reserved” by something that makes QSets out of it. Nowhere in its name it says that it’s building QSets.

      2) Again this funcionality is packaged in a *much* more generic and convenient form by ranges-v3, kdToContainer, and so on.

      QList<int> myList = ~~~;
      
      auto set1 = myList | kdToContainer<QSet>();
      

      3) I still struggle to understand why this specific functionality is much wanted. What is it about conversions from/to sets that deserves their own? Why not generic conversions?

      4) The signatures above are actually wrong — so don’t use that snippet as-is.

  7. I disagree that std::sort() + std::unique() is best way – for any large data set – e.g. 100M entries. I’d say most performant will still be myList.toSet().toList(); for making it unique as it’s O(N) algorithm compared to O(N * log N) which you need for sorting. Memory wise, it will not be best, but if you are interested in performance.

    Or do you have hard data that it’s the other case???

    1. Giuseppe D'Angelo

      Hi,

      Please be very aware of the hidden constants in the big-O notation, as well as the hidden amortized costs. Yes, creating a set from a sequence costs O(N) (but nitpicking: in the optimal case where all insertions are O(1). If there a lot of duplicates, insertions start costing a linear amount of time, possibly up to O(N²)). Hidden in that O(1) insertion cost there is a huge constant: the cost of allocating a piece of memory *per element* (at least that’s how QSet works in Qt 5, similarly to std::unordered_set). I’m not talking about the memory usage, I’m talking about CPU time. This cost (CPU time for O(N) allocations, O(N) deallocations) quickly trumps the sort+unique approach. For instance: https://quick-bench.com/q/Dk0bWszN-DVApEdcm4hZhS8-I2o

Leave a Reply

Your email address will not be published. Required fields are marked *