C++23 Will Be Really Awesome

(a blog for April Fools Day)

1 April 2022

For Function Arguments

Consider a function like:

void f(int x);

Unfortunately, we all know too well that C++ allows you to call it with non-integer arguments:

f(1234);  // OK, intended: called with an int
f(3.14);  // Not really intended (called with double), still legal

If you're lucky, your compiler may give you a warning. Otherwise, the calls with non-int (double, long, ...) will be valid, and this may cause unexpected data losses and/or undefined behavior. Could it be possible to have a compile-time error instead?

C++11 and later gave us different strategies:

// C++11:
void f(int x);
template <typename T> void f(T) = delete;

f(42);   // OK
f(3.14); // ERROR

If you don't want to re-declare your functions twice, you can also use SFINAE in C++11 or, in C++20, concepts:

// C++11/14, SFINAE
template <typename T>
std::enable_if_t<std::is_same_v<T, int>> f(T x);

// C++20: concepts
void f(std::same_as<int> auto x);

While the C++20 version is a huge readability improvement over the SFINAE version, it's still a mouthful. Also, this makes the function a function template, so now it can't be out-of-line, it impacts our compilation times, etc.

In C++23 this will be possible by adding really:

void f(really int x);

f(42);   // OK
f(3.14); // ERROR

Simple and effective! You can also apply it to ordinary variables or to return types:

really int i = getInt();    // OK
really int j = getDouble(); // ERROR

As a Deeper `const`

Consider this code:

class String {
    const char *data;
    int len;

public:
    void a(int) const;
    void b() const;
    void c();
};

void String::c()
{
    a(len * 42);
    a(len * 42);
    b();
}

It may look a bit silly, but stay with me. The disassembly of c() (with optimizations enabled) is:

String::c():
        push    rbp
        imul    esi, DWORD PTR [rdi+8], 42
        mov     rbp, rdi
        call    String::a(int) const
        imul    esi, DWORD PTR [rbp+8], 42
        mov     rdi, rbp
        call    String::a(int) const
        mov     rdi, rbp
        pop     rbp
        jmp     String::b() const

The compiler is doing something quite peculiar here: it's multiplying len by 42, calling a() with the result of the calculation, then it multiplies len again in order to call a() a second time. How come it's doing the same (relatively) expensive operation twice, rather than simply storing the result and using it for both calls? a() is a const member function, after all, so it can't change len -- or can it?

Well, the short answer is that the compiler is not allowed to do otherwise: the value of len could've been indeed changed by the call to a(). Maybe a() ends up calling something else, and that something else changes *this. Or maybe a() is just doing a const_cast<String *>(this) and then mutating the result. It doesn't change the result: the compiler is forced to reload the data members after a member function call as it can't assume that they haven't been changed. The fact that the target member function is const does not help the compiler in any way.

However, if we declare a() as really const:

class String {
    const char *data;
    int len;

public:
    void a(int) really const;
    void b() const;
    void c();
};

Then we get this improved codegen:

String::c():
        push    r12
        push    rbp
        mov     rbp, rdi
        sub     rsp, 8
        imul    r12d, DWORD PTR [rdi+8], 42
        mov     esi, r12d
        call    String::a(int) const
        mov     rdi, rbp
        mov     esi, r12d
        call    String::a(int) const
        add     rsp, 8
        mov     rdi, rbp
        pop     rbp
        pop     r12
        jmp     String::b() const

It may look longer, but we have a couple of extra mov in lieu of an imul. The multiplication result is stored and reused for the second call. And that's a huge win!

What really const does is makes the compiler consider a work on an object declared const. Such objects cannot be mutated, not even by stripping their constness away (that's undefined behavior). This causes better codegen; now the compiler knows that calling a cannot possibly mutate *this.

Closed Enumerations

Suppose you have an enumeration like this:

enum class MyEnum {
  E1, E2, E3
};

Now, suppose you have a function that takes a value of this enumeration. Correctly, you use a switch over the value, in order to handle all the possible cases:

int calculate(MyEnum e)
{
  switch (e) {
  case MyEnum::E1: return 42;
  case MyEnum::E2: return 51;
  case MyEnum::E3: return 123;
  }
}

Now, for some reason, your compiler will complain about calculate. Specifically written like this, you will get a warning regarding the possibility for control to reach the end of the function without hitting a return statement, and the function does not return void.

How in the world is that possible? There's clearly a switch in there that is covering all the possible enumerators!

Defeated, you'll change your function to something like this:

int calculate(MyEnum e)
{
  switch (e) {
  case MyEnum::E1: return 42;
  case MyEnum::E2: return 51;
  case MyEnum::E3: return 123;
  // for the love of kittens, DO NOT add a default: !!!
  }

  assert(false);
  return -1; // impossible
}

(Yes, do NOT add a default: to the switch -- a default label will prevent the compiler from warning you that the switch is no longer covering all the enumerators, should you decide some day to extend the enumeration.)

So how is "the impossible" actually possible? Is the compiler wrong?

No, the compiler is right. There is the possibility of passing a value which isn't handled by the switch; and that's because C++ allows casting integers to enumerations, as long as the integer "fits" in the enumerator's range. See the wording in [expr.static.cast]:

That means that this is 100% legal C++:

void bad() {
  MyEnum nonsense = static_cast<MyEnum>(-1);
  int result = calculate(nonsense); // whops!
}

But just how often is this feature really useful? While, in general, allowing conversions from integer to enumerations makes sense, people are not really supposed to cast arbitrary integers into an enumeration's type.

Enter really enum, or, of course, enum really as it needs to be spelled:

enum class really MyEnum {
  E1, E2, E3
};

int calculate(MyEnum e)
{
  switch (e) {
  case MyEnum::E1: return 42;
  case MyEnum::E2: return 51;
  case MyEnum::E3: return 123;
  }
  // no warning, all cases are handled
}

void bad() {
  MyEnum nonsense = static_cast<MyEnum>(-1); // UB, -1 is not an enumerator
}

Basically, a really enum is just like an ordinary enum (or enum class, like in this case), except that converting integers to it requires the integer to match the value of one of the enumerators -- otherwise, undefined behavior. The UB is not really worse than what we had before (walking out of a function without returning) or crashing due to the failed assert. On top if that, thanks to things like ubsan, we can catch the problem (= the illegal conversion) at its source rather than later, when it causes havok.

For Deduced rvalue References

Let's face it: templates are hard. And deduction rules for function templates, combined with value categories, are even harder!

But let's start with a simple example. Any proficient C++ developer should know that it's possible to overload a function f for lvalues and rvalues:

void f(const std::string &s); // 1) lvalue reference
void f(std::string &&s);      // 2) rvalue reference

std::string str = "hello";
f(str);             // calls 1
f("hello"s);        // calls 2
f(std::move(str));  // calls 2

Now suppose you want to make your f generic, so it can work on std::string but also on other string types. That, of course, means turning f into a function template.

Easy enough, right? Well, raise your hand if you ever fell into this trap:

template <typename T> void f(const T &obj); // 1) lvalue reference
template <typename T> void f(T &&obj);      // 2) rvalue reference...?

The double ampersand still means rvalue reference, doesn't it? Turns out that, well, yes, but actually no. Since T is deduced, it doesn't; it now means forwarding (or universal) reference. This code:

std::string s = "hello";
f(s);  // call f on a lvalue

is now actually calling overload n. 2 (!!!), after deducing T = std::string &.

This is incredibly annoying and error-prone: the very same syntax, without deduced template arguments, works correctly. Reusing the same syntax with totally different meanings is a major annoying point (just think about the teachability of such a feature...).

In order to force the second overload only to take rvalue references, we have to again deploy SFINAE or similar techniques:

template <typename T> void f(const T &obj); // lvalue reference

template <typename T>
  std::enable_if_t<!std::is_reference_v<T>>
f(T &&obj);      // T must not be a reference => T&& is a rvalue reference

I mean... really?! (pun intended) I bet that if we had a time machine, someone would surely propose the usage of different syntax (a triple ampersand?) to specify forwarding references and remove the entire problem.

Luckily for us, in C++23, we can do this:

template <typename T> void f(const T really &obj);  // 1) lvalue reference
template <typename T> void f(T really &&obj);       // 2) rvalue reference! not forwarding reference

This brings back the "traditional" rules without any of the verbose SFINAE syntax. Technically speaking, it's not needed on 1), but I like the symmetry.

`Really Operator Auto`

A problem that sometimes affects Qt users is the following:

QString getString();
void print(QString);

void f() {
  auto result = QString("The result is: ") + getString();
  print(result);
}

What's the type of result? You may think it's a QString -- after all, it's being obtained by concatenating (via operator+) two QString objects.

That's actually correct in some cases, but in (many) others, it's not accurate. Generally speaking, result is actually an object of type QStringBuilder.

What's QStringBuilder, you ask? It's a private class in Qt that is employed to optimize string concatenations. A concatenation like this:

QString a, b, c, d;
QString result = a + b + c + d;

may cause up to 3 memory allocations, if done naively: first, calculate a + b (allocate); then, append c to the result (allocate again); then, append d to the final result (allocate again).

QStringBuilder instead considers the entire "sequence" of operations. When QStringBuilder is in use, a + b does not return a QString directly. It returns a QStringBuilder that remembers a and b (It simply internally stores references to them). Note that no allocations (for the result) or copies are taking place yet.

Then, appending c to the intermediate QStringBuilder yields another QStringBuilder that now remembers c, too. Finally, d is appended, yielding one final QStringBuilder.

This final object is then converted to a QString. That's where the magic kicks in: QStringBuilder can now perform one memory allocation of the right size (by asking all the other strings about their sizes, as it's keeping references to all of them), copy their contents in order to perform the concatenation, and produce result.

So what's wrong in the first example? The answer is the usage of auto:

auto result = QString("The result is: ") + getString(); // whops!

Here, a QStringBuilder is produced in order to concatenate the two strings. But it's never converted to a QString. Instead, result is an object of type QStringBuilder itself!

That is a problem because, as mentioned before, QStringBuilder merely references the strings that it is supposed to concatenate. These strings are temporaries that will get destroyed at the end of the statement. That means that we now have a QStringBuilder loaded with dangling references!

Again, with a time machine, the solution could be to allow like this:

class QStringBuilder
{
  // invoked when assigning the result to `auto` variable...?
  operator auto() const { return QString(*this); }
};

Although operator auto() exists, it's not what the comment above says. operator auto is nothing but an "ordinary" conversion operator, combined with auto deduction for the return type of a function. In other words, the above is just declaring an operator QString() const -- but we already have that one inside QStringBuilder!

Instead, in C++23 we can do this:

class QStringBuilder
{
  // invoked when assigning the result to `auto` variable.
  really operator auto() const { return QString(*this); }
};

Note that the position of really is important. Otherwise, it applies to the conversion target type, just like I've shown before:

class C
{
   operator really int() const;
};

C c;
int    i = c;    // OK
double d = c; // ERROR

And that's it! I hope you liked this little presentation about the new keyword. I can't wait to start working on C++23 projects to try this one out (and also the other goodies, which you can check out on cppreference).

If you want to know more about really, feel free to check out the original proposal here.

See you next time!

Tags:

c++qt

About KDAB

The KDAB Group is a globally recognized provider for software consulting, development and training, specializing in embedded devices and complex cross-platform desktop applications. In addition to being leading experts in Qt, C++ and 3D technologies for over two decades, KDAB provides deep expertise across the stack, including Linux, Rust and modern UI frameworks. With 100+ employees from 20 countries and offices in Sweden, Germany, USA, France and UK, we serve clients around the world.

12 Comments

This is really great news! Really hope it really happens.

Happy April's Fools.

In the last committee meeting, I remember someone suggested renaming 'really' to 'indeed'. I hope this will be approved before next April.

You're missing the "honestly" proposal. Unlike really, honestly would also allow conversions between types that sound different but are honestly the same, in behavior and layout.

For example, on a MS Windows machine, a long int is honestly an int.

This would also allow conversions from char8_t, which is honestly an unsigned char.

Both "really" and "honestly" could coexist. And honestly, really and honestly would be nice to have.

I hate this date! ;)

Basically, a really enum is just like an ordinary enum (or enum class, like in this case), except that converting integers to it requires the integer to match the value of one of the enumerators — otherwise, undefined behavior.

Well that's already the case since MyEnum doesn't have a fixed underlying type (: int), as ubsan impolitely pointed out to me one day when casting a bit flag to an enum.

Well that’s already the case since MyEnum doesn’t have a fixed underlying type (: int), as ubsan impolitely pointed out to me one day when casting a bit flag to an enum.

MyEnum does in fact have a fixed underlying type because it's a scoped enumeration. See https://eel.is/c++draft/dcl.enum#5:

Each enumeration also has an underlying type. The underlying type can be explicitly specified using an enum-base. For a scoped enumeration type, the underlying type is int if it is not explicitly specified. In both of these cases, the underlying type is said to be fixed.

Didn't you mean std::enable_if_t<!std::is_lvalue_reference_v> instead of std::enable_if_t<!std::is_reference_v>?

Well spotted, and, pedantically, yes -- although usually the idea is to let the template type argument to be deduced (most things, like the std algorithms, don't want users to specify the template arguments).

Now I want this to be a thing. A really keyword would be really (ha!) infinitely handy.

Shouldn't syntax highlighting be improved so that 'really' is coloured as a keyword?

Absolutely. It already does, but it works only on April 1st, 2023...

Giuseppe D’Angelo

Senior Software Engineer

Senior Software Engineer at KDAB. Giuseppe is a long-time contributor to Qt, having used Qt and C++ since 2000, and is an Approver in the Qt Project. His contributions in Qt range from containers and regular expressions to GUI, Widgets, and OpenGL. A free software passionate and UNIX specialist, before joining KDAB, he organized conferences on opensource around Italy. He holds a BSc in Computer Science.