
Stepanov-Regularity and Partially-Formed Objects vs. C++ Value Types

In this article, I will take a look at one of the fundamental concepts introduced in Alex Stepanov and Paul McJones’ seminal book “Elements of Programming” (EoP for short) — that of a (Semi-)Regular Type and Partially-Formed State.

Using these, I shall try to derive rules for C++ implementations of what are commonly called “value types”, focusing on the bare essentials, as I feel they have not been addressed in sufficient depth up to now: Special Member Functions.

Alex Stepanov and Paul McJones gave us a whole new way of looking at this, with a mathematical theory of types and algorithms quite unlike anything ever done before. Their achievement will forever change the way you look at computer programming, but eight years after its publication, the book still does not get the widespread adoption it deserves.

Setting The Stage

Special Member Functions, of course, are those member functions of a C++ class that the compiler can write for you: the default constructor, the copy and move constructors, the copy and move assignment operators, and the destructor.

A Regular Type in EoP roughly corresponds to the EqualityComparable combined with the CopyConstructible C++ concept, see the book for more details.

A C++ Value Type is a type that is defined by its state, and its state alone (note that EoP has a very different definition of value type). Take an int as an example. Two int objects of value 5 will behave identically under all regular operations (simplified: all operations except for taking the object’s address). Two Shape objects, however, both having the same position, color, texture, … still may end up a square and a triangle when drawn on screen. A Shape object is defined by its behaviour as much as its state. We call such types polymorphic.

There are many shades of grey in between those two extremes; let’s leave it at that crude distinction. See Designing value classes for modern C++ – Marc Mutz @ Meeting C++ 2014 for a somewhat more thorough treatment.

In this article, we will look at two different classes, Rect and Pen, and try to write their Special Member Functions hopefully as Stepanov would have us do.

Rect and Pen

The first, Rect, is simple: it’s an integral-coordinate rectangle class that we will define completely inline in the header file. Pen, however, will be quite a bit different: It will use the Pimpl Idiom to firewall its internals from users. See Pimp My Pimpl and Pimp My Pimpl — Reloaded for more on the idiom.

class Rect {
    int x1, y1, x2, y2;
public:

};

class Pen {
    class Private; // defined out-of-line
    Private *d;
public:

};

The first task for today is to write the default constructor.

Default Construction

EoP has this to say about the default constructor:

[It] takes no arguments and leaves the object in a partially-formed state.

Ok, so what’s a “partially-formed state”? Here comes the good part:

An object is in a partially-formed state if it can be assigned-to or destroyed.

The authors go on to say that any other operation on partially-formed objects is undefined. In particular, such objects do not, in general, represent a valid value of the type.

The motivation for EoP to require default-construction in the first place is programmer convenience: T a = b; should be equivalent to T a; a = b;, and the user of the type should get to choose whether to write

T a;
if (cond)
   a = b;
else
   a = c;

or

T a = (cond) ? b : c;

Without default construction, if all the type’s author gave are user-defined constructors that establish a valid value, the programmer would have to use the ternary operator, whether or not that fits with line length limitations and personal preferences.

The comments at the end of the article contain even more reasons to support default construction.

A default constructor for Rect

So, let’s try to write something for Rect:

class Rect {
    int x1, y1, x2, y2;
public:
    Rect() = default;
};

What do you think? Would you have written the Rect default constructor this way?

I can tell you I wouldn’t have. Not until EoP opened my eyes. Remember that EoP only requires that the default constructor establish a partially-formed state, not a valid value. This should not surprise you. When in C++, do as the ints do:

int x;
Rect r;

In both cases, any use of the default-constructed object other than assignment or destruction is undefined, because the values of the objects are undefined (uninitialised).

If you feel uncomfortable with this implementation, you’re letting your inner Java programmer get the better of you. Don’t. This is C++. We embrace the undefined.

And, as Howard Hinnant writes in a reddit comment on this article, we give power to our users:

int x = {};  // x == 0
Rect r = {}; // r == {0, 0, 0, 0}
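To make the difference between the two forms concrete, here is a minimal sketch. The accessors and the helpers makeZeroRect() and isAllZero() are my own additions for illustration and not part of the Rect above. The point is that a default constructor defaulted on its first declaration is not user-provided, so empty braces value-initialise the object, which zero-initialises all four members, while a plain `Rect r;` leaves them uninitialised.

```cpp
#include <cassert>

class Rect {
    int x1, y1, x2, y2;
public:
    Rect() = default;  // not user-provided: `Rect r;` leaves the members uninitialised

    // Hypothetical accessors, added only so the example can inspect the state.
    int left() const   { return x1; }
    int top() const    { return y1; }
    int right() const  { return x2; }
    int bottom() const { return y2; }
};

// Hypothetical helper: empty braces value-initialise, i.e. zero all members.
inline Rect makeZeroRect() {
    Rect r = {};
    return r;
}

// Hypothetical helper: only call this on a fully-formed Rect.
inline bool isAllZero(const Rect &r) {
    return r.left() == 0 && r.top() == 0 && r.right() == 0 && r.bottom() == 0;
}
```

A partially-formed `Rect r;` may still be assigned to, for example `r = makeZeroRect();`, after which reading it is fine.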

Next, let’s try Pen.

A default constructor for Pen

class Pen {
    class Private; // defined out-of-line
    Private *d;
public:
    Pen() : d(nullptr) {} // inline
    ~Pen() { delete d; }  // out-of-line
};

Should we have left Pen::d uninitialised, too?

No. Doing so would make destruction undefined.

Should we have newed a Pen::Private object into Pen::d in the default constructor?

That would be a no, too. We’re not required to establish a valid value in the default constructor, so in the spirit of “don’t pay for what you don’t use”, we only do the minimal work necessary to establish a partially-formed state.

To hammer this one home: Should an implementation of

Colour Pen::colour() const;

check for d == nullptr?

No the third. You can see at a glance in the source code whether an object is in a partially-formed state. There is no need for a runtime check, except for debugging purposes.
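If you do want the debugging check, a sketch could look like the following. The Colour struct, the Pen(Colour) constructor and the contents of Pen::Private are assumptions I made up for this example; copying is simply disabled to keep it short. The assert vanishes when NDEBUG is defined, so release builds pay nothing for it.

```cpp
#include <cassert>

// Hypothetical colour type, invented for this sketch.
struct Colour { unsigned char r, g, b; };

class Pen {
    class Private;                       // defined out-of-line
    Private *d;
public:
    Pen() noexcept : d(nullptr) {}       // partially-formed
    explicit Pen(Colour c);              // hypothetical: establishes a valid value
    Pen(const Pen &) = delete;           // copying omitted to keep the sketch short
    Pen &operator=(const Pen &) = delete;
    ~Pen();

    Colour colour() const;
};

// What would live in the implementation file:
class Pen::Private {
public:
    Colour colour;
};

Pen::Pen(Colour c) : d(new Private{c}) {}
Pen::~Pen() { delete d; }

Colour Pen::colour() const {
    // Debug-only check; calling colour() on a partially-formed Pen is a bug
    // in the caller, not a condition to handle at runtime.
    assert(d && "colour() called on a partially-formed Pen");
    return d->colour;
}
```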

From the above, it follows that your default constructors should be noexcept. If your default constructors throw, they do too much. Of course, we’re still talking Value Types here, so let no man say that yours truly told you to make the default constructors of your RAII types noexcept.

Move-Construction And Move-Assignment

For Rect, moving and copying are the same thing, and the compiler is in the best position to implement them for you:

class Rect {
    int x1, y1, x2, y2;
public:
    Rect() = default;
    // compiler-generated copy/move special member functions are ok!
};

Once more, Pen is a bit more interesting:

class Pen {
    class Private; // defined out-of-line
    Private *d;
public:
    Pen() noexcept : d(nullptr) {} // inline
    Pen(Pen &&other) noexcept : d(other.d) { other.d = nullptr; } // inline
    ~Pen() { delete d; }  // out-of-line
};

We put moved-from Pen objects into the partially-formed state. In other words: moving from an object has the same effect as default-construction. Can it get any simpler?

We delegate move-assignment to the move constructor:

class Pen {
    class Private; // defined out-of-line
    Private *d;
public:
    Pen() noexcept : d(nullptr) {} // inline
    Pen(Pen &&other) noexcept : d(other.d) { other.d = nullptr; } // inline
    Pen &operator=(Pen &&other) noexcept                          // inline
      { Pen moved(std::move(other)); swap(moved); return *this; }
    ~Pen() { delete d; }  // out-of-line

    void swap(Pen &other) noexcept
      { using std::swap; swap(d, other.d); }
};

Note how all special member functions except the destructor are inline so far, yet we didn’t break encapsulation of the Pen::Private class.
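For readers who want to compile this, here is a self-contained sketch of the class so far. The Pen(int) constructor, the width() accessor, the contents of Pen::Private and the roundTrip() helper are hypothetical additions of mine, there only to have something observable. The static_asserts check the noexcept claims, and roundTrip() shows that a moved-from Pen can be assigned to again.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

class Pen {
    class Private;             // defined out-of-line
    Private *d;
public:
    Pen() noexcept : d(nullptr) {}
    explicit Pen(int width);   // hypothetical: establishes a valid value
    Pen(Pen &&other) noexcept : d(other.d) { other.d = nullptr; }
    Pen &operator=(Pen &&other) noexcept
      { Pen moved(std::move(other)); swap(moved); return *this; }
    ~Pen();

    void swap(Pen &other) noexcept
      { using std::swap; swap(d, other.d); }

    int width() const;         // hypothetical; requires a fully-formed Pen
};

// What would live in the implementation file:
class Pen::Private {
public:
    int width;
};

Pen::Pen(int width) : d(new Private{width}) {}
Pen::~Pen() { delete d; }
int Pen::width() const { return d->width; }

static_assert(std::is_nothrow_default_constructible<Pen>::value, "default ctor must not throw");
static_assert(std::is_nothrow_move_constructible<Pen>::value, "move ctor must not throw");
static_assert(std::is_nothrow_move_assignable<Pen>::value, "move assignment must not throw");

// Hypothetical demo: move from a Pen, then make it fully-formed again.
inline int roundTrip() {
    Pen a(3);
    Pen b(std::move(a));  // a is now partially-formed: assign to it or destroy it
    a = Pen(5);           // assignment establishes a valid value again
    return a.width() + b.width();
}
```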

Controversy

Thanks in no small part to the ISO C++ standard, which describes moved-from objects (in [lib.types.movedfrom]) as follows:

Objects of types defined in the C++ standard library may be moved from. Move operations may be explicitly specified or implicitly generated. Unless otherwise specified, such moved-from objects shall be placed in a valid but unspecified state.

the simple chain of reasoning described so far has fewer friends than you might think. And this is why I wrote this article.

You will probably meet a lot of resistance when trying to implement your default and move constructors this way. But think about it: What would a natural “default value” of your type be?

It’s easy to fall for the next-best choice: For int, surely the default-constructed value should be zero, and we just have to put up with these partially-formed, nay: uninitialised, values because C sucks.

I disagree. If you are using the int additively, then, yes, zero is a good default value. But if you work with multiplication, then one would be the better fit.
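A small sketch of what I mean, using std::accumulate. The function names sumOf and productOf are made up for this example. The "right" starting value depends entirely on the operation: zero for addition, one for multiplication.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Additive use: the natural starting value is 0, the identity of +.
inline int sumOf(const std::vector<int> &v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// Multiplicative use: the natural starting value is 1, the identity of *.
// Starting from 0 here would make every product 0.
inline int productOf(const std::vector<int> &v) {
    return std::accumulate(v.begin(), v.end(), 1, std::multiplies<int>());
}
```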

Bottom line: for the vast majority of types, there is no natural default. If there isn’t, then having to establish a randomly-chosen one on every default-construction operation is wasteful, so don’t do it.

Instead, have the default constructor establish only a partially-formed state, and provide literals (or named factory functions for something more complex) for the different “default” values:

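class Rect {
public:
    // a function rather than a static constexpr member, because Rect
    // is still an incomplete type inside its own class body
    static Rect emptyRect() { return {}; }
};

class Pen {
public:
    static Pen none();
    static Pen solidBlackCosmetic();
};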
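How the named factories might be implemented is sketched below. Everything beyond the two factory names is an assumption on my part: the Style enum, the fields of Pen::Private, the private Pen(Private*) helper, and my reading of "cosmetic" as a pen of width zero. The default constructor stays free of charge, and only callers who ask for a concrete value pay for the allocation.

```cpp
#include <cassert>

class Pen {
    class Private;                               // defined out-of-line
    Private *d;
    explicit Pen(Private *p) noexcept : d(p) {}  // hypothetical private helper
public:
    enum Style { NoPen, SolidLine };             // hypothetical

    Pen() noexcept : d(nullptr) {}               // partially-formed, no allocation
    Pen(Pen &&other) noexcept : d(other.d) { other.d = nullptr; }
    ~Pen();

    static Pen none();                // a pen that draws nothing
    static Pen solidBlackCosmetic();  // solid black, assumed width 0

    Style style() const;              // accessors require a fully-formed Pen
    int width() const;
};

// What would live in the implementation file:
class Pen::Private {
public:
    Pen::Style style;
    unsigned rgb;
    int width;
};

Pen::~Pen() { delete d; }
Pen Pen::none()               { return Pen(new Private{NoPen, 0x000000u, 0}); }
Pen Pen::solidBlackCosmetic() { return Pen(new Private{SolidLine, 0x000000u, 0}); }
Pen::Style Pen::style() const { return d->style; }
int Pen::width() const        { return d->width; }
```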

Embracing Partially-Formed Objects

Partially-Formed Objects are nothing magical. They offer a simple description of the behaviour of C++ built-in types with respect to default construction, and of pimpl’ed objects with respect to move semantics, if implemented in the natural way.

In both cases, partially-formed objects are easily spotted in source code with local static reasoning, so demands for anything more fancy than the bare minimum as the result of moving from an object or default-constructing one are violating the C++ principle of “don’t pay for what you don’t use”. As a corollary, keep your default constructors noexcept.

In a future instalment, we will look at a smart pointer that encodes these guidelines for use as a pimpl-pointer.

6 thoughts on “Stepanov-Regularity and Partially-Formed Objects vs. C++ Value Types”

  1. Unfortunately, partially-formed objects may not even be returned from functions (unless they’re returned as prvalues and C++17 mandatory copy-elision applies). This somewhat restricts the way you can subdivide your code into small functions. Using optional is a clunky but possible workaround. On the other hand, this guarantees that any T t = f(); produces a fully-formed state.

    The “local static reasoning” can easily be performed by static analyzers, if we can teach them which constructors produce a partially-constructed object and which ones don’t. The resulting checks are much better than initializing an object to some form of “default value” which might not be an appropriate default for all code paths. Dynamic analyzers are also able to find those issues.

    Regarding default-initialization for if/else vs ?: — I think that’s not a very strong argument. If it was the sole argument for a partially-formed state, then the safety concerns (use after lack of proper initialization) would far outweigh this minor benefit IMHO. However, it is not the sole argument. It is far easier to perform aggregation if you have a partially constructed state, since the class might not have a default value to provide to its data member. The other way around – taking away the requirement to always be constructed to a fully-formed state – is harder to implement (using unions or optional), especially if you want to have defensive checks like assertions. One annoying example is MSVC’s StdLib container classes, which allocate in their default constructors to implement debug iterators. If the allocator is stateful and needs that state to perform this debug allocation, then you have to pass it an allocator even if you never use the container.

    Consider std::thread and std::*fstream, which do not represent a thread or file, respectively, in their partially-formed state, and you can use them easily in your own types which might only sometimes spawn a thread or open a file. Many people I’ve talked to want their classes to provide as many guarantees as possible, and the “never-empty” guarantee is very popular; this corresponds to deleting the default ctor of std::thread, for example. This “never-empty” guarantee however cannot be upheld with efficient move operations, since the Committee decided against Stepanov’s idea of a copy-destructor (destructive move).

  2. Rather than the if/else rationale for the default-constructed case, consider the case of initialization from an I/O stream,

       Rect r;
       source >> r;
    

    and, of course, initialization of an array,

      Rect r[size];
      for (int i = 0; i < size; i++) r[i] = Rect(i, i, i+1, i+1);
    
    1. FTR: I was using the rationale given by Stepanov. I agree there are more reasons to allow default-construction, incl. the ones you gave.

  3. Very well explained.

    What do you think about not assignable types (like streams), should they be conditionally assignable if they are in a partially formed state? For example:

    Type& operator=(Type const& t){
    assert(d==nullptr);

    }

    Was the “future installment” on smart pointers ever published?

    1. A stream is a handle/raii-like type, like a mutex or a thread. It makes some sense to allow moves on these kinds of types, even if copying is prohibited. It does force the representation to have external state, though, so a movable mutex class can’t just use atomic operations on *this anymore, say, like a non-copyable/non-movable mutex class could.

  4. Every day I am more convinced about embracing partially formed states. To the point that now I think containers should be allowed in a partially formed state if the contained value is allowed to be in a partially formed state. For example “std::vector<T> v(20)” should contain objects in partially formed state (if “T t” is in a partially formed state). For formed state we always can do “std::vector<T> v(20, {});” or “std::vector<T> v(20, T{});”. In practice “std::vector<T> v(20)” should call “std::uninitialized_default_construct”, and “std::vector<T> v(20, {})” should call “std::uninitialized_value_construct”. Currently it is not possible to customize this behavior. (It is not the main reason, but “std::map” should also “default” initialize the Value keys.) What do you think?
