Synchronization Primitives in C++20

std::latch and std::barrier

5 September 2024

std::latch

A std::latch is a single-use synchronization primitive built around a counter. Threads that call wait are blocked until count_down operations (decrements) have brought the counter down to zero, at which point all of them are released. A latch cannot be reset or reused once its internal counter reaches zero.

How do we use a latch?

A std::latch object is created with an initial count.
Multiple threads can decrement this count using the count_down method.
Threads can call wait, which blocks until the internal count of the latch reaches zero.

Here's an example:

#include <chrono>
#include <latch>
#include <thread>
#include <iostream>
#include <vector>
#include <syncstream>

std::latch latch(3);

void worker(int id) {
    // Simulating some work
    std::this_thread::sleep_for(std::chrono::milliseconds(id * 100));
    std::osyncstream(std::cout) << "Worker " << id << " reached the latch.\n";
    latch.count_down();
}

int main() {
    std::vector<std::jthread> threads;
    for (int i = 1; i <= 3; ++i)
        threads.emplace_back(worker, i);

    latch.wait();
    std::cout << "All workers reached the latch.\n";
}

In this example, three worker threads decrement the latch and the main thread waits until all worker threads are done.

Here's a possible output:

Worker 1 reached the latch.
Worker 2 reached the latch.
Worker 3 reached the latch.
All workers reached the latch.

Note: The actual order of "Worker X reached the latch" messages may vary due to thread scheduling.

What are the use cases for std::latch?

Ensuring initialization is done: You might have a scenario where multiple threads perform initialization tasks. The main thread, or other worker threads, might need to wait until all the initialization tasks are done. In such a case, each initialization thread counts down the latch and the other threads wait until the latch reaches zero. For example:

std::latch init_latch(3); // suppose there are 3 initialization tasks

void init_task_1() { /*...*/ init_latch.count_down(); }
void init_task_2() { /*...*/ init_latch.count_down(); }
void init_task_3() { /*...*/ init_latch.count_down(); }

void main_thread() {
    init_latch.wait();
    // Proceed with the rest of the tasks only after initialization is done
}

One-time signal for multiple threads: Imagine a scenario where you want to signal multiple threads to start processing only after certain conditions are met (e.g., all resources are loaded). A latch with an initial count of 1 can be used to achieve this: every worker calls wait, and a single count_down releases them all.

std::barrier

A std::barrier is another synchronization primitive, but it differs from a latch in that it can be reused. It's designed to make multiple threads wait until they all reach the barrier point and then proceed together. Once all threads have reached the barrier, they can all continue and the barrier can be reused for the next synchronization.

Here's a simple example:

#include <barrier>
#include <chrono>
#include <thread>
#include <iostream>
#include <vector>
#include <syncstream>

std::barrier barrier(3);

void worker(int id) {
    // Simulating some work
    std::this_thread::sleep_for(std::chrono::milliseconds(id * 100));
    std::osyncstream(std::cout) << "Worker " << id << " reached the barrier.\n";
    barrier.arrive_and_wait();
    std::osyncstream(std::cout) << "Worker " << id << " passed the barrier.\n";
}

int main() {
    std::vector<std::jthread> threads;
    for (int i = 1; i <= 3; ++i)
        threads.emplace_back(worker, i);
}

In this example, the worker threads reach the barrier at different times, but none of them proceeds until all three have arrived. After all threads have passed the barrier, it can be used for synchronization again.

Here's a possible output:

Worker 1 reached the barrier.
Worker 2 reached the barrier.
Worker 3 reached the barrier.
Worker 1 passed the barrier.
Worker 2 passed the barrier.
Worker 3 passed the barrier.

Note: The actual order of messages may vary due to thread scheduling.

What are the use cases for std::barrier?

Synchronizing iterative algorithms: In many iterative algorithms, especially those in parallel computing, all threads need to complete one iteration before any thread can start the next iteration. A barrier can be used to ensure all threads synchronize at the end of each iteration. For example:

std::barrier iter_barrier(num_threads); // num_threads is assumed to be defined elsewhere

void parallel_algorithm(int thread_id) {
    for (int i = 0; i < max_iterations; ++i) { // max_iterations likewise
        // Do some parallel computation for this iteration
        iter_barrier.arrive_and_wait(); // Wait here until all threads complete this iteration
    }
}

Periodic synchronization: In a simulation where multiple entities (managed by different threads) need to periodically synchronize their states, a barrier can ensure that all the entities synchronize at regular intervals.
Initializing the parallel pipeline: When each stage of a data-processing pipeline is handled by a separate thread, a barrier can ensure that all stages of the pipeline are set up and ready before the data starts flowing.

It's important to note that a barrier is reusable, unlike a latch.

To choose between the two, identify whether the synchronization point is a one-time event (use a latch) or a recurring one (use a barrier). Both primitives are flexible and handy for controlling the flow and coordination of threads in various concurrent situations.

Further Considerations

While this post provides a high-level overview of std::latch and std::barrier, there are some additional points to consider when working with these primitives in real-world scenarios:

Thread scheduling: The exact order of thread execution is not guaranteed and can vary between runs. The examples provided show possible outputs, but you may see different orderings in practice.
Console output: In the examples, we used std::osyncstream to safely write to the console from multiple threads. In practice, any shared resource (like standard output) should be properly synchronized to avoid race conditions.
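Error handling: The examples don't include error handling for brevity. In production code, make sure that a thread which fails still reaches the synchronization point, or reports the failure in some other way. A thread that exits without counting down the latch or arriving at the barrier leaves the waiting threads blocked indefinitely.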
Performance considerations: While these primitives are useful, they do introduce synchronization overhead. In performance-critical applications, you should carefully consider the impact of introducing these synchronization points.

By understanding these synchronization primitives and their use cases, you can write more robust and efficient concurrent C++ programs.
