C++20 added two synchronization primitives to the standard library: std::latch and std::barrier. Both are utilities for coordinating concurrent threads.
What is a synchronization primitive?
In concurrent programming, synchronization primitives are the fundamental tools that help in managing the coordination, execution order, and data safety of multiple threads or processes that run concurrently.
Briefly said, they ensure that:
C++ already offers several synchronization primitives, for example mutexes, condition variables, atomic operations, and lock wrappers.
In C++20, we have two additional synchronization primitives: latches and barriers.
Let's discuss both of them.
A std::latch is a single-use countdown counter. It is constructed with an initial count, each count_down operation decrements that count, and threads blocked in wait are released once the count reaches zero. A latch cannot be reset or reused after its internal counter reaches zero.
How do we use a latch?
Here's an example:
#include <chrono>
#include <iostream>
#include <latch>
#include <syncstream>
#include <thread>
#include <vector>

// One count per worker thread.
std::latch latch(3);

void worker(int id) {
    // Simulating some work
    std::this_thread::sleep_for(std::chrono::milliseconds(id * 100));
    std::osyncstream(std::cout) << "Worker " << id << " reached the latch.\n";
    latch.count_down(); // decrement without blocking
}

int main() {
    std::vector<std::jthread> threads;
    for (int i = 1; i <= 3; ++i)
        threads.emplace_back(worker, i);
    latch.wait(); // block until the counter reaches zero
    std::cout << "All workers reached the latch.\n";
}
In this example, three worker threads decrement the latch and the main thread waits until all worker threads are done.
Here's a possible output:
Worker 1 reached the latch.
Worker 2 reached the latch.
Worker 3 reached the latch.
All workers reached the latch.
Note: The actual order of "Worker X reached the latch" messages may vary due to thread scheduling.
A std::barrier is another synchronization primitive, but it differs from a latch in that it can be reused. It's designed to make multiple threads wait until they all reach the barrier point and then proceed together. Once all threads have reached the barrier, they can all continue and the barrier can be reused for the next synchronization.
Here's a simple example:
#include <barrier>
#include <chrono>
#include <iostream>
#include <syncstream>
#include <thread>
#include <vector>

// Three threads must arrive before any of them may continue.
std::barrier barrier(3);

void worker(int id) {
    // Simulating some work
    std::this_thread::sleep_for(std::chrono::milliseconds(id * 100));
    std::osyncstream(std::cout) << "Worker " << id << " reached the barrier.\n";
    barrier.arrive_and_wait(); // block until all three workers have arrived
    std::osyncstream(std::cout) << "Worker " << id << " passed the barrier.\n";
}

int main() {
    std::vector<std::jthread> threads;
    for (int i = 1; i <= 3; ++i)
        threads.emplace_back(worker, i);
    // The jthread destructors join the workers when main returns.
}
In this example, each worker thread reaches the barrier at different times but will only proceed once all threads have reached it. After all threads have passed the barrier, it can be used for synchronization again.
Here's a possible output:
Worker 1 reached the barrier.
Worker 2 reached the barrier.
Worker 3 reached the barrier.
Worker 1 passed the barrier.
Worker 2 passed the barrier.
Worker 3 passed the barrier.
Note: The actual order of messages may vary due to thread scheduling.
It's important to note that a barrier is reusable, unlike a latch.
To choose between latch and barrier, one needs to identify whether the synchronization point is a one time event (hence, making use of a latch) or a recurring synchronization event (therefore, using a barrier). Both primitives are flexible and handy for controlling the flow and coordination of threads in various concurrent situations.
While this post provides a high-level overview of std::latch and std::barrier, there are some additional points to consider when working with these primitives in real-world scenarios:
The examples use std::osyncstream to safely write to the console from multiple threads. In practice, any shared resource (like standard output) should be properly synchronized to avoid race conditions.
By understanding these synchronization primitives and their use cases, you can write more robust and efficient concurrent C++ programs.
About KDAB
The KDAB Group is a globally recognized provider for software consulting, development and training, specializing in embedded devices and complex cross-platform desktop applications. In addition to being leading experts in Qt, C++ and 3D technologies for over two decades, KDAB provides deep expertise across the stack, including Linux, Rust and modern UI frameworks. With 100+ employees from 20 countries and offices in Sweden, Germany, USA, France and UK, we serve clients around the world.
1 Comment
12 - Aug - 2025
Andrew Polar
Example detached from real life. I need to start 200 threads (assuming 256 core processor) and make them build matrix. When completed, I need to pass control to another 201st thread, which updates this matrix, and after update pass control to all 200 threads. And I need to repeat it in the loop 10 million times. This is how real life is, not as in your example. Here is my template and std::barrier failed miserably. std::atomic<int> thread_count{ 0 }; std::atomic<int> loop_count{ 0 };
void WorkerTest(int id, int loops) { int local_step = 0; while (local_step < loops) { //this is first concurrent block in each thread //printf("a%d-%d ", id, local_step); printf("-"); //std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
void SynchronizationTest() { clock_t start_application = clock(); clock_t current_time = clock();
}