std::memory_order_relaxed atomicity with respect to the same atomic variable

Question

The cppreference documentation about memory orders says

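"Typical use for relaxed memory ordering is incrementing counters, such as the reference counters of std::shared_ptr, since this only requires atomicity, but not ordering or synchronization (note that decrementing the shared_ptr counters requires acquire-release synchronization with the destructor)"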

Does this mean that relaxed memory ordering doesn't actually result in atomicity with respect to the same variable, but rather just results in eventual consistency with respect to other relaxed loads and/or compare_exchanges? Would using std::memory_order_seq_cst be the only way to see consistent results when paired with std::memory_order_relaxed?

I was under the assumption that std::memory_order_relaxed is still atomic with respect to the same variable but does not provide any other constraints about loads and stores with respect to other data.

Answer

You are asking a few things, but I'll focus on the ordering constraints used by a typical shared_ptr implementation because I think that covers the key part of your question.

An atomic operation is always atomic with respect to the variable (or POD) it applies to; modifications to a single variable will become visible to all threads in a consistent order.
The way relaxed atomic operations work is described in your question:

std::memory_order_relaxed is still atomic with respect to the same variable but does not provide any other constraints about loads and stores with respect to other data
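The "consistent order" guarantee for a single variable can be illustrated with a small sketch. The function name `reader_sees_monotonic_values` is made up for this illustration and is not part of the original answer. A writer stores increasing values with relaxed ordering, and a reader using only relaxed loads checks that it never observes the value going backwards, which the single modification order of the variable rules out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns true if a reader that uses only relaxed
// loads never sees the value of `x` move backwards.
bool reader_sees_monotonic_values(int last_value)
{
    assert(last_value > 0);

    std::atomic<int> x{0};
    bool monotonic = true;   // written by the reader only, read after join()

    std::thread writer([&] {
        // Relaxed stores: atomic, but no ordering with respect to other data.
        for (int i = 1; i <= last_value; ++i)
            x.store(i, std::memory_order_relaxed);
    });

    std::thread reader([&] {
        int prev = 0;
        while (prev < last_value)
        {
            int cur = x.load(std::memory_order_relaxed);
            if (cur < prev)
                monotonic = false;   // would contradict the modification order
            else
                prev = cur;
        }
    });

    writer.join();
    reader.join();
    return monotonic;
}
```

The reader may skip values (it is not guaranteed to see every store), but the values it does see appear in the same order in which the writer stored them.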

The following are 2 typical scenarios in which ordering constraints on an atomic operation can be omitted (i.e. by using std::memory_order_relaxed):

  1. Memory ordering is not necessary because there are no dependencies on other operations, or as a commenter puts it, (..) is not part of an invariant involving other memory locations.

A common example is an atomic counter, incremented by multiple threads to keep track of the number of times a particular event has occurred. The increment operation (fetch_add) can be relaxed if the counter represents a value that has no dependency on other operations.
I find the example given by cppreference not very convincing because the shared_ptr reference count does have a dependency; i.e. memory is deleted once its value becomes zero. A better example is a web server keeping track of the number of incoming requests only for reporting purposes.
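The request counter scenario might look roughly like the sketch below. The names `request_count`, `handle_request` and `simulate_requests` are assumptions made for illustration; no real web server API is involved. Every increment is a relaxed fetch_add, and because the read-modify-write is still atomic, no increment is lost.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical counter that is only used for reporting.
std::atomic<long> request_count{0};

// Hypothetical request handler: counts the request and nothing else.
void handle_request()
{
    // Relaxed is enough: the counter is not part of any invariant
    // involving other memory locations.
    request_count.fetch_add(1, std::memory_order_relaxed);
}

// Simulates `num_threads` workers handling `requests_per_thread` requests each
// and returns the total that was counted.
long simulate_requests(int num_threads, int requests_per_thread)
{
    assert(num_threads > 0 && requests_per_thread > 0);

    request_count.store(0, std::memory_order_relaxed);

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t)
    {
        workers.emplace_back([requests_per_thread] {
            for (int r = 0; r < requests_per_thread; ++r)
                handle_request();
        });
    }

    // join() synchronizes with the end of each thread, so the load below
    // observes every increment.
    for (auto &w : workers) w.join();

    return request_count.load(std::memory_order_relaxed);
}
```

With 8 workers and 10000 requests each, the returned total is exactly 80000. Relaxed ordering affects what can be inferred about other memory from the counter's value, not whether increments are counted.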

  2. Memory ordering is necessary, but there is no need to use ordering constraints because the required synchronization has already passed (IMO this explains better why shared_ptr's reference count increment can be relaxed, see example below).
The shared_ptr copy/move constructor can only be called while it has a synchronized view of the (reference to the) copied/moved-from instance (or it would be undefined behavior) and as such, no additional ordering is necessary.

The following example illustrates how memory ordering is typically used by a shared_ptr implementation to modify its reference count. Assume that all threads run in parallel after sp_main has been reset (the shared_ptr reference count is then 10).

#include <memory>
#include <thread>
#include <vector>

void thread_func(std::shared_ptr<int> sp, int n);

int main()
{
    std::vector<std::thread> v;
    auto sp_main = std::make_shared<int>(0);

    for (int i = 1; i <= 10; ++i)
    {
        // sp_main is passed by value
        v.push_back(std::thread{thread_func, sp_main, i});
    }

    sp_main.reset();

    for (auto &t : v)  t.join();
}

void thread_func(std::shared_ptr<int> sp, int n)
{
    // 10 threads are created

    if (n == 7)
    {
        // Only thread #7 modifies the integer
        *sp = 42;
    }

    // The only thread with a synchronized view of the managed integer is #7
    // All other threads cannot read/write access the integer without causing a race

    // 'sp' going out of scope -> destructor called
}

Thread creation guarantees an (inter-thread) happens-before relationship between make_shared (in main) and sp's copy/move-constructor (inside each thread). Therefore, shared_ptr's constructor has a synchronized view of memory and can safely increment ref_count with no additional ordering:

ctrlblk->ref_count.fetch_add(1, std::memory_order_relaxed);

For the destruction part, since only thread #7 writes to the shared integer, the other 9 threads are not allowed to access the same memory location without causing a race. This creates a problem because all threads run their shared_ptr destructors at about the same time (assume reset in main has been called earlier) and only one thread is going to delete the shared integer (the one decrementing ref_count from 1 to 0).
It is imperative that the last thread has a synchronized memory view before it deletes the integer, but since 9 out of 10 threads do not have a synchronized view, additional ordering is necessary.

The destructor may contain something like:

if (ctrlblk->ref_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
{
    // delete managed memory
}

The atomic ref_count has a single modification order and therefore all atomic modifications occur in some order. Let's say the threads (in this example) that perform the last 3 decrements on ref_count are thread #7 (3 → 2), #5 (2 → 1) and #3 (1 → 0). Both decrements performed by threads #7 and #5 come earlier in the modification order than the one performed by #3.
The release sequence becomes:

#7 (store release) → #5 (read-modify-write, no ordering required) → #3 (load acquire)

The end result is that the release operation performed by thread #7 has synchronized with the acquire operation performed by #3 and the integer modification (by #7) is guaranteed to have happened before the integer destruction (by #3).
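The effect of the release sequence can be reproduced without shared_ptr. The sketch below is an illustration under assumptions: `value_seen_by_last_owner`, `payload` and `seen` are made-up names, and a plain int stands in for the managed object. One thread writes the payload before its decrement, and whichever thread performs the final decrement reads it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: returns the payload value observed by the thread
// that performs the last decrement (the one that sees 1 -> 0).
int value_seen_by_last_owner(int num_threads, int writer_id)
{
    assert(num_threads > 0 && writer_id >= 0 && writer_id < num_threads);

    int payload = 0;                        // plain, non-atomic data
    int seen = -1;                          // written by the last owner only
    std::atomic<int> ref_count{num_threads};

    std::vector<std::thread> threads;
    for (int id = 0; id < num_threads; ++id)
    {
        threads.emplace_back([&, id] {
            if (id == writer_id)
                payload = 42;               // only one thread writes

            // The release half publishes this thread's earlier writes;
            // the acquire half matters for the thread that reaches zero.
            if (ref_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
                seen = payload;             // ordered after the writer's store
        });
    }

    for (auto &t : threads) t.join();
    return seen;
}
```

Regardless of which thread happens to be last, the value it reads is 42: every decrement after the writer's is a read-modify-write and therefore continues the release sequence that the writer's decrement started.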

Technically, only the threads that have accessed the managed memory location have to perform a release operation, but since a library implementer does not know about thread actions, all threads perform a release operation upon destruction.

For the final destruction of shared memory, technically only the last thread needs to perform an acquire operation and therefore a shared_ptr library implementer could optimize by setting a standalone fence that is only called by the last thread.

if (ctrlblk->ref_count.fetch_sub(1, std::memory_order_release) == 1)
{
    std::atomic_thread_fence(std::memory_order_acquire);

    // delete managed memory
}
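Putting the increment and the decrement together, a reference-counted control block could be sketched as follows. This is a simplified illustration, not the layout of any actual standard library: `ControlBlock`, `add_ref` and `release_ref` are hypothetical names, and weak counts, custom deleters and allocators are left out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, heavily simplified control block for a managed int.
struct ControlBlock
{
    std::atomic<long> ref_count{1};   // the creating owner holds one reference
    int *managed;

    explicit ControlBlock(int *p) : managed(p) {}
};

// Copying an owner: the caller already has a synchronized view of the
// instance being copied, so a relaxed increment is sufficient.
void add_ref(ControlBlock *cb)
{
    assert(cb != nullptr);
    cb->ref_count.fetch_add(1, std::memory_order_relaxed);
}

// Dropping an owner: returns true if this call destroyed the managed object.
bool release_ref(ControlBlock *cb)
{
    assert(cb != nullptr);

    // Every owner releases, because any of them may have written to *managed.
    if (cb->ref_count.fetch_sub(1, std::memory_order_release) == 1)
    {
        // Only the last owner needs to acquire before destroying.
        std::atomic_thread_fence(std::memory_order_acquire);

        delete cb->managed;
        delete cb;
        return true;
    }
    return false;
}
```

A caller would create a block with `new ControlBlock(new int(0))`, call `add_ref` for each copy and `release_ref` for each owner that goes away; only the final `release_ref` returns true.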
