Fences in C++0x, guarantees just on atomics or memory in general


Question



The C++0x draft has a notion of fences which seems very distinct from a CPU/chip level notion of fences, or say what the linux kernel guys expect of fences. The question is whether the draft really implies an extremely restricted model, or the wording is just poor and it actually implies true fences.

For example, under 29.8 Fences it states things like:

A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.

It uses these terms atomic operations and atomic object. There are such atomic operations and methods defined in the draft, but does it mean only those? A release fence sounds like a store fence. A store fence that doesn't guarantee the write of all data prior to the fence is nearly useless. Similar for a load (acquire) fence and full fence.

So, are the fences/barriers in C++0x proper fences with merely poor wording, or are they extremely restricted/useless as described?


In terms of C++, say I have this existing code (assuming fences are available as high-level constructs right now -- instead of, say, using __sync_synchronize in GCC):

Thread A:
b = 9;
store_fence();
a = 5;

Thread B:
if( a == 5 )
{
  load_fence();
  c = b;
}

Assume a,b,c are of a size to have atomic copy on the platform. The above means that c will only ever be assigned 9. Note we don't care when Thread B sees a==5, just that when it does it also sees b==9.

What is the code in C++0x that guarantees the same relationship?


ANSWER: If you read my chosen answer and all the comments you'll get the gist of the situation. C++0x appears to force you to use an atomic with fences whereas a normal hardware fence does not have this requirement. In many cases this can still be used to replace concurrent algorithms so long as sizeof(atomic<T>) == sizeof(T) and atomic<T>.is_lock_free() == true.

It is unfortunate however that is_lock_free is not a constexpr. That would allow it to be used in a static_assert. Having atomic<T> degenerate to using locks is generally a bad idea: atomic algorithms that use mutexes will have horrible contention problems compared to a mutex-designed algorithm.

Solution

Fences provide ordering on all data. However, in order to guarantee that the fence operation from one thread is visible to a second, you need to use atomic operations for the flag, otherwise you have a data race.

std::atomic<bool> ready(false);
int data=0;

void thread_1()
{
    data=42;
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true,std::memory_order_relaxed);
}

void thread_2()
{
    if(ready.load(std::memory_order_relaxed))
    {
        std::atomic_thread_fence(std::memory_order_acquire);
        std::cout<<"data="<<data<<std::endl;
    }
}

If thread_2 reads ready to be true, then the fences ensure that data can safely be read, and the output will be data=42. If ready is read to be false, then you cannot guarantee that thread_1 has issued the appropriate fence, so a fence in thread 2 would still not provide the necessary ordering guarantees --- if the if in thread_2 was omitted, the access to data would be a data race and undefined behaviour, even with the fence.

Clarification: A std::atomic_thread_fence(std::memory_order_release) is generally equivalent to a store fence, and will likely be implemented as such. However, a single fence on one processor does not guarantee any memory ordering: you need a corresponding fence on a second processor, AND you need to know that when the acquire fence was executed the effects of the release fence were visible to that second processor. It is obvious that if CPU A issues an acquire fence, and then 5 seconds later CPU B issues a release fence, then that release fence cannot synchronize with the acquire fence. Unless you have some means of checking whether or not the fence has been issued on the other CPU, the code on CPU A cannot tell whether it issued its fence before or after the fence on CPU B.

The requirement that you use an atomic operation to check whether or not the fence has been seen is a consequence of the data race rules: you cannot access a non-atomic variable from multiple threads without an ordering relationship, so you cannot use a non-atomic variable to check for an ordering relationship.

A stronger mechanism such as a mutex can of course be used, but that would render the separate fence pointless, as the mutex would provide the fence.

Relaxed atomic operations are likely just plain loads and stores on modern CPUs, though possibly with additional alignment requirements to ensure atomicity.

Code written to use processor-specific fences can readily be changed to use C++0x fences, provided the operations used to check synchronization (rather than those used to access the synchronized data) are atomic. Existing code may well rely on the atomicity of plain loads and stores on a given CPU, but conversion to C++0x will require using atomic operations for those checks in order to provide the ordering guarantees.
