Is memory barrier or atomic operation required in a busy-wait loop?
Consider the following spin_lock()
implementation, originally from this answer:
void spin_lock(volatile bool* lock) {
for (;;) {
// inserts an acquire memory barrier and a compiler barrier
if (!__atomic_test_and_set(lock, __ATOMIC_ACQUIRE))
return;
while (*lock) // no barriers; is it OK?
cpu_relax();
}
}
What I already know:
- volatile prevents the compiler from optimizing out the *lock re-read on each iteration of the while loop;
- volatile inserts neither memory nor compiler barriers;
- such an implementation actually works in GCC for x86 (e.g. in the Linux kernel) and some other architectures;
- at least one memory and compiler barrier is required in a spin_lock() implementation for a generic architecture; this example inserts them in __atomic_test_and_set().
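For comparison, the same locking idea can be expressed in standard C++11 with std::atomic_flag instead of the GCC builtins. This is a minimal sketch, not the original author's code; the class and function names are invented for illustration:

```cpp
#include <atomic>
#include <thread>

// Sketch: a C++11 spin lock equivalent in spirit to the GCC-builtin version.
class SpinLock {
    std::atomic_flag flag_;
public:
    SpinLock() { flag_.clear(std::memory_order_relaxed); }  // start unlocked
    void lock() {
        // test_and_set with acquire ordering is the only barrier needed to
        // take the lock. Note: C++11 atomic_flag has no read-only test(),
        // so the cheap spin-on-load from the question cannot be expressed
        // until C++20's flag.test(std::memory_order_relaxed).
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() {
        flag_.clear(std::memory_order_release);
    }
};

// Demo: two threads increment a shared counter under the lock.
long spinlock_demo() {
    SpinLock lock;
    long counter = 0;
    auto work = [&] {
        for (int i = 0; i < 100000; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

The acquire/release pair on test_and_set and clear is what creates the synchronizes-with relationship that the plain volatile read cannot.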
Questions:
- Is volatile enough here, or are there any architectures or compilers where a memory or compiler barrier or an atomic operation is required in the while loop?
  1.1 According to the C++ standards?
  1.2 In practice, for known architectures and compilers, specifically for GCC and the platforms it supports?
- Is this implementation safe on all architectures supported by GCC and Linux? (It is at least inefficient on some architectures, right?)
- Is the while loop safe according to C++11 and its memory model?
There are several related questions, but I was unable to construct an explicit and unambiguous answer from them:
Q: Memory barrier in a single thread
In principle: Yes, if program execution moves from one core to the next, it might not see all writes that occurred on the previous core.
Q: memory barrier and cache flush
On pretty much all modern architectures, caches (like the L1 and L2 caches) are ensured coherent by hardware. There is no need to flush any cache to make memory visible to other CPUs.
Q: Do spin locks always require a memory barrier? Is spinning on a memory barrier expensive?
Q: Do you expect that future CPU generations are not cache coherent?
Is volatile enough here or are there any architectures or compilers where a memory or compiler barrier or an atomic operation is required in the while loop?

Will the volatile code see the change? Yes, but not necessarily as quickly as if there were a memory barrier. At some point some form of synchronization will occur and the new state will be read from the variable, but there are no guarantees about how much else has happened in the code by then.
1.1 According to C++ standards?
From cppreference: memory_order

It is the memory model and the memory orders that define the generalized machine the code needs to work on. For a message to pass between threads of execution, an inter-thread-happens-before relationship needs to occur. A inter-thread happens before B if one of the following holds (X is an intermediate evaluation):

- A synchronizes-with B;
- A is dependency-ordered before B;
- A synchronizes-with some X, and X is sequenced before B;
- A is sequenced before X, and X inter-thread happens before B;
- A inter-thread happens before X, and X inter-thread happens before B.
As the code establishes none of those relationships, there will be executions of your program on some current hardware where it may fail. In practice, the end of a time slice will cause memory to become coherent, and any form of barrier on the non-spinlock thread will ensure that the caches are flushed; but exactly what causes a volatile read to eventually return the "current" value is not guaranteed.
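The synchronizes-with case above can be illustrated with a release/acquire pair. A minimal sketch (the variable and function names are invented for illustration):

```cpp
#include <atomic>
#include <thread>

// Sketch: release/acquire message passing. The release store to `ready`
// synchronizes-with the acquire load that reads it, so the reader is
// guaranteed to also observe the plain store to `data`
// (inter-thread happens-before).
int message_passing_demo() {
    int data = 0;                  // plain, non-atomic payload
    std::atomic<bool> ready{false};

    std::thread writer([&] {
        data = 42;                                     // A: sequenced before...
        ready.store(true, std::memory_order_release);  // ...the release store
    });

    while (!ready.load(std::memory_order_acquire))     // B: acquire load
        ;                                              // spin until published
    int seen = data;  // guaranteed to be 42: A inter-thread happens before B
    writer.join();
    return seen;
}
```

Replacing either memory_order with relaxed (or `ready` with a plain volatile bool) removes the guarantee on `data`, which is exactly the problem with the volatile wait loop in the question.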
1.2 In practice, for known architectures and compilers, specifically for GCC and platforms it supports?
As the code is not consistent with the generalized machine described by C++11, it is likely that this code will fail to work correctly with C++ implementations that try to adhere to the standard.
From cppreference : const volatile qualifiers Volatile access stops optimizations from moving work from before it to after it, and from after it to before it.
"This makes volatile objects suitable for communication with a signal handler, but not with another thread of execution"
So an implementation has to ensure that reads happen from the memory location rather than from any local copy, but it does not have to ensure that a volatile write is flushed through the caches to produce a coherent view across all CPUs. In this sense, there is no bound on how long after a write to a volatile variable the new value becomes visible to another thread.

Also see kernel.org: why volatile is nearly always wrong in the kernel.
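The one use case the cppreference quote does endorse, communication with a signal handler in the same thread, can be shown concretely. A sketch, with invented names (volatile sig_atomic_t is the type the standard specifies for this):

```cpp
#include <csignal>

// Sketch: volatile sig_atomic_t is the communication channel volatile is
// actually specified for: a signal handler interrupting the *same* thread.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_sigint(int) {
    got_signal = 1;  // only volatile sig_atomic_t may safely be written here
}

int signal_demo() {
    std::signal(SIGINT, on_sigint);
    std::raise(SIGINT);  // handler runs synchronously in this thread
    return got_signal;
}
```

Here volatile is sufficient because handler and interrupted code run on the same core with no concurrent access, which is precisely the property a second thread does not have.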
Is this implementation safe on all architectures supported by GCC and Linux? (It is at least inefficient on some architectures, right?)
There is no guarantee that the volatile write ever gets out of the thread that performs it, so it is not really safe in general. On Linux it may happen to be safe.
Is the while loop safe according to C++11 and its memory model?
No - as it doesn't create any of the inter-thread messaging primitives.
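Staying with the question's GCC/Clang builtin style, the usual fix is to replace the plain volatile read in the wait loop with an atomic relaxed load. A hedged sketch (cpu_relax is stubbed out here; real code would use an architecture-specific pause hint):

```cpp
#include <thread>

// Sketch: the loop from the question with the volatile read replaced by an
// atomic relaxed load. The acquire in __atomic_test_and_set remains the only
// barrier needed to take the lock; the read-only wait loop needs atomicity
// but no ordering, so __ATOMIC_RELAXED suffices.
static inline void cpu_relax(void) {
    // e.g. the x86 "pause" instruction in real code
}

void spin_lock(bool *lock) {
    for (;;) {
        if (!__atomic_test_and_set(lock, __ATOMIC_ACQUIRE))
            return;
        while (__atomic_load_n(lock, __ATOMIC_RELAXED))
            cpu_relax();
    }
}

void spin_unlock(bool *lock) {
    __atomic_clear(lock, __ATOMIC_RELEASE);
}

// Demo: two threads increment a shared counter under the lock.
long builtin_spin_demo() {
    bool lock = false;
    long counter = 0;
    auto work = [&] {
        for (int i = 0; i < 100000; ++i) {
            spin_lock(&lock);
            ++counter;
            spin_unlock(&lock);
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

On x86 the relaxed load compiles to the same plain mov as the volatile read, so this costs nothing there while staying inside the atomics model on weaker architectures.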