Memory barrier and atomic_t on Linux


Problem description



Recently, I have been reading some Linux kernel-space code, and I came across this:

uint64_t used;
uint64_t blocked;

used = atomic64_read(&g_variable->used);       //#1
barrier();                                     //#2
blocked = atomic64_read(&g_variable->blocked); //#3

What are the semantics of this code snippet? Does it make sure #1 executes before #3 by means of #2? But I am a little bit confused, because

#A On 64-bit platforms, the atomic64_read macro expands to

used = (&g_variable->used)->counter           // where counter is volatile.

On 32-bit platforms, it is converted to use lock cmpxchg8b. I assume these two have the same semantics, and for the 64-bit version, I think it means:

  1. all-or-nothing: we can exclude the case where the address is unaligned or the word size is larger than the CPU's native word size.
  2. no optimization: the CPU is forced to read from the memory location (see the sketch after this list).
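
A small illustration of point 2, as a sketch with a hypothetical shared variable (shared_flag is not from the original code): without volatile, gcc may load the variable once into a register and never touch memory again, while the volatile cast forces a fresh load on every access.

#include <stdint.h>

uint64_t shared_flag;               /* hypothetical: written by another thread */

void spin_plain(void)
{
    /* gcc may hoist the load out of the loop and spin forever on a register */
    while (shared_flag == 0)
        ;
}

void spin_volatile(void)
{
    /* the volatile cast forces a load from memory on every iteration */
    while (*(volatile uint64_t *)&shared_flag == 0)
        ;
}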

atomic64_read doesn't have semantics for preserving read ordering! See this: http://drdobbs.com/high-performance-computing/212701484

#B The barrier macro is defined as

/* Optimization barrier */
/* The "volatile" is due to gcc bugs */
#define barrier() __asm__ __volatile__("": : :"memory")

From the wiki, this just prevents the gcc compiler from reordering reads and writes.

What I am confused about is how this disables reordering optimization on the CPU? In addition, can I think of the barrier macro as a full fence?

Solution

32-bit x86 processors don't provide simple atomic read operations for 64-bit types. The only atomic operation on 64-bit types on such CPUs that deals with "normal" registers is LOCK CMPXCHG8B, which is why it is used here. The alternative is to use MOVQ and MMX/XMM registers, but that requires knowledge of the FPU state/registers, and requires that all operations on that value are done with the MMX/XMM instructions.
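
As a minimal sketch of that idea (an illustration, not the kernel's actual implementation, and it compiles only for 32-bit x86, where the "A" constraint names the EDX:EAX register pair), a 64-bit atomic read built from LOCK CMPXCHG8B could look like this:

#include <stdint.h>

static inline uint64_t atomic64_read_sketch32(volatile uint64_t *p)
{
    uint64_t old = 0;               /* expected value, held in EDX:EAX */

    /* CMPXCHG8B compares EDX:EAX with *p. If they differ, it loads the
     * current 64-bit value into EDX:EAX; if they are equal, it stores
     * ECX:EBX (also zero here), rewriting the same value. Either way,
     * `old` ends up holding the current contents, read atomically. */
    __asm__ __volatile__("lock; cmpxchg8b %1"
                         : "+A" (old), "+m" (*p)
                         : "b" (0u), "c" (0u)
                         : "cc");
    return old;
}

Note that when the value happens to equal the expected zero, the instruction performs a (same-value) store, so even a pure read takes the cache line exclusive.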

On 64-bit x86_64 processors, aligned reads of 64-bit types are atomic, and can be done with a MOV instruction, so only a plain read is required --- the use of volatile is just to ensure that the compiler actually does a read, and doesn't cache a previous value.
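
For comparison, a minimal sketch of the 64-bit path just described (again an illustration, not the kernel's exact macro): the aligned load compiles to a single MOV, and the volatile qualifier only stops the compiler from reusing a previously cached value.

#include <stdint.h>

static inline uint64_t atomic64_read_sketch64(const uint64_t *p)
{
    return *(const volatile uint64_t *)p;   /* one aligned MOV: atomic on x86_64 */
}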

As for the read ordering, the inline assembler you quote ensures that the compiler emits the instructions in the right order, and this is all that is required on x86/x86_64 CPUs, provided the writes are correctly sequenced. LOCKed writes on x86 have a total ordering; plain MOV writes provide "causal consistency", so if thread A does x=1 then y=2, and thread B reads y==2, then a subsequent read of x will see x==1.
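
To illustrate that guarantee, here is a sketch with hypothetical shared variables x and y, reusing the same barrier() compiler fence quoted in the question; on x86/x86_64 this fence is all that is needed for this pattern, because the hardware does not reorder the two stores with each other, nor the two loads.

#include <stdint.h>

#define barrier() __asm__ __volatile__("": : :"memory")

volatile uint64_t x = 0, y = 0;     /* hypothetical shared variables */

void thread_a(void)                 /* writer */
{
    x = 1;
    barrier();                      /* keep gcc from reordering the two stores */
    y = 2;
}

void thread_b(void)                 /* reader */
{
    uint64_t seen_y = y;
    barrier();                      /* keep gcc from reordering the two loads */
    uint64_t seen_x = x;

    /* On x86/x86_64: if seen_y == 2, then seen_x == 1 is guaranteed. */
    (void)seen_x;
    (void)seen_y;
}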

On IA-64, PowerPC, SPARC, and other processors with a more relaxed memory model there may well be more to atomic64_read() and barrier().
