Why does __sync_add_and_fetch work for a 64 bit variable on a 32 bit system?

Question

Consider the following condensed code:

/* Compile: gcc -pthread -m32 -ansi x.c */
#include <stdio.h>
#include <inttypes.h>
#include <pthread.h>

static volatile uint64_t v = 0;

void *func (void *x) {
    __sync_add_and_fetch (&v, 1);
    return x;
}

int main (void) {
    pthread_t t;
    pthread_create (&t, NULL, func, NULL);
    pthread_join (t, NULL);
    printf ("v = %"PRIu64"\n", v);
    return 0;
}

I have a uint64_t variable that I want to increment atomically, because the variable is a counter in a multi-threaded program. To achieve the atomicity I use GCC's atomic builtins.

If I compile for an amd64 system (-m64) the produced assembler code is easy to understand. By using a lock addq, the processor guarantees the increment to be atomic.

 400660:       f0 48 83 05 d7 09 20    lock addq $0x1,0x2009d7(%rip)

But the same C code produces much more complicated assembly on an ia32 system (-m32):

804855a:       a1 28 a0 04 08          mov    0x804a028,%eax
804855f:       8b 15 2c a0 04 08       mov    0x804a02c,%edx
8048565:       89 c1                   mov    %eax,%ecx
8048567:       89 d3                   mov    %edx,%ebx
8048569:       83 c1 01                add    $0x1,%ecx
804856c:       83 d3 00                adc    $0x0,%ebx
804856f:       89 ce                   mov    %ecx,%esi
8048571:       89 d9                   mov    %ebx,%ecx
8048573:       89 f3                   mov    %esi,%ebx
8048575:       f0 0f c7 0d 28 a0 04    lock cmpxchg8b 0x804a028
804857c:       08 
804857d:       75 e6                   jne    8048565 <func+0x15>

Here is what I do not understand:

  • lock cmpxchg8b does guarantee that the changed variable is only written if the expected value still resides in the target address. The compare-and-swap is guaranteed to happen atomically.
  • But what guarantees that the read of the variable at 0x804855a and 0x804855f is atomic?

Probably it does not matter if there was a "dirty read", but could someone please outline a short proof that there is no problem?

Further: Why does the generated code jump back to 0x8048565 and not 0x804855a? I am positive that this is only correct if other writers, too, only increment the variable. Is this an implied requirement for the __sync_add_and_fetch function?

Recommended answer

The read is guaranteed to be atomic because the variable is correctly aligned (and fits within one cache line), and because Intel specifies it this way. See the Intel Architecture manual Vol 1, 4.4.1:

A word or doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered unaligned and requires two separate memory bus cycles for access.

Vol 3A 8.1.1:

The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:

• Reading or writing a quadword aligned on a 64-bit boundary

• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus

The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically:

• Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line

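Thus an aligned quadword can be read in one bus cycle and fits into one cache line, which makes a single 64-bit access atomic. Even where the compiler emits two separate 32-bit loads, as in the listing above, a torn initial read is harmless: lock cmpxchg8b compares the full 64 bits atomically and, on a mismatch, reloads EDX:EAX with the current value before the retry.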
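The alignment condition from the quoted manual text can be checked at run time. The following is only a minimal sketch: the helper names is_qword_aligned and fits_in_cache_line are made up for illustration, the aligned attribute is a GCC extension, and the cache line size passed in is an assumption (64 bytes is common on current x86 parts, but it is not stated in the post).

```c
#include <stdint.h>

/* Explicitly request 8-byte alignment instead of relying on the
   compiler's default placement (GCC attribute syntax). */
static volatile uint64_t v __attribute__ ((aligned (8))) = 0;

/* Hypothetical helper: returns 1 if p lies on a 64-bit boundary, else 0. */
static int is_qword_aligned (const volatile void *p)
{
    return ((uintptr_t) p % 8) == 0;
}

/* Hypothetical helper: returns 1 if an 8-byte object starting at p does
   not straddle a cache line of line_size bytes (line_size is an
   assumption supplied by the caller). */
static int fits_in_cache_line (const volatile void *p, unsigned line_size)
{
    uintptr_t a = (uintptr_t) p;
    return (a / line_size) == ((a + 7) / line_size);
}
```

An object that is 8-byte aligned can never straddle a cache line whose size is a multiple of 8, so the first check implies the second; the second only matters for the unaligned case covered by the P6 guarantee.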

The code jumps back to 0x8048565 because the value has already been loaded into EDX:EAX; there is no need to load it again, as CMPXCHG8B sets EDX:EAX to the value in the destination if it fails:

CMPXCHG8B description from the Intel ISA manual Vol. 2A:

Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX.
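To make the quoted semantics concrete, here is a single-threaded C model of the instruction. It is only a sketch: model_cmpxchg8b is a hypothetical name, the register pairs are represented as plain 64-bit values, and the model is not atomic by itself (the real instruction performs all of this as one indivisible step when it carries the lock prefix).

```c
#include <stdint.h>

/* Hypothetical model of CMPXCHG8B.
   m64      : the memory operand
   edx_eax  : the expected value (EDX:EAX), updated on failure
   ecx_ebx  : the replacement value (ECX:EBX)
   Returns the resulting ZF: 1 on success, 0 on failure. */
static int model_cmpxchg8b (uint64_t *m64, uint64_t *edx_eax, uint64_t ecx_ebx)
{
    if (*m64 == *edx_eax) {
        *m64 = ecx_ebx;      /* equal: store ECX:EBX into m64, ZF = 1 */
        return 1;
    }
    *edx_eax = *m64;         /* not equal: load m64 into EDX:EAX, ZF = 0 */
    return 0;
}
```

Calling the model with a stale or torn expected value fails once, refreshes the expected value from memory, and succeeds on the next attempt with the refreshed value plus one. This is why the retry only has to recompute the new value and does not need to read memory again.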

Thus the code needs only to increment the newly returned value and try again. If we think of it in C code, it becomes easier:

value = dest;
while (!CAS8B (&dest, value, value + 1))   /* CAS8B: stand-in for a 64-bit compare-and-swap */
{
    value = dest;   /* reload the current value, then retry */
}
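The same retry loop can be written against GCC's real builtin __sync_val_compare_and_swap, which returns the value that was in memory before the operation, much like CMPXCHG8B handing it back in EDX:EAX. What follows is a sketch and not the code GCC generates: add_and_fetch_cas, worker, run_demo, NUM_THREADS and INCREMENTS are names and values chosen for illustration, and the program is assumed to be built with gcc -pthread.

```c
#include <pthread.h>
#include <stdint.h>

#define NUM_THREADS 4        /* illustrative values, not from the original post */
#define INCREMENTS  100000

/* Forced to 8-byte alignment so the 64-bit CAS works on an aligned quadword. */
static volatile uint64_t counter __attribute__ ((aligned (8))) = 0;

/* Atomic add built from a compare-and-swap retry loop. */
static uint64_t add_and_fetch_cas (volatile uint64_t *dest, uint64_t n)
{
    uint64_t expected = *dest;   /* initial guess; a stale or torn value is fine */
    for (;;) {
        /* Returns the value that was in *dest before the operation. */
        uint64_t seen = __sync_val_compare_and_swap (dest, expected, expected + n);
        if (seen == expected)
            return expected + n; /* the swap happened */
        expected = seen;         /* retry with the value the CAS handed back */
    }
}

static void *worker (void *arg)
{
    int i;
    for (i = 0; i < INCREMENTS; i++)
        add_and_fetch_cas (&counter, 1);
    return arg;
}

/* Starts the workers, waits for them, and returns the final counter value. */
static uint64_t run_demo (void)
{
    pthread_t t[NUM_THREADS];
    int i;
    counter = 0;
    for (i = 0; i < NUM_THREADS; i++)
        pthread_create (&t[i], NULL, worker, NULL);
    for (i = 0; i < NUM_THREADS; i++)
        pthread_join (t[i], NULL);
    return counter;
}
```

Note that the loop never re-reads *dest after a failed attempt; it reuses the value returned by the compare-and-swap, which corresponds to the jump back to 0x8048565 in the listing. Correctness does not depend on what the other writers do: a swap only succeeds when memory still holds exactly the value the new result was computed from.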
