挥发性贵吗? [英] Is volatile expensive?

查看:39
本文介绍了挥发性贵吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

阅读编译器编写者的 JSR-133 Cookbook 后,关于volatile 的实现,尤其是与原子指令的交互"部分我假设读取一个 volatile 变量而不更新它需要一个 LoadLoad 或一个 LoadStore 屏障.在页面下方,我看到 LoadLoad 和 LoadStore 在 X86 CPU 上实际上是无操作的.这是否意味着可以在 x86 上无需显式缓存失效即可完成 volatile 读取操作,并且与普通变量读取一样快(不考虑 volatile 的重新排序约束)?

After reading The JSR-133 Cookbook for Compiler Writers about the implementation of volatile, especially section "Interactions with Atomic Instructions" I assume that reading a volatile variable without updating it needs a LoadLoad or a LoadStore barrier. Further down the page I see that LoadLoad and LoadStore are effectively no-ops on X86 CPUs. Does this mean that volatile read operations can be done without a explicit cache invalidation on x86, and is as fast as a normal variable read (disregarding the reordering constraints of volatile)?

我相信我没有正确理解这一点.有人可以关心启发我吗?

I believe I don't understand this correctly. Could someone care to enlighten me?

我想知道多处理器环境是否存在差异.在单 CPU 系统上,CPU 可能会查看它自己的线程缓存,正如 John V. 所说,但在多 CPU 系统上,必须有一些配置选项给 CPU,这还不够,必须命中主内存,使 volatile 变慢在多 CPU 系统上,对吗?

I wonder if there are differences in multi-processor environments. On single CPU systems the CPU might look at it's own thread caches, as John V. states, but on multi CPU systems there must be some config option to the CPUs that this is not enough and main memory has to be hit, making volatile slower on multi cpu systems, right?

PS:在了解更多信息的过程中,我偶然发现了以下很棒的文章,由于其他人可能对这个问题感兴趣,我将在这里分享我的链接:

PS: On my way to learn more about this I stumbled about the following great articles, and since this question may be interesting to others, I'll share my links here:

推荐答案

在 Intel 上,无竞争的 volatile 读取非常便宜.如果我们考虑以下简单情况:

On Intel an un-contended volatile read is quite cheap. If we consider the following simple case:

public static long l;

public static void run() {        
    if (l == -1)
        System.exit(-1);

    if (l == -2)
        System.exit(-1);
}

使用 Java 7 打印汇编代码的能力,run 方法如下所示:

Using Java 7's ability to print assembly code the run method looks something like:

# {method} 'run2' '()V' in 'Test2'
#           [sp+0x10]  (sp of caller)
0xb396ce80: mov    %eax,-0x3000(%esp)
0xb396ce87: push   %ebp
0xb396ce88: sub    $0x8,%esp          ;*synchronization entry
                                    ; - Test2::run2@-1 (line 33)
0xb396ce8e: mov    $0xffffffff,%ecx
0xb396ce93: mov    $0xffffffff,%ebx
0xb396ce98: mov    $0x6fa2b2f0,%esi   ;   {oop('Test2')}
0xb396ce9d: mov    0x150(%esi),%ebp
0xb396cea3: mov    0x154(%esi),%edi   ;*getstatic l
                                    ; - Test2::run@0 (line 33)
0xb396cea9: cmp    %ecx,%ebp
0xb396ceab: jne    0xb396ceaf
0xb396cead: cmp    %ebx,%edi
0xb396ceaf: je     0xb396cece         ;*getstatic l
                                    ; - Test2::run@14 (line 37)
0xb396ceb1: mov    $0xfffffffe,%ecx
0xb396ceb6: mov    $0xffffffff,%ebx
0xb396cebb: cmp    %ecx,%ebp
0xb396cebd: jne    0xb396cec1
0xb396cebf: cmp    %ebx,%edi
0xb396cec1: je     0xb396ceeb         ;*return
                                    ; - Test2::run@28 (line 40)
0xb396cec3: add    $0x8,%esp
0xb396cec6: pop    %ebp
0xb396cec7: test   %eax,0xb7732000    ;   {poll_return}
;... lines removed

如果您查看对 getstatic 的 2 个引用,第一个涉及从内存加载,第二个会跳过加载,因为该值从已经加载到的寄存器中重用(long 是 64 位,在我的32 位笔记本电脑,它使用 2 个寄存器).

If you look at the 2 references to getstatic, the first involves a load from memory, the second skips the load as the value is reused from the register(s) it is already loaded into (long is 64 bit and on my 32 bit laptop it uses 2 registers).

如果我们将 l 变量设置为 volatile,则生成的程序集将不同.

If we make the l variable volatile the resulting assembly is different.

# {method} 'run2' '()V' in 'Test2'
#           [sp+0x10]  (sp of caller)
0xb3ab9340: mov    %eax,-0x3000(%esp)
0xb3ab9347: push   %ebp
0xb3ab9348: sub    $0x8,%esp          ;*synchronization entry
                                    ; - Test2::run2@-1 (line 32)
0xb3ab934e: mov    $0xffffffff,%ecx
0xb3ab9353: mov    $0xffffffff,%ebx
0xb3ab9358: mov    $0x150,%ebp
0xb3ab935d: movsd  0x6fb7b2f0(%ebp),%xmm0  ;   {oop('Test2')}
0xb3ab9365: movd   %xmm0,%eax
0xb3ab9369: psrlq  $0x20,%xmm0
0xb3ab936e: movd   %xmm0,%edx         ;*getstatic l
                                    ; - Test2::run@0 (line 32)
0xb3ab9372: cmp    %ecx,%eax
0xb3ab9374: jne    0xb3ab9378
0xb3ab9376: cmp    %ebx,%edx
0xb3ab9378: je     0xb3ab93ac
0xb3ab937a: mov    $0xfffffffe,%ecx
0xb3ab937f: mov    $0xffffffff,%ebx
0xb3ab9384: movsd  0x6fb7b2f0(%ebp),%xmm0  ;   {oop('Test2')}
0xb3ab938c: movd   %xmm0,%ebp
0xb3ab9390: psrlq  $0x20,%xmm0
0xb3ab9395: movd   %xmm0,%edi         ;*getstatic l
                                    ; - Test2::run@14 (line 36)
0xb3ab9399: cmp    %ecx,%ebp
0xb3ab939b: jne    0xb3ab939f
0xb3ab939d: cmp    %ebx,%edi
0xb3ab939f: je     0xb3ab93ba         ;*return
;... lines removed

在这种情况下,对变量 l 的两个 getstatic 引用都涉及从内存加载,即该值不能跨多个易失性读取保存在寄存器中.为了确保有一个原子读取值从主内存读取到 MMX 寄存器 movsd 0x6fb7b2f0(%ebp),%xmm0 使读取操作成为一条指令(从前面的例子我们看到在 32 位系统上,64 位值通常需要两次 32 位读取).

In this case both of the getstatic references to the variable l involves a load from memory, i.e. the value can not be kept in a register across multiple volatile reads. To ensure that there is an atomic read the value is read from main memory into an MMX register movsd 0x6fb7b2f0(%ebp),%xmm0 making the read operation a single instruction (from the previous example we saw that 64bit value would normally require two 32bit reads on a 32bit system).

因此,易失性读取的总体成本大致相当于内存负载,并且可以与 L1 缓存访问一样便宜.但是,如果另一个内核正在写入 volatile 变量,则缓存行将失效,需要主存储器或 L3 缓存访问.实际成本将在很大程度上取决于 CPU 架构.即使在 Intel 和 AMD 之间,缓存一致性协议也不同.

So the overall cost of a volatile read will roughly equivalent of a memory load and can be as cheap as a L1 cache access. However if another core is writing to the volatile variable, the cache-line will be invalidated requiring a main memory or perhaps an L3 cache access. The actual cost will depend heavily on the CPU architecture. Even between Intel and AMD the cache coherency protocols are different.

这篇关于挥发性贵吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆