Difference between memory barriers and lock-prefixed instructions


Problem description

In the article Memory Barriers and JVM Concurrency, I was told that volatile is implemented with various memory-barrier instructions, while synchronized and atomic are implemented with lock-prefixed instructions. But I found the following code in another article:

Java code:

volatile Singleton instance = new Singleton();

Assembly (x86):

0x01a3de1d: movb $0x0,0x1104800(%esi);

0x01a3de24: lock addl $0x0,(%esp);

So which one is right? And what is the difference between memory barriers and lock-prefixed instructions (please excuse my poor English)?

Recommended answer

Short answer

Lock-prefixed instructions are used to perform complex memory operations atomically.
Memory barriers are used to order (partially or totally) memory accesses.

The Java volatile keyword guarantees that changes to volatile variables are seen by all threads as they are written in the program. The whole and only point of volatile is that accesses to volatile variables are totally ordered, so if you access variable X and then variable Y, both volatile, the access to X is seen before the access to Y by all processors.
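
A minimal Java sketch of that guarantee (the class and field names are mine, not from the question): once a reader observes the second volatile write, it cannot still see the stale value of the first.

class VolatileOrdering {
    static volatile int x = 0;
    static volatile int y = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            x = 1;              // first volatile write in program order
            y = 1;              // second volatile write in program order
        });
        Thread reader = new Thread(() -> {
            if (y == 1) {
                // Both fields are volatile, so no thread can observe
                // y == 1 while still seeing the old value of x.
                System.out.println("x = " + x);   // always prints 1 here
            }
        });
        writer.start(); reader.start();
        writer.join();  reader.join();
    }
}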

This requires ordering the memory accesses, so it requires memory barriers.

A memory barrier on IA32e may be implemented with the fence instructions (mfence, lfence, sfence) or with a lock-prefixed instruction. But the latter option is just a side effect of lock and not its primary use.

Locked instructions are serialized and so have a total order. Using them only to order memory accesses is inefficient, but it works, and it was used on older processors that lack the fence instructions.

So the lock you see is actually being used as a barrier (the Linux kernel uses the same instruction for the same purpose, too).
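
As an aside, and assuming a JDK 9+ runtime, Java also exposes explicit fences through java.lang.invoke.VarHandle; on x86 HotSpot commonly emits either mfence or the very same lock addl $0x0,(%esp) idiom for a full fence. A hypothetical sketch (field names are mine):

import java.lang.invoke.VarHandle;

class ExplicitFence {
    static int data = 0;
    static int ready = 0;

    static void publish() {
        data = 42;
        // Full two-way barrier: the store to data cannot be reordered
        // past this point relative to the store to ready.
        VarHandle.fullFence();
        ready = 1;
    }
}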

By "complex memory instructions" above I mean read-modify-write instructions (in Intel naming). These are instructions that internally consist of three operations: fetch the value from memory, change it, and store it back.

If the bus is not held locked for the duration of the instruction, another processor can change the value after it has been read from memory but before it is stored back.

Example: x = 0

 CPU 1              CPU 2

 loop:              loop:    
    inc [X]            inc [X]
    j loop             j loop

If each CPU executes its own loop 10 times, what value will be stored in x?
You cannot tell in a deterministic way. The pseudo-instruction inc [X] must be implemented with three micro-operations, like this:

  CPU 1              CPU 2

 loop:              loop:    
    mov r, [X]         mov r, [X]
    inc r              inc r
    mov [X], r         mov [X], r
    j loop             j loop

This is what could have happened:

CPU1: mov r, [X]    X is 0, CPU1 r is 0, CPU2 r is 0
CPU1: inc r         X is 0, CPU1 r is 1, CPU2 r is 0
CPU2: mov r, [X]    X is 0, CPU1 r is 1, CPU2 r is 0
CPU1: mov [X], r    X is 1, CPU1 r is 1, CPU2 r is 0
CPU1: mov r, [X]    X is 1, CPU1 r is 1, CPU2 r is 0
CPU1: inc r         X is 1, CPU1 r is 2, CPU2 r is 0
CPU1: mov [X], r    X is 2, CPU1 r is 2, CPU2 r is 0 
CPU2: inc r         X is 2, CPU1 r is 2, CPU2 r is 1 
CPU2: mov [X], r    X is 1, CPU1 r is 2, CPU2 r is 1

Note how X is 1 instead of 3.
By locking the inc instruction, the CPU asserts a lock on the system bus when inc starts and holds it until the instruction retires. This forces a pattern like this (for example):

CPU1: mov r, [X]    X is 0, CPU1 r is 0, CPU2 r is 0, CPU2 cannot use bus
CPU1: inc r         X is 0, CPU1 r is 1, CPU2 r is 0, CPU2 cannot use bus
CPU1: mov [X], r    X is 1, CPU1 r is 1, CPU2 r is 0, CPU2 cannot use bus

CPU1: mov r, [X]    X is 1, CPU1 r is 1, CPU2 r is 0, CPU2 cannot use bus
CPU1: inc r         X is 1, CPU1 r is 2, CPU2 r is 0, CPU2 cannot use bus
CPU1: mov [X], r    X is 2, CPU1 r is 2, CPU2 r is 0, CPU2 cannot use bus

CPU2: mov r, [X]    X is 2, CPU1 r is 2, CPU2 r is 2, CPU1 cannot use bus
CPU2: inc r         X is 2, CPU1 r is 2, CPU2 r is 3, CPU1 cannot use bus
CPU2: mov [X], r    X is 3, CPU1 r is 2, CPU2 r is 3, CPU1 cannot use bus
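
The Java-level counterpart of this difference is an atomic class such as java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet() is typically compiled on x86 to a lock-prefixed read-modify-write (e.g. lock xadd), while a plain ++ on a shared field behaves like the unlocked three-step sequence above. A small sketch under those assumptions (class and field names are mine):

import java.util.concurrent.atomic.AtomicInteger;

class Counters {
    static int plain = 0;                                     // plain read-modify-write, can lose updates
    static final AtomicInteger atomic = new AtomicInteger();  // atomic read-modify-write

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                    // three steps: load, add, store
                atomic.incrementAndGet();   // one atomic (locked) step
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join();  t2.join();
        // plain is frequently less than 200000; atomic is always 200000
        System.out.println("plain = " + plain + ", atomic = " + atomic.get());
    }
}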

Memory barriers, instead, are used to order memory accesses.

Processors execute instructions out of order. This means that even if you give a CPU the instructions A B C, it can execute them as C A B.

However, processors are required to respect dependencies, and instructions are executed out of order only when this won't change the program's behaviour.

A very important aspect to remember is the distinction between instruction execution and instruction retirement, because a processor keeps its architectural state (the state a program can see) consistent only for retired instructions. Normally a program sees the result of an instruction only when that instruction retires, even if it has already been executed. But with memory accesses the matter is slightly different, because they have the globally visible side effect of modifying main memory, and that cannot be undone.

So, as seen from a program running on one CPU, all memory accesses of that CPU happen in program order. However, the processor makes no effort to guarantee that other processors see those memory accesses in the same order! They see the execution order, or worse, the propagation order caused by the cache hierarchy and memory topology. The order of memory accesses is observed differently by different processors.

So the CPU lets the programmer control how memory accesses are ordered by means of barriers: a barrier stops later memory instructions (on the same CPU) from being executed until all the previous ones have been executed/retired/propagated (which of these depends on the architecture and the barrier type).

Example:

  x = 0, y = 0

  CPU 1            CPU2
  mov [x], 1       loop:
  mov [y], 1         mov r, [y]
                     jrz loop   ;Jump if r is 0
                   mov s, [x]

There is no need for locks here. However, without barriers it is possible that CPU2's s is 0 after the program runs.

This is due to the fact that the mov [y], 1 write of CPU1 can be reordered and executed before the write to x!

From CPU1's perspective nothing changed, but for CPU2 the order has changed!

With a barrier:

  x = 0, y = 0

  CPU 1            CPU2
  mov [x], 1       loop:
  sync               mov r, [y]
  mov [y], 1         jrz loop   ;Jump if r is 0
                   mov s, [x]

Here sync is used as a memory-barrier pseudo-instruction. Now the write to y cannot be reordered and must wait for the write to x to be visible to CPU2.
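
Tying this back to the question, the Java analogue would be to declare y volatile: the compiler then inserts the required barrier, and once CPU2 sees y == 1 it is guaranteed by the Java memory model to read s == 1. A minimal sketch (class and variable names are mine):

class Publication {
    static int x = 0;             // plain field, the data being published
    static volatile int y = 0;    // volatile flag, plays the role of the barrier plus the store to y

    public static void main(String[] args) throws InterruptedException {
        Thread cpu1 = new Thread(() -> {
            x = 1;
            y = 1;    // volatile write: the earlier write to x cannot be reordered after it
        });
        Thread cpu2 = new Thread(() -> {
            while (y == 0) { /* spin, like the jrz loop */ }
            int s = x;    // guaranteed to be 1
            System.out.println("s = " + s);
        });
        cpu1.start(); cpu2.start();
        cpu1.join();  cpu2.join();
    }
}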

Things are a little more elaborate than this simple picture of mine: different processors have different kinds of barriers and memory orderings, and different architectures have different cache/memory topologies which require special handling. Abstracting all this is not easy. Java has a simple memory model, which makes the generated code more complex; C++11 has a more elaborate memory model that lets you explore the effects of memory barriers in more detail.

Before reading abstract notions like happens-before, it is useful to look up the memory-ordering behaviour of common architectures (IA32e, IA64, ARM, SPARC, Power, Alpha) so that you can see what the real problem is and how it can be solved.

The IA32e architecture is actually a poor one to test on, because its relaxed memory ordering is in fact quite strong and most of these problems cannot happen on it. If you have a multiprocessor phone you can test on ARM. If you want an extreme example, take the Alpha architecture, where even dependent accesses can be reordered!
