Which is a better write barrier on x86: lock+addl or xchgl?

Question

The Linux kernel uses lock; addl $0,0(%%esp) as a write barrier, while the RE2 library uses xchgl (%0),%0 as a write barrier. What is the difference, and which is better?

Does x86 also require read barrier instructions? RE2 defines its read barrier function as a no-op on x86, while Linux defines it as either lfence or a no-op, depending on whether SSE2 is available. When is lfence required?

Recommended answer

The "lock; addl $0,0(%%esp)" form is faster when we also want to test whether the lock variable at address (%%esp) is 0. Because we add 0 to the lock variable, the value is unchanged, and the zero flag is set to 1 if the value at address (%%esp) is 0.

On lfence, from the Intel datasheet:

Performs a serializing operation on all load-from-memory instructions that were issued prior the LFENCE instruction. This serializing operation guarantees that every load instruction that precedes in program order the LFENCE instruction is globally visible before any load instruction that follows the LFENCE instruction is globally visible.

(Editor's note: mfence or a locked operation is the only useful fence (after a store) for sequential consistency. lfence does not block StoreLoad reordering by the store buffer.)
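
The StoreLoad case in the editor's note can be sketched with a Dekker-style handshake written in C11 atomics. This is a minimal illustration, not code from Linux or RE2: the names flag, try_enter, and leave are hypothetical. Each thread stores its own flag and then loads the other thread's flag; without a full fence between the two, the store can still sit in the store buffer while the load executes. On x86, compilers typically implement the seq_cst fence with mfence or a locked instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical two-thread handshake: flag[i] is raised by thread i. */
static atomic_int flag[2];

/* Returns true if thread `me` (0 or 1) may proceed, i.e. the other
 * thread had not raised its flag when we looked. */
static bool try_enter(int me) {
    assert(me == 0 || me == 1);

    /* 1. Announce intent with a plain (relaxed) store. */
    atomic_store_explicit(&flag[me], 1, memory_order_relaxed);

    /* 2. Full fence: keeps the store above from being reordered after
     *    the load below (StoreLoad). lfence would not be enough here. */
    atomic_thread_fence(memory_order_seq_cst);

    /* 3. Look at the other thread's flag. */
    if (atomic_load_explicit(&flag[1 - me], memory_order_relaxed) != 0) {
        /* The other thread got there first: back off. */
        atomic_store_explicit(&flag[me], 0, memory_order_relaxed);
        return false;
    }
    return true;
}

/* Lowers the flag of thread `me`. */
static void leave(int me) {
    assert(me == 0 || me == 1);
    atomic_store_explicit(&flag[me], 0, memory_order_release);
}
```

With the fence in both threads, at most one of two concurrent try_enter calls can return true; if the fence is removed, both may return true on real x86 hardware.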

For instance, memory write instructions such as 'mov' are atomic without a lock prefix, provided they are properly aligned. However, such a store normally completes locally first (in the core's store buffer and cache) and is not necessarily globally visible to other threads at that moment. A memory fence makes this thread wait until its previous stores are visible to other threads.
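
As a small illustration of aligned stores used for publication, here is a message-passing sketch in C11 atomics. The names payload, ready, publish, and try_consume are hypothetical. On x86, a release store and an acquire load are typically compiled to plain mov instructions with no lfence or mfence, because the hardware already keeps stores in order with other stores and loads in order with other loads for ordinary memory; the atomics mainly stop the compiler from reordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;        /* plain data, written before the flag */
static atomic_bool ready;  /* aligned flag that publishes the data */

/* Producer side: write the data, then publish it. */
static void publish(int value) {
    payload = value;
    /* Release store: the payload write cannot move below this line. */
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer side: returns true and fills *out once the data is published. */
static bool try_consume(int *out) {
    assert(out != NULL);
    /* Acquire load: the payload read cannot move above this line. */
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

This pattern only needs StoreStore and LoadLoad ordering, which is why no full fence appears; a full fence is needed only when a store must become visible before a later load executes.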

So the main difference between these two instructions is that the xchgl instruction has no effect on the condition flags, whereas lock add updates them. We can certainly test the state of the lock variable with a lock cmpxchg instruction, but that is still more complex than using lock add $0.
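
The two idioms can be sketched side by side as GCC-style inline assembly. This is an illustrative sketch, not the exact Linux or RE2 source: the function names are hypothetical, the 64-bit variant uses %rsp where the original 32-bit code uses %esp, and the xchgl variant swaps a register with a local dummy variable and declares both as read-write operands. On non-x86 targets the sketch falls back to a C11 seq_cst fence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Full barrier via a locked add of 0 to the top of the stack.
 * The value is unchanged, but the flags are written ("cc" clobber). */
static inline void barrier_lock_add(void) {
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Full barrier via xchg with a memory operand, which is implicitly
 * locked. xchg leaves the flags untouched, so no "cc" clobber. */
static inline void barrier_xchg(void) {
#if defined(__x86_64__) || defined(__i386__)
    int slot = 0; /* dummy memory operand */
    int reg = 0;  /* dummy register operand */
    __asm__ __volatile__("xchgl %0,%1"
                         : "+r"(reg), "+m"(slot)
                         :
                         : "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Hypothetical helper: store a value, then issue one of the barriers. */
static void store_with_barrier(volatile int *slot, int value, int use_xchg) {
    assert(slot != NULL);
    *slot = value;
    if (use_xchg)
        barrier_xchg();
    else
        barrier_lock_add();
}
```

Both variants act as full barriers on x86; the visible difference in this sketch is the "cc" clobber, which is needed for lock add but not for xchg.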
