When can the CPU ignore the LOCK prefix and use cache coherency?


Problem description

I originally thought cache coherency protocols such as MESI can provide pseudo-atomicity, but only across individual memory load/store operations. If I were performing a fetch, modify, write combination of instructions, MESI alone wouldn't be able to enforce atomicity from the first instruction to the last.
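
For concreteness, here is a minimal C11 sketch of the kind of fetch-modify-write sequence I mean (the counter, thread count and iteration count are just an example): coherency keeps every individual load and store consistent, but the three-step plain increment can still interleave between threads, while the atomic read-modify-write (a LOCK-prefixed instruction on x86) cannot.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define ITERS 1000000

    static int plain_counter = 0;          /* fetch, modify, write as separate steps */
    static atomic_int atomic_counter = 0;  /* single atomic read-modify-write */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < ITERS; i++) {
            plain_counter++;                      /* load + add + store: not atomic (and a data race) */
            atomic_fetch_add(&atomic_counter, 1); /* lock xadd on x86: atomic */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* plain_counter usually comes out below 2*ITERS; atomic_counter never does */
        printf("plain=%d atomic=%d expected=%d\n",
               plain_counter, atomic_load(&atomic_counter), 2 * ITERS);
        return 0;
    }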

However, section 8 of the Intel reference manual Vol 3a says:


8.1.4 Effects of a LOCK Operation on Internal Processor Caches

For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location internally and allow its cache coherency mechanism to ensure that the operation is carried out atomically. This operation is called "cache locking." The cache coherency mechanism automatically prevents two or more processors that have cached the same area of memory from simultaneously modifying data in that area.

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.pdf

This seems to contradict my understanding by implying that the LOCK prefix doesn't need to be used, since cache coherency can do the job instead?

Recommended answer

There's a difference between locking as a concept and the actual bus LOCK# signal - the latter is one of the means of implementing the former. Cache locking is another one, and it is much simpler and more efficient.

The MESI protocol guarantees that if a line is held exclusively by a certain core (whether modified or not), no one else has it. In that case you can perform multiple operations atomically by adding a simple flag in the cache that blocks external snoops until the operations are done. This has the same effect as the lock concept dictates, since no one else can change or even observe the intermediate values.
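
For illustration, here is a sketch (assuming C11 atomics and GCC/Clang on x86-64; the helper name is mine) of the kind of locked read-modify-write this applies to: each compare-and-swap compiles to a LOCK CMPXCHG, and when the line is already held exclusively by the executing core, the lock is normally satisfied by cache locking rather than by asserting LOCK# on the bus.

    #include <stdatomic.h>

    /* Hypothetical helper: add 'delta' atomically using a CAS retry loop.
     * Each compare_exchange is a LOCK CMPXCHG on x86-64; with the line in
     * the E or M MESI state in this core's cache, it is a cache lock. */
    static int atomic_add_cas(atomic_int *p, int delta)
    {
        int old = atomic_load_explicit(p, memory_order_relaxed);
        while (!atomic_compare_exchange_weak_explicit(
                   p, &old, old + delta,
                   memory_order_acq_rel, memory_order_relaxed))
            ;  /* 'old' was refreshed with the current value; retry */
        return old;
    }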

In more complicated cases the line is not held by a single cache (it may be shared between several caches, or the access may be split between two cache lines with only one of them in your cache - the list of scenarios is usually implementation specific and probably not disclosed by the CPU manufacturer). In such cases you may have to resort to "heavier" cannons like the bus lock, which usually guarantees that no one can do anything on the shared bus. Obviously this has a huge impact on performance, so it is probably used only when there is no other choice. In most cases a simple cache-level lock should be enough. Note that newer schemes like Intel TSX seem to work in a similar manner, offering optimizations when you're working from within the cache.
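
The split-line case can even be provoked deliberately; the sketch below (GCC/Clang attributes, assuming a 64-byte line, purely for illustration) lays out an 8-byte counter so that it straddles a line boundary. A LOCK-prefixed read-modify-write on such an operand cannot be a cache lock and falls back to the expensive bus-lock path (recent CPUs report this as a "split lock").

    #include <stdint.h>

    #define CACHE_LINE 64  /* assumed line size, for illustration only */

    /* 'packed' removes the natural padding, so 'counter' starts at byte 60
     * of a 64-byte-aligned block and spans bytes 60..67 - i.e. two lines. */
    struct __attribute__((packed, aligned(CACHE_LINE))) split_layout {
        char     pad[CACHE_LINE - 4];
        uint64_t counter;  /* straddles the cache-line boundary */
    };

    /* A locked RMW on this field (e.g. __sync_fetch_and_add(&s->counter, 1))
     * cannot use cache locking; keeping fields naturally aligned avoids
     * the bus-lock path entirely. */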

By the way, your assumption about pseudo-atomicity of individual instructions is also wrong - it would be correct only if you referred to a single memory operation (a load or a store), since one instruction may include several of them (inc [addr], for example, would not be atomic without a lock). Another restriction, which also appears in your quote, is that the access needs to be contained within a cache line - split accesses don't guarantee atomicity even for a single load or store (since they're usually implemented as two memory operations that are later merged).
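
To make the inc [addr] point concrete, here is a hedged GCC/Clang inline-assembly sketch for x86-64 (the function names are mine): both routines execute a single instruction, but only the LOCK-prefixed one is an atomic read-modify-write with respect to other cores; the plain incl is still a load, an add and a store internally.

    /* x86-64, GCC/Clang inline asm - illustration only. */

    /* One instruction, but NOT atomic across cores. */
    static inline void inc_plain(int *p)
    {
        __asm__ volatile ("incl %0" : "+m"(*p));
    }

    /* The LOCK prefix makes the read-modify-write atomic; on a cacheable,
     * line-contained operand this is normally a cache lock, not a bus lock. */
    static inline void inc_locked(int *p)
    {
        __asm__ volatile ("lock incl %0" : "+m"(*p) : : "memory");
    }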
