Why is LOCK a full barrier on x86?


Question

Why does the LOCK prefix cause a full barrier on x86? (That is, why does it drain the store buffer and thereby provide sequential consistency?)

For LOCK/read-modify-write operations, a full barrier shouldn't be required, and exclusive access to the cache line seems to be sufficient. Is it a design choice, or is there some other limitation?

Solution

A long time ago, before the Intel 80486, Intel processors didn't have on-chip caches or write buffers. Therefore, by design, all writes became globally visible immediately and in order, and there were no buffered stores to drain from anywhere. A locked transaction was executed by fully locking the bus for the entire address space.

In the 486 and Pentium processors, write buffers were added on-chip, and some models have on-chip caches as well. Consider first the models that don't have on-chip caches. All writes are temporarily held in on-chip write buffers until they can be written to the bus when it becomes available, or until a serializing event occurs. Remember that atomic RMW transactions are used to acquire exclusive access to software structures or hardware resources. So if a processor performs a locked transaction, it must not happen that the processor thinks it was granted ownership of the resource while another processor somehow ends up obtaining ownership as well. If the write part of the locked transaction were buffered in a write buffer and the bus lock were then relinquished, nothing would prevent other agents from acquiring access to the resource at the same time. Essentially, the write part has to be made visible to all other agents, and the way to do this is by not buffering it. But the x86 memory model requires that all writes become globally visible in order (there was no weak ordering on these processors). So in order to make the write part of a locked transaction globally observable, all previously buffered writes also had to be made globally observable, in the same order.
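To make the resource-ownership argument concrete, here is a minimal sketch of a test-and-set spinlock in C++. The names SpinLock and count_with_lock are hypothetical and do not come from the original answer. The sketch assumes a mainstream x86 compiler, which typically implements exchange on a std::atomic with XCHG on a memory operand, an instruction that is implicitly locked. If the write half of that exchange could sit unseen in a buffer after the lock was released, two threads could both read "free" and both enter the critical section.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration: a test-and-set spinlock built on an atomic RMW.
struct SpinLock {
    std::atomic<bool> held{false};

    void lock() {
        // exchange() atomically writes true and returns the previous value.
        // The caller owns the lock only if the previous value was false.
        while (held.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();  // someone else owns it; try again
        }
    }

    void unlock() {
        held.store(false, std::memory_order_release);
    }
};

// Two threads increment a plain (non-atomic) counter under the lock.
// Mutual exclusion holds only if each exchange is atomic and its write
// is visible to the other thread before that thread's exchange reads.
long count_with_lock(int iterations_per_thread) {
    assert(iterations_per_thread >= 0);
    SpinLock lock;
    long counter = 0;

    auto worker = [&] {
        for (int i = 0; i < iterations_per_thread; ++i) {
            lock.lock();
            ++counter;  // protected by the lock
            lock.unlock();
        }
    };

    std::thread a(worker);
    std::thread b(worker);
    a.join();
    b.join();
    return counter;
}
```

Calling count_with_lock(n) should return 2 * n, since no increment can be lost while the lock provides mutual exclusion.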

Some 486 models and all Pentium processors have on-chip caches. But on these processors, there was no support for cache locks. That's why locked transactions were not cacheable on these processors: the only way to guarantee atomicity was to bypass the cache and lock the bus. After acquiring the bus lock, one or more writes are performed, depending on the alignment and size of the destination memory region. The write buffers still have to be drained before releasing the bus lock.

The Pentium Pro introduced some major changes, including weakly-ordered writes, write-combining buffers, and cache locking. What were called "write buffers" are what is usually referred to as store buffers on more modern microarchitectures. A locked transaction utilizes cache locking on these processors, but the cache lock cannot be released until the locked store has been committed from the store buffer to the cache. That commit is what makes the store globally observable, and it necessarily requires making all earlier stores globally observable first. These events have to happen in that order. That said, I don't think locked transactions have to serialize weakly-ordered writes, but Intel decided to make them this way. Maybe Intel wanted a convenient instruction that drains WC buffers on the PPro in the absence of a dedicated store fence.
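The practical effect of the store buffer being drained can be sketched with the classic store-buffering litmus test, written here in C++. The function name count_both_zero is hypothetical. Each thread performs a sequentially consistent exchange (which x86 compilers typically emit as a locked XCHG) and then loads the other thread's variable. Because the locked store must be committed before the later load can complete, the outcome where both loads return 0 should never appear. The C++ memory model forbids it as well, since every operation here is seq_cst. This harness is only an illustration and makes no special effort to provoke reordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration: runs the store-buffering litmus test and
// returns how many trials ended with both threads reading 0.
int count_both_zero(int trials) {
    assert(trials >= 0);
    int both_zero = 0;

    for (int t = 0; t < trials; ++t) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r0 = -1;
        int r1 = -1;

        std::thread a([&] {
            x.exchange(1);  // seq_cst RMW: acts as a full barrier on x86
            r0 = y.load();  // cannot be satisfied ahead of the store above
        });
        std::thread b([&] {
            y.exchange(1);
            r1 = x.load();
        });
        a.join();
        b.join();

        // At least one thread must observe the other's store.
        if (r0 == 0 && r1 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

If the exchanges were replaced with plain relaxed stores, x86 would be allowed to let each load complete while the preceding store is still in the store buffer, and the both-zero outcome would become possible.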
