Out-of-order instruction execution: is commit order preserved?


Question

On the one hand, Wikipedia describes the steps of out-of-order execution:

  1. Instruction fetch.
  2. Instruction dispatch to an instruction queue (also called instruction buffer or reservation stations).
  3. The instruction waits in the queue until its input operands are available. The instruction is then allowed to leave the queue before earlier, older instructions.
  4. The instruction is issued to the appropriate functional unit and executed by that unit.
  5. The results are queued.
  6. Only after all older instructions have their results written back to the register file, then this result is written back to the register file. This is called the graduation or retire stage.
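
The difference between steps 3 to 5 (finishing out of order) and step 6 (retiring in order) can be sketched with a toy reorder buffer. This is a minimal model, not how any real core is built: the function name `simulate`, the instruction names, and the latencies are all made up for illustration, and every instruction is assumed to have its operands ready so that only latency decides when it finishes.

```python
from collections import deque

def simulate(program):
    """program: list of (name, latency) pairs in program order.
    Returns (completion_order, retirement_order)."""
    rob = deque({"name": n, "remaining": lat, "done": False}
                for n, lat in program)
    completed, retired = [], []
    while rob:
        # Execute: every unfinished instruction in flight advances one cycle.
        for entry in rob:
            if not entry["done"]:
                entry["remaining"] -= 1
                if entry["remaining"] == 0:
                    entry["done"] = True
                    completed.append(entry["name"])
        # Retire: only from the head of the buffer, and only once finished,
        # so a slow older instruction blocks younger finished ones.
        while rob and rob[0]["done"]:
            retired.append(rob.popleft()["name"])
    return completed, retired

completed, retired = simulate([("load", 4), ("add", 1), ("mul", 3)])
print(completed)  # ['add', 'mul', 'load']
print(retired)    # ['load', 'add', 'mul']
```

The short `add` finishes first, but nothing retires until the older `load` at the head of the buffer is done.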

Similar information can be found in the book "Computer Organization and Design":

To make programs behave as if they were running on a simple in-order pipeline, the instruction fetch and decode unit is required to issue instructions in order, which allows dependences to be tracked, and the commit unit is required to write results to registers and memory in program fetch order. This conservative mode is called in-order commit... Today, all dynamically scheduled pipelines use in-order commit.

So, as far as I understand, even if instructions execute out of order, their results are held in the reorder buffer and then committed to memory/registers in a deterministic order.

On the other hand, it is a known fact that modern CPUs can reorder memory operations to improve performance (for example, two adjacent independent load instructions can be reordered). Wikipedia writes about it here.

Could you please shed some light on this discrepancy?

Recommended answer

TL:DR: memory ordering is not the same thing as out-of-order execution. Memory reordering happens even on in-order pipelined CPUs.

In-order commit makes the current core's own code see itself as running in order. (It also allows precise exceptions that can roll back to exactly the instruction that faulted, without any later instructions having already retired.) The golden rule of out-of-order execution is: don't break single-threaded code.

Memory ordering is all about what other cores see. Also notice that what you quoted is only talking about committing results to the register file, not to memory.

Since each core's private L1 cache is coherent with all the other data caches in the system, memory ordering is a question of when instructions read or write cache. This is separate from when they retire.

Loads become globally visible when they read their data from cache. This is more or less when they "execute", and definitely way before they retire (aka commit).

Stores become globally visible when their data is committed to cache. This has to wait until they're known to be non-speculative, i.e. that no exceptions or interrupts will cause a roll-back that has to "undo" the store. So a store can commit to L1 cache as early as when it retires from the out-of-order core.

But even in-order CPUs use a store queue or store buffer to hide the latency of stores that miss in L1 cache. The out-of-order machinery doesn't need to keep tracking a store once it's known that it will definitely happen, so a store insn/uop can retire even before it commits to L1 cache. The store buffer holds onto it until L1 cache is ready to accept it, i.e. until the core owns the cache line (M state of the MESI cache coherency protocol) and the memory-ordering rules allow the store to become globally visible.
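
The gap between "retired" and "globally visible" can be sketched with a toy model. Everything here is an assumption made for illustration: the `Core` class, its method names, and the plain dict standing in for coherent cache are not taken from any real simulator, and cache-line ownership is not modelled at all.

```python
class Core:
    def __init__(self, cache):
        self.cache = cache        # shared dict standing in for coherent cache
        self.store_buffer = []    # FIFO of (address, value), oldest first

    def store(self, addr, value):
        # The store retires here: it is certain to happen, but other
        # cores cannot see it yet.
        self.store_buffer.append((addr, value))

    def drain_one(self):
        # Commit the oldest buffered store to cache, which is the point
        # where it becomes globally visible.
        if self.store_buffer:
            addr, value = self.store_buffer.pop(0)
            self.cache[addr] = value

cache = {"x": 0}
core0, core1 = Core(cache), Core(cache)

core0.store("x", 1)
seen_before_drain = core1.cache["x"]   # other core still reads the old value
core0.drain_one()
seen_after_drain = core1.cache["x"]

print(seen_before_drain, seen_after_drain)  # 0 1
```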

See also my answer on Write Allocate / Fetch on Write Cache Policy.

As I understand it, a store's data is added to the store queue when it "executes" in the out-of-order core, and that's what a store execution unit does.

Loads have to probe the store queue so that they see recently-stored data.
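
A minimal sketch of that probe, often called store-to-load forwarding. The function name `load` and the list-of-tuples store queue are hypothetical; real hardware matches on address ranges and sizes, which this ignores.

```python
def load(addr, store_queue, cache):
    """Return the value a load on this core observes: the youngest
    matching entry in the core's own store queue, else the cache."""
    for q_addr, q_value in reversed(store_queue):  # youngest first
        if q_addr == addr:
            return q_value
    return cache[addr]

cache = {"x": 0, "y": 7}
store_queue = [("x", 1), ("x", 2)]   # oldest first, not yet committed

print(load("x", store_queue, cache))  # 2
print(load("y", store_queue, cache))  # 7
print(cache["x"])                     # 0
```

The core sees its own latest store to `x` even though the cache, and therefore every other core, still holds the old value.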

For an ISA like x86, with strong ordering, the store queue has to preserve the memory-ordering semantics of the ISA, i.e. stores can't reorder with other stores, and stores can't become globally visible before earlier loads. (LoadStore reordering isn't allowed, nor is StoreStore or LoadLoad; only StoreLoad reordering is.)
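
To see why only StoreLoad reordering shows up, the sketch below enumerates every outcome of two small two-thread tests under a deliberately simplified model: each thread runs its instructions in program order, stores go into a per-thread FIFO buffer that drains to memory at arbitrary times, and loads check the thread's own buffer before memory. This is an illustrative assumption, not a complete x86 memory model (no fences or locked instructions), and the function name `outcomes` and the tuple encoding of instructions are made up.

```python
def outcomes(threads, init):
    """Return every final register state reachable in the model above,
    each as a sorted tuple of (register, value) pairs."""
    results, seen = set(), set()
    start = (tuple(0 for _ in threads),      # program counters
             tuple(() for _ in threads),     # per-thread store buffers
             tuple(sorted(init.items())),    # memory
             ())                             # registers
    stack = [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        if all(pc == len(t) for pc, t in zip(pcs, threads)) and not any(bufs):
            results.add(regs)
            continue
        for i, thread in enumerate(threads):
            # Option 1: drain this thread's oldest buffered store to memory.
            if bufs[i]:
                addr, val = bufs[i][0]
                new_mem = dict(mem)
                new_mem[addr] = val
                new_bufs = bufs[:i] + (bufs[i][1:],) + bufs[i + 1:]
                stack.append((pcs, new_bufs,
                              tuple(sorted(new_mem.items())), regs))
            # Option 2: execute this thread's next instruction.
            if pcs[i] < len(thread):
                op, addr, arg = thread[pcs[i]]
                new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if op == "st":
                    new_buf = bufs[i] + ((addr, arg),)
                    new_bufs = bufs[:i] + (new_buf,) + bufs[i + 1:]
                    stack.append((new_pcs, new_bufs, mem, regs))
                else:  # "ld": forward from own buffer, else read memory
                    hits = [v for a, v in bufs[i] if a == addr]
                    val = hits[-1] if hits else dict(mem)[addr]
                    new_regs = dict(regs)
                    new_regs[arg] = val
                    stack.append((new_pcs, bufs, mem,
                                  tuple(sorted(new_regs.items()))))
    return results

# Store buffering: each thread stores, then loads the other location.
sb = outcomes([[("st", "x", 1), ("ld", "y", "r0")],
               [("st", "y", 1), ("ld", "x", "r1")]], {"x": 0, "y": 0})
# Message passing: writer stores data then flag; reader loads flag then data.
mp = outcomes([[("st", "data", 1), ("st", "flag", 1)],
               [("ld", "flag", "r0"), ("ld", "data", "r1")]],
              {"data": 0, "flag": 0})

print((("r0", 0), ("r1", 0)) in sb)  # True
print((("r0", 1), ("r1", 0)) in mp)  # False
```

In the first test both loads can read 0, because each store may still be sitting in its buffer when the other thread's load runs: that is StoreLoad reordering. In the second test the reader never sees the flag set with stale data, because the FIFO buffer keeps the two stores in order.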

David Kanter's article on how TSX (transactional memory) could be implemented in different ways than what Haswell does provides some insight into the Memory Order Buffer, and how it's a separate structure from the ReOrder Buffer (ROB) that tracks instruction/uop reordering. He starts by describing how things currently work, before getting into how it could be modified to track a transaction that can commit or abort as a group.
