What branch misprediction does the Branch Target Buffer detect?

Question

I am currently looking at the various parts of the CPU pipeline which can detect branch mispredictions. I have found these are:

  1. Branch Target Buffer (BPU CLEAR)
  2. Branch Address Calculator (BA CLEAR)
  3. Jump Execution Unit (not sure what the signal name is here)

I know what 2 and 3 detect, but I do not understand what misprediction is detected within the BTB. The BAC detects cases where the BTB has erroneously predicted a branch for a non-branch instruction, where the BTB has failed to detect a branch, or where the BTB has mispredicted the target address of an x86 RET instruction. The execution unit evaluates the branch and determines whether the prediction was correct.

What type of mispredictions are detected at the Branch Target Buffer? What exactly is being detected here?

The only clue I could find was this inside Vol 3 of the Intel Developer Manuals (the two BPU CLEAR event counters at the bottom):

BPU predicted a taken branch after incorrectly assuming that it was not taken.

This seems to imply the prediction is not done "synchronously", but rather "asynchronously", hence the "after incorrectly assuming"??

Update:

Ross, this is the CPU branch circuitry from the original Intel patent (how's that for "reading"?):

I don't see "Branch Prediction Unit" anywhere. Would it be reasonable for somebody who has read this paper to assume that "BPU" is a lazy way of grouping the BTB Circuit, BTB Cache, BAC and RSB together?

So my question still stands, which component raises the BPU CLEAR signal?

Recommended answer

This is a good question! I think the confusion it's causing is due to Intel's strange naming schemes, which often overload terms that are standard in academia. I will try both to answer your question and to clear up the confusion I see in the comments.

First of all, I agree that in standard computer science terminology a branch target buffer isn't synonymous with a branch predictor. However, in Intel terminology the Branch Target Buffer (BTB) [in capitals] is something specific: it contains both a predictor and a Branch Target Buffer Cache (BTBC), which is just a table of branch instructions and their targets on a taken outcome. This BTBC is what most people understand as a branch target buffer [lower case]. So what is the Branch Address Calculator (BAC), and why do we need it if we have a BTB?
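To make the BTB/BTBC distinction concrete, here is a minimal Python sketch of an Intel-style BTB as described above: a BTBC mapping a branch's address to its taken target, plus a direction predictor. The 2-bit saturating counters, the unbounded dicts, and the `BTB`/`lookup`/`update` names are assumptions for illustration only; real BTBs are tagged, set-associative and fixed-size, and Intel does not document its predictor.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BTB:
    """Hypothetical Intel-style BTB: a BTBC plus a direction predictor."""

    # BTBC: branch address -> target on a taken outcome (the textbook "branch target buffer").
    btbc: dict[int, int] = field(default_factory=dict)
    # Direction predictor (assumed): a 2-bit saturating counter per branch (0-1 not taken, 2-3 taken).
    counters: dict[int, int] = field(default_factory=dict)

    def lookup(self, pc: int) -> int | None:
        """Return the predicted taken target for pc, or None to keep fetching sequentially."""
        target = self.btbc.get(pc)
        if target is None:
            return None  # the BTB does not believe pc holds a branch
        return target if self.counters.get(pc, 2) >= 2 else None

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train both parts on a resolved branch."""
        self.btbc[pc] = target
        c = self.counters.get(pc, 2)  # new branches start weakly taken
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


btb = BTB()
btb.update(0x400, taken=True, target=0x480)
print(hex(btb.lookup(0x400)))  # 0x480
print(btb.lookup(0x404))       # None
```

Keeping the two tables separate mirrors the point of the answer: the BTBC only says *where* a branch would go, while the predictor decides *whether* to go there.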

So, you understand that modern processors are split into pipelines with multiple stages. Whether this is a simple pipelined processor or an out-of-order superscalar processor, the first stages are typically fetch and then decode. In the fetch stage, all we have is the address of the current instruction, contained in the program counter (PC). We use the PC to load bytes from memory and send them to the decode stage. In most cases we increment the PC in order to load the subsequent instruction(s), but in other cases we process a control flow instruction, which can modify the contents of the PC completely.

The purpose of the BTB is to guess whether the address in the PC points to a branch instruction and, if so, what the next address in the PC should be. That part is fine: we can use a predictor for conditional branches and the BTBC for the next address. If the prediction was right, great! If it was wrong, what then? If the BTB were the only unit we had, we would have to wait until the branch reached the issue/execute stage of the pipeline, then flush the pipeline and start again. But not every situation needs to be resolved that late. This is where the Branch Address Calculator (BAC) comes in.
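The fetch-stage decision described above boils down to a two-way choice of the next PC. A minimal sketch, assuming a plain dict stands in for the BTBC-plus-predictor lookup (keyed by fetch address for simplicity) and a hypothetical 16-byte fetch block; `next_fetch_pc` and `FETCH_BLOCK_BYTES` are illustrative names, not anything from Intel's documentation:

```python
from __future__ import annotations

FETCH_BLOCK_BYTES = 16  # hypothetical fetch-block size in bytes


def next_fetch_pc(predicted_taken: dict[int, int], pc: int) -> int:
    """Choose the next fetch address using only what fetch knows: the PC and the BTB's guess.

    predicted_taken stands in for the BTBC plus direction predictor: it maps a fetch
    address to a target for every branch the BTB currently predicts taken.
    """
    target = predicted_taken.get(pc)
    if target is not None:
        return target               # BTB guesses "taken branch here": redirect fetch
    return pc + FETCH_BLOCK_BYTES   # no taken prediction: keep fetching sequentially


print(hex(next_fetch_pc({0x100: 0x200}, 0x100)))  # 0x200
print(hex(next_fetch_pc({0x100: 0x200}, 0x110)))  # 0x120
```

Nothing in this function can tell whether the guess was right; that is only discovered later, at decode or at execute, and the stage that catches the error determines how much fetched work is thrown away.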

The BTB is used in the fetch stage of the pipeline but the BAC resides in the decode stage. Once the instruction we fetched is decoded, we actually have a lot more information which can be useful. The first new piece of information we know is: "is the instruction I fetched actually a branch?" In the fetch stage we have no idea and the BTB can only guess, but in the decode stage we know it for sure. It is possible that the BTB predicts a branch when in fact the instruction is not a branch; in this case the BAC will halt the fetch unit, fix the BTB, and reinitiate fetching correctly.
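The "phantom branch" case (the BTB predicted a branch on an instruction that decode shows is not a branch) can be sketched as a small decode-time check. This is a hypothetical model rather than Intel's logic: `DecodedInsn`, `check_phantom_branch` and the dict-based BTBC are names introduced here for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DecodedInsn:
    pc: int
    length: int      # instruction length in bytes, known only after decode
    is_branch: bool  # also known for sure only after decode


def check_phantom_branch(insn: DecodedInsn, btbc: dict[int, int],
                         predicted_next: int) -> int | None:
    """Return the corrected fetch address if fetch followed a phantom branch, else None."""
    if insn.is_branch:
        return None  # real branches are validated by other checks
    fall_through = insn.pc + insn.length
    if insn.pc in btbc:
        del btbc[insn.pc]  # fix the BTB: no branch lives at this address
    if predicted_next != fall_through:
        return fall_through  # halt fetch and restart on the sequential path
    return None


btbc = {0x10: 0x80}  # stale entry: the BTB thinks 0x10 is a branch
mov = DecodedInsn(pc=0x10, length=3, is_branch=False)
print(hex(check_phantom_branch(mov, btbc, predicted_next=0x80)))  # 0x13
print(btbc)  # {}
```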

What about branches like unconditional relative jumps and calls? These can be validated at the decode stage. The BAC will check the BTB, see whether there are entries in the BTBC, and set the predictor to always predict taken. For conditional branches, the BAC cannot yet confirm whether they are taken or not taken, but it can at least validate the predicted address and correct the BTB in the event of a bad address prediction. Sometimes the BTB won't identify/predict a branch at all. The BAC needs to correct this and give the BTB new information about this instruction. Since the BAC doesn't have a conditional predictor of its own, it uses a simple static mechanism (backward branches taken, forward branches not taken).
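The cases in the paragraph above can be written as one decode-time validation routine. A minimal sketch under stated assumptions: the `Kind` names, the `Branch` record, the dict-based BTBC and `bac_validate` are all hypothetical; only the rules come from the answer (unconditional relative jumps and calls are forced taken, a known conditional branch only gets its target checked, a branch the BTB missed falls back to backward-taken/forward-not-taken). RET and indirect branches are left out because their targets are not encoded in the instruction.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    UNCOND_REL = auto()  # unconditional relative jump
    CALL = auto()        # direct relative call
    COND = auto()        # conditional branch


@dataclass
class Branch:
    pc: int
    length: int
    kind: Kind
    target: int  # direct target, computable at decode


def static_btfn(br: Branch) -> bool:
    """Static fallback: backward branches taken, forward branches not taken."""
    return br.target < br.pc


def bac_validate(br: Branch, btbc: dict[int, int], predicted_next: int) -> int | None:
    """Return the address to resteer fetch to, or None if fetch is already on a plausible path."""
    fall_through = br.pc + br.length
    missed = br.pc not in btbc  # the BTB did not recognise this branch at all
    btbc[br.pc] = br.target     # install or correct the target; the decoded one is exact

    if br.kind in (Kind.UNCOND_REL, Kind.CALL):
        expected = br.target    # always taken (a real design would also bias the predictor)
    elif missed:
        expected = br.target if static_btfn(br) else fall_through
    else:
        # Known conditional branch: direction can't be checked until execute,
        # but a taken prediction to the wrong address can be fixed now.
        if predicted_next in (fall_through, br.target):
            return None
        expected = br.target
    return expected if predicted_next != expected else None


loop_back = Branch(pc=0x100, length=2, kind=Kind.COND, target=0xF0)
print(hex(bac_validate(loop_back, {}, predicted_next=0x102)))  # 0xf0 (missed backward branch)
```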

Somebody will need to confirm my understanding of these hardware counters, but I believe they mean the following:

  • BACLEAR.CLEAR is incremented when the BTB in fetch does a bad job and the BAC in decode can fix it.
  • BPU_CLEARS.EARLY is incremented when fetch decides (incorrectly) to load the next sequential instruction before the BTB predicts that it should actually load from the taken path instead. This happens because the BTB requires multiple cycles, and fetch uses that time to speculatively load a consecutive block of instructions. This may be due to Intel using two BTBs, one quick and the other slower but more accurate; getting a better prediction takes more cycles.
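The explanation of BPU_CLEARS.EARLY above (a quick guess used while a slower, more accurate prediction is still in flight) resembles what is usually called an overriding predictor. The toy model below assumes exactly that: the quick path always guesses "no taken branch, fetch the next block", the slow path is any caller-supplied function, and every disagreement counts as one early clear. `count_overrides` and `always_sequential` are hypothetical names, and none of this reflects Intel's actual, undocumented design.

```python
from __future__ import annotations

from typing import Callable, Iterable

Predictor = Callable[[int], bool]  # branch address -> predicted taken?


def always_sequential(pc: int) -> bool:
    """Quick guess available immediately: assume no taken branch, fetch the next block."""
    return False


def count_overrides(branch_pcs: Iterable[int], slow: Predictor,
                    quick: Predictor = always_sequential) -> int:
    """Count how often the slower, more accurate predictor overrides the quick guess.

    Each override squashes the block(s) fetched on the quick guess and refetches
    from the slow predictor's path, the kind of event attributed to BPU_CLEARS.EARLY.
    """
    return sum(1 for pc in branch_pcs if slow(pc) != quick(pc))


# Hypothetical slow predictor: every branch except the one at 0x20 is predicted taken.
print(count_overrides([0x10, 0x20, 0x30], slow=lambda pc: pc != 0x20))  # 2
```

Because the override happens only a couple of cycles into fetch, each one throws away very little work compared with a resteer from decode or execute.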

This explains why the penalty for a misprediction detected in the BTB is 2-3 cycles, whereas a misprediction detected in the BAC costs 8 cycles.
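Those per-event penalties allow a back-of-the-envelope estimate of how much fetch time each kind of clear costs. A sketch assuming the penalties quoted in the answer (2/3 cycles for a BTB-detected clear, taken here as the worse case of 3, and 8 cycles for a BAC-detected one) and hypothetical event counts; real penalties vary by microarchitecture and overlap with other stalls, so this is a rough estimate, not a measurement.

```python
BPU_CLEAR_PENALTY = 3  # cycles; the answer quotes 2/3, the worse case is used here
BACLEAR_PENALTY = 8    # cycles


def frontend_cycles_lost(bpu_clears: int, baclears: int) -> int:
    """Rough estimate of fetch cycles lost to front-end resteers."""
    return bpu_clears * BPU_CLEAR_PENALTY + baclears * BACLEAR_PENALTY


# Hypothetical counts: the same number of events costs roughly 2.7x more when caught by the BAC.
print(frontend_cycles_lost(bpu_clears=1000, baclears=0))  # 3000
print(frontend_cycles_lost(bpu_clears=0, baclears=1000))  # 8000
```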
