What branch misprediction does the Branch Target Buffer detect?

Question

I am currently looking at the various parts of the CPU pipeline which can detect branch mispredictions. I have found these are:

  1. Branch Target Buffer (BPU CLEAR)
  2. Branch Address Calculator (BA CLEAR)
  3. Jump Execution Unit (not sure of the signal name here??)

I know what 2 and 3 detect, but I do not understand what misprediction is detected within the BTB. The BAC detects where the BTB has erroneously predicted a branch for a non-branch instruction, where the BTB has failed to detect a branch, or where the BTB has mispredicted the target address for an x86 RET instruction. The execution unit evaluates the branch and determines whether the prediction was correct.

What type of misprediction is detected "in" the Branch Target Buffer? What exactly is being detected as a misprediction?

The only clue I could find was this inside Vol 3 of the Intel Developer Manuals (the two BPU CLEAR event counters at the bottom):

BPU predicted a taken branch after incorrectly assuming that it was not taken.

This seems to imply the prediction is not done "synchronously", but rather "asynchronously", hence the "after incorrectly assuming"??

Update:

Ross, this is the CPU branch circuitry, from the original Intel patent (how's that for "reading"?):

I don't see "Branch Prediction Unit" anywhere? Would it be reasonable that somebody having read this paper would assume that "BPU" is a lazy way of grouping the BTB Circuit, BTB Cache, BAC and RSB together??

So my question still stands, which component raises the BPU CLEAR signal?

Answer

This is a good question! I think the confusion is due to Intel's strange naming schemes, which often overload terms that are standard in academia. I will try both to answer your question and to clear up the confusion I see in the comments.

First of all, I agree that in standard computer science terminology a branch target buffer isn't synonymous with a branch predictor. However, in Intel terminology the Branch Target Buffer (BTB) [in capitals] is something specific: it contains both a predictor and a Branch Target Buffer Cache (BTBC), which is just a table of branch instructions and their targets on a taken outcome. This BTBC is what most people understand as a branch target buffer [lower case]. So what is the Branch Address Calculator (BAC), and why do we need it if we have a BTB?

So, you understand that modern processors are split into pipelines with multiple stages. Whether this is a simple pipelined processor or an out-of-order superscalar processor, the first stages are typically fetch then decode. In the fetch stage all we have is the address of the current instruction contained in the program counter (PC). We use the PC to load bytes from memory and send them to the decode stage. In most cases we increment the PC in order to load the subsequent instruction(s), but in other cases we process a control flow instruction which can modify the contents of the PC completely.

The purpose of the BTB is to guess if the address in the PC points to a branch instruction, and if so, what should the next address in the PC be? That's fine, we can use a predictor for conditional branches and the BTBC for the next address. If the prediction was right, that's great! If the prediction was wrong, what then? If the BTB is the only unit we have then we would have to wait until the branch reaches the issue/execute stage of the pipeline. We would have to flush the pipeline and start again. But not every situation needs to be resolved so late. This is where the Branch Address Calculator (BAC) comes in.
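To make the fetch-stage guess concrete, here is a minimal toy sketch of the idea described above. It is not Intel's design: the class names, the dictionary standing in for the BTBC, and the 2-bit saturating counter standing in for the predictor are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    target: int       # predicted target address if the branch is taken
    counter: int = 2  # 2-bit saturating counter: 0-1 = not taken, 2-3 = taken


class ToyBTB:
    """Toy model: a dict plays the role of the BTBC, a counter the predictor."""

    def __init__(self):
        self.entries = {}

    def predict_next_pc(self, pc, fallthrough):
        """Return (next_pc, predicted_taken) using only the fetch address."""
        entry = self.entries.get(pc)
        if entry is None:
            # No entry: fetch assumes "not a branch" and continues sequentially.
            return fallthrough, False
        if entry.counter >= 2:
            return entry.target, True
        return fallthrough, False

    def update(self, pc, taken, target):
        """Train on the resolved outcome once the branch has executed."""
        entry = self.entries.get(pc)
        if entry is None:
            if taken:
                # Assumption: entries are only allocated for taken branches.
                self.entries[pc] = BTBEntry(target=target)
            return
        if taken:
            entry.target = target
            entry.counter = min(3, entry.counter + 1)
        else:
            entry.counter = max(0, entry.counter - 1)
```

Note that `predict_next_pc` never looks at the instruction bytes, only at the address. That is why the guess can be wrong in ways that a later stage is able to catch.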

The BTB is used in the fetch stage of the pipeline but the BAC resides in the decode stage. Once the instruction we fetched is decoded, we actually have a lot more information which can be useful. The first new piece of information we know is: "is the instruction I fetched actually a branch?" In the fetch stage we have no idea and the BTB can only guess, but in the decode stage we know it for sure. It is possible that the BTB predicts a branch when in fact the instruction is not a branch; in this case the BAC will halt the fetch unit, fix the BTB, and reinitiate fetching correctly.

What about branches like unconditional relative jumps and calls? These can be validated at the decode stage. The BAC will check the BTB, see if there are entries in the BTBC, and set the predictor to always predict taken. For conditional branches, the BAC cannot yet confirm whether they are taken or not taken, but it can at least validate the predicted address and correct the BTB in the event of a bad address prediction. Sometimes the BTB won't identify/predict a branch at all. The BAC needs to correct this and give the BTB new information about this instruction. Since the BAC doesn't have a conditional predictor of its own, it uses a simple static mechanism (backward branches taken, forward branches not taken).

Somebody will need to confirm my understanding of these hardware counters, but I believe they mean the following:

  • BACLEAR.CLEAR is incremented when the BTB in fetch does a bad job and the BAC in decode can fix it.
  • BPU_CLEARS.EARLY is incremented when fetch decides (incorrectly) to load the next instruction before the BTB predicts that it should actually load from the taken path instead. This is because the BTB requires multiple cycles and fetch uses that time to speculatively load a consecutive block of instructions. This can be due to Intel using two BTBs, one quick and the other slower but more accurate. It takes more cycles to get a better prediction.
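The "early clear" interpretation above can be illustrated with a small cycle-level toy. It rests entirely on the hypothesis stated in the bullet (fetch runs ahead sequentially while a multi-cycle prediction is pending); the function `simulate_fetch` and its parameters are made up for this sketch and do not describe documented Intel behavior.

```python
def simulate_fetch(taken_targets, start_pc, block_size, latency, cycles):
    """Fetch one block per cycle while predictions arrive `latency` cycles late.

    taken_targets -- dict mapping a block address to the target that the
                     (slow) predictor produces for it
    latency       -- cycles until a prediction is usable; 1 means it is
                     ready in time for the very next fetch
    Returns (surviving_blocks, early_clears).
    """
    pc = start_pc
    blocks = []    # addresses of fetched blocks that are still valid
    pending = []   # (ready_cycle, position_in_blocks, target)
    early_clears = 0
    for cycle in range(cycles):
        if pending and pending[0][0] <= cycle:
            _, pos, target = pending.pop(0)
            if len(blocks) > pos + 1:
                # Sequential blocks were fetched past a taken branch.
                early_clears += 1
            del blocks[pos + 1:]   # squash the younger, wrong-path blocks
            pending.clear()        # their predictions are dropped as well
            pc = target
        blocks.append(pc)
        if pc in taken_targets:
            pending.append((cycle + latency, len(blocks) - 1, taken_targets[pc]))
        pc += block_size
    return blocks, early_clears
```

With a 3-cycle predictor, two sequential blocks are fetched and then thrown away for every taken branch; with a 1-cycle predictor nothing is wasted:

```python
blocks, clears = simulate_fetch({0x10: 0x100}, 0x0, 0x10, latency=3, cycles=6)
# blocks == [0x0, 0x10, 0x100, 0x110], clears == 1
```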

This explains why the penalty for detecting a misprediction in the BTB is 2/3 cycles, whereas the penalty for detecting a misprediction in the BAC is 8 cycles.
