Why is a conditional move not vulnerable to branch prediction failure?

Question

After reading this post (answer on StackOverflow) (at the optimization section), I was wondering why conditional moves are not vulnerable to branch prediction failure. I found an article on conditional moves here (PDF by AMD). There, too, they claim a performance advantage for conditional moves. But why is this? I don't see it. At the moment that the instruction is evaluated, the result of the preceding CMP instruction is not known yet.

Thanks.

Recommended answer

Mis-predicted branches are expensive

A modern processor generally executes between one and three instructions each cycle if things go well (if it does not stall waiting for data dependencies for these instructions to arrive from previous instructions or from memory).

The statement above holds surprisingly well for tight loops, but this shouldn't blind you to one additional dependency that can prevent an instruction from being executed when its cycle comes: for an instruction to be executed, the processor must have started to fetch and decode it 15-20 cycles earlier.

What should the processor do when it encounters a branch? Fetching and decoding both targets does not scale (if more branches follow, an exponential number of paths would have to be fetched in parallel). So the processor only fetches and decodes one of the two branches, speculatively.

This is why mis-predicted branches are expensive: they cost the 15-20 cycles that are usually invisible because of an efficient instruction pipeline.
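As a rough back-of-the-envelope model (my own simplification, not something the answer states), the average cost of a branch can be estimated as the misprediction rate times the pipeline refill penalty. The function name `expected_branch_cost` and the fixed-penalty assumption are hypothetical; real processors vary.

```c
/* Average cycles lost per executed branch under a simplified model:
   every misprediction costs a fixed number of cycles (the 15-20 cycle
   refill mentioned above) and a correct prediction costs nothing.
   mispredict_rate is a fraction in [0, 1]. */
double expected_branch_cost(double mispredict_rate, double penalty_cycles)
{
    return mispredict_rate * penalty_cycles;
}
```

Under this model, a branch on effectively random data (rate 0.5) with a 20-cycle penalty loses about 10 cycles per branch on average, while a branch that is almost always predicted correctly costs close to nothing.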

Conditional move does not require prediction, so it can never have this penalty. It has data dependencies, same as ordinary instructions. In fact, a conditional move has more data dependencies than ordinary instructions, because the data dependencies include both the "condition true" and "condition false" cases. After an instruction that conditionally moves r1 to r2, the contents of r2 seem to depend on both the previous value of r2 and on r1. A well-predicted conditional branch allows the processor to infer more accurate dependencies. But data dependencies typically take one or two cycles to arrive, if they need time to arrive at all.
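The dependency difference can be sketched in C. This is only an illustration: the function names are hypothetical, and whether a compiler actually emits a conditional move or a jump for either form depends on the compiler, target, and optimization flags. The mask-based version makes explicit that the result is computed from both inputs regardless of the condition.

```c
#include <stdint.h>

/* Branchy form: only one of r1 or r2 is needed once the branch
   direction is known (or predicted). */
uint32_t select_branch(int cond, uint32_t r1, uint32_t r2)
{
    if (cond)
        return r1;
    return r2;
}

/* Select form: the result is built from BOTH r1 and r2, so it has a
   data dependency on each of them plus the condition, and there is
   no control flow to predict. */
uint32_t select_mask(int cond, uint32_t r1, uint32_t r2)
{
    /* mask is all ones when cond is nonzero, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (r1 & mask) | (r2 & ~mask);
}
```

Both functions return the same value for the same arguments. The difference is in what the processor has to wait for: the select form must have both operands ready, while the branchy form needs only the taken side but pays the refill penalty when the prediction is wrong.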

Note that a conditional move from memory to register would sometimes be a dangerous bet: if the condition is such that the value read from memory is not assigned to the register, you have waited on memory for nothing. But the conditional move instructions offered in instruction sets are typically register to register, preventing this mistake on the part of the programmer.
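The memory case can be illustrated with a small C sketch (function names are hypothetical, and the actual machine code is up to the compiler). In the select-style version the load is written unconditionally, so the wait on memory happens even when the loaded value is discarded.

```c
/* Branchy form: memory is read only when cond is nonzero. */
int load_if_branch(int cond, const int *p, int fallback)
{
    if (cond)
        return *p;
    return fallback;
}

/* Select form: the load is performed unconditionally, so p must point
   to valid memory even when cond is zero, and any cache miss on *p is
   paid whether or not the value ends up being used. */
int load_if_select(int cond, const int *p, int fallback)
{
    int loaded = *p;
    return cond ? loaded : fallback;
}
```

If the condition is usually false and `*p` is likely to miss the cache, the branchy form may be the better bet despite the misprediction risk.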
