Why is a conditional move not vulnerable to branch prediction failure?

Question

After reading this post (an answer on StackOverflow, in its optimization section), I was wondering why conditional moves are not vulnerable to branch prediction failure. I found an article on conditional moves (a PDF by AMD), which also claims a performance advantage for conditional moves. But why is this? I don't see it. At the moment that ASM instruction is evaluated, the result of the preceding CMP instruction is not known yet.

Recommended answer

Mis-predicted branches are expensive

A modern processor generally executes between one and three instructions each cycle if things go well, that is, if it does not stall waiting for the data these instructions depend on to arrive from previous instructions or from memory.

The statement above holds surprisingly well for tight loops, but this shouldn't blind you to one additional dependency that can prevent an instruction from being executed when its cycle comes: for an instruction to be executed, the processor must have started to fetch and decode it 15-20 cycles earlier.

What should the processor do when it encounters a branch? Fetching and decoding both targets does not scale (if more branches follow, an exponential number of paths would have to be fetched in parallel). So the processor only fetches and decodes one of the two branches, speculatively.

This is why mis-predicted branches are expensive: they cost the 15-20 cycles that are usually invisible because of an efficient instruction pipeline.
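To put a rough number on that cost, here is a minimal sketch of the average penalty per branch. The function name `expected_mispredict_cycles` and the idea of a single fixed miss rate are assumptions made for illustration; real predictors and pipelines behave less uniformly.

```c
/* Hypothetical helper (not from the original answer): average number of
 * cycles lost per executed branch, assuming every misprediction costs the
 * same flush penalty and mispredictions occur at a fixed rate. */
double expected_mispredict_cycles(double miss_rate, double penalty_cycles) {
    /* miss_rate is a fraction in [0, 1]; penalty_cycles is the pipeline
     * refill cost, e.g. the 15-20 cycles quoted above. */
    return miss_rate * penalty_cycles;
}
```

With an assumed miss rate of 5% and a 20-cycle penalty, this comes to about one cycle lost per branch on average; a branch that is essentially unpredictable (miss rate near 50%) would lose around ten.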

A conditional move does not require prediction, so it can never incur this penalty. It has data dependencies, the same as ordinary instructions. In fact, a conditional move has more data dependencies than an ordinary instruction, because its dependencies cover both the "condition true" and the "condition false" cases. After an instruction that conditionally moves r1 to r2, the contents of r2 appear to depend on both the previous value of r2 and on r1. A well-predicted conditional branch allows the processor to infer more accurate dependencies. But data dependencies typically take one to two cycles to arrive, if they need time to arrive at all.
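The difference between the two forms can be sketched in C. The function names below are hypothetical, and a compiler is free to turn either function into a branch or a conditional move, so this is an illustration of the data dependencies rather than a guarantee about the generated instructions.

```c
#include <stdint.h>

/* Branch form: "if cond, move r1 to r2". If this is compiled to a
 * conditional jump, the processor has to predict which way it goes. */
uint32_t move_if_branch(int cond, uint32_t r1, uint32_t r2) {
    if (cond) {
        r2 = r1;
    }
    return r2;
}

/* Select form: the result is computed from cond, r1 AND the old r2,
 * with no control flow. This mirrors the extra data dependency of a
 * conditional move: both inputs must be available before it completes. */
uint32_t move_if_select(int cond, uint32_t r1, uint32_t r2) {
    /* mask is all ones when cond is nonzero, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (r1 & mask) | (r2 & ~mask);
}
```

Both functions return the same value for every input; for example, `move_if_select(1, 7, 3)` yields 7 and `move_if_select(0, 7, 3)` yields 3. What differs is that the second one has nothing to predict, at the price of always waiting for both `r1` and `r2`.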

Note that a conditional move from memory to register would sometimes be a dangerous bet: if the condition is such that the value read from memory is not assigned to the register, you have waited on memory for nothing. But the conditional move instructions offered in instruction sets are typically register to register, preventing this mistake on the part of the programmer.
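The memory case can be sketched the same way. The names `load_if_branch` and `load_then_select` are hypothetical, and the actual instructions depend on the compiler; the point is only where the load sits relative to the condition.

```c
#include <stdint.h>

/* Branch form: the load from *p is performed only when cond is nonzero,
 * so no time is spent on memory when the value is not wanted. */
uint32_t load_if_branch(const uint32_t *p, int cond, uint32_t fallback) {
    if (cond) {
        return *p;
    }
    return fallback;
}

/* Select form: the load is performed first, unconditionally, and the
 * result is chosen afterwards. If cond is zero, the wait on memory was
 * for nothing. Note that p must point to valid memory even in that case. */
uint32_t load_then_select(const uint32_t *p, int cond, uint32_t fallback) {
    uint32_t loaded = *p;
    return cond ? loaded : fallback;
}
```

Given `uint32_t x = 42;`, both `load_if_branch(&x, 0, 9)` and `load_then_select(&x, 0, 9)` return 9, but only the second one has read `x` to get there.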
