Does mov rax, 0x12345678; jmp rax still kill branch prediction?

Question

I'm having trouble finding information specific to the two cases below, and would like to hear your expert opinion.

First: I know that indirect jmps hurt branch prediction, and that even when the target of the indirection is constant, the jump still needs the predictor to maintain a buffer entry and so on, all in comparison to a direct jmp.

My question is whether anyone knows if:

mov rax, 1234567812345678h;
jmp rax;

is still considered indirect by the processor's branch predictor, or whether the processor works out the target in this case. I'm doing this because x64 doesn't have a direct "jmp absolute 64" instruction, only an indirect one. :/ (How to execute a call instruction with a 64-bit absolute address? suggests this, if you can't instead put the jump close enough to the target and use jmp rel32.)

Secondly, to that extent, is there any real difference between jmp 0x1234 and call 0x1234 in terms of processor optimization (instruction cache, prefetcher and its hints, branch prediction)? (vc2012 "speed optimization" yields call, "min_size opt" yields jmp, and "mixed optimization" yields jmp for x64 and call for x86.)

Recommended answer

Intel's branch target (and branch) prediction is both very sophisticated and a closely held trade secret. There isn't necessarily one single algorithm; you can expect the prediction mechanisms to vary across CPUs, depending on the number of transistors Intel wants to throw at the problem for a given processor. And, of course, there are other manufacturers of x86 and x64 processors besides Intel.

The history-based branch target prediction mechanism, which uses past executions of the same instruction to predict the target of subsequent ones, will almost certainly predict the right target for this branch, because there is only one. So if this code sequence is re-executed (e.g. in a loop) and stays in the instruction cache for a while, it will likely be handled very well. (However, on some processors, the branch target prediction mechanism could be neutralized by an effect similar to a cache line collision, if another branch elsewhere happens to cause a hash collision.)

A bigger question is probably how well the sequence is handled when it occurs liberally in code newly loaded into the cache, which comes down to a processor's non-history-based target prediction capabilities. Such (non-historical) branch target prediction could easily determine the branch target given this code sequence, though it depends entirely on whether the manufacturer deems it worth the real estate on the die for any given processor. Factors in that decision include power consumption, tradeoffs against other performance improvements (i.e. possibly better uses of the same die area), and the expected frequency of this and various other code sequences.
