Why not just predict both branches?


Question

CPUs use branch prediction to speed up code, but only if the first branch is actually taken.

Why not simply take both branches? That is, assume both branches will be hit, cache both sides, and take the proper one when necessary. The cache does not need to be invalidated. While this requires the compiler to load both branches beforehand (more memory, proper layout, etc.), I imagine that proper optimization could streamline both so that one can get near-optimal results from a single predictor. That is, one would require more memory for loading both branches (which is exponential for N branches), but the majority of the time one should be able to "recache" the failed branch with new code quickly enough before the taken branch has finished executing.

if (x) Bl else Br;

Instead of assuming Bl is taken, assume that both Bl and Br are taken (some type of parallel processing or special interleaving), and after the branch is actually determined, one branch is then invalid and the cache could be freed for reuse (maybe some type of special technique would be required to fill and use it properly).
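The idea above can be sketched in software. This is a minimal, hypothetical model (the function names are invented, and real hardware would run the two arms on duplicated execution resources, not sequentially):

```python
# Model of "eager execution": both arms of a branch run speculatively,
# and once the condition resolves, one result is kept and the other
# discarded -- no predictor is consulted at any point.

def eager_branch(cond_fn, bl, br, x):
    # Speculatively compute both sides; in hardware these would
    # execute in parallel before the condition is known.
    result_bl = bl(x)
    result_br = br(x)
    # The condition resolves later; discard the wrong-path result.
    return result_bl if cond_fn(x) else result_br

# Usage: models "if (x > 0) Bl else Br"
print(eager_branch(lambda x: x > 0, lambda x: x * 2, lambda x: -x, 5))   # 10
print(eager_branch(lambda x: x > 0, lambda x: x * 2, lambda x: -x, -3))  # 3
```

The cost that is hard to see in this sketch is the doubled fetch/execute bandwidth, which grows exponentially with nested branches, as the question notes.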

In fact, no prediction circuitry would be required, and all the hardware devoted to prediction could instead be used to handle both branches.

Any ideas on whether this is viable?

Answer


A Historical Perspective on Fetching Instructions from both Paths

The first similar proposal (to my knowledge) was discussed in this 1968 patent. I understand that you are only asking about fetching instructions from both branches, but bear with me a little. In that patent, three broad strategies were laid out, one of them is following both paths (the fall-through path and the branch path). That is, not just fetching instructions from both paths, but also executing both paths. When the conditional branch instruction is resolved, one of the paths is discarded. It was only mentioned as an idea in the introduction of the patent, but the patent itself was about another invention.

Later, in 1977, a commercial processor called the IBM 3033 was released by IBM. That is the first processor (to my knowledge) to implement exactly what you are proposing. I'm surprised to see that the Wikipedia page does not mention that the processor fetched instructions from both paths. The paper that describes it is titled "The IBM 3033: An inside look". Unfortunately, I'm not able to find that paper, but the paper on the IBM 3090 does mention the fact. So what you're proposing did make sense and was implemented in real processors decades ago.

A patent was filed in 1981 and granted in 1984 for a processor with two microprogram memories from which instructions can be fetched simultaneously. I quote from the abstract of the patent:



A dual fetch microsequencer having two single-ported microprogram memories wherein both the sequential and jump address microinstructions of a binary conditional branch can be simultaneously prefetched, one from each memory. The microprogram is assembled so that the sequential and jump addresses of each branch have opposite odd/even polarities. Accordingly, with all odd addresses in one memory and even in the other, the first instruction of both possible paths can always be prefetched simultaneously. When a conditional branch microinstruction is loaded into the execution register, its jump address or a value corresponding to it is transferred to the address register for the appropriate microprogram memory. The address of the microinstruction in the execution register is incremented and transferred to the address register of the other microprogram memory. Prefetch delays are thereby reduced. Also, when a valid conditional jump address is not provided, that microprogram memory may be transparently overlayed during that microcycle.
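The odd/even trick in the abstract is easy to illustrate. A short sketch (addresses invented; real microsequencers do this with address decoders, not software):

```python
# The patent's banking scheme: microcode is laid out so that a branch's
# fall-through address and jump address always have opposite parity.
# With all even addresses in bank 0 and odd addresses in bank 1, each
# single-ported memory serves one of the two targets, so both first
# instructions can be prefetched in the same cycle.

def bank_of(addr):
    return addr % 2  # even -> bank 0, odd -> bank 1

def can_prefetch_both(fallthrough_addr, jump_addr):
    # Simultaneous prefetch requires the targets to live in
    # different banks (no port conflict on a single-ported memory).
    return bank_of(fallthrough_addr) != bank_of(jump_addr)

# A branch at address 6 falls through to 7 (odd), so the assembler
# must place the jump target at an even address, e.g. 12.
assert can_prefetch_both(7, 12)      # opposite parity: one fetch per bank
assert not can_prefetch_both(7, 13)  # same bank: port conflict
```

The assembler (not the hardware) enforces the parity constraint, which is why the abstract says "the microprogram is assembled so that" the addresses have opposite polarities.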




A Historical Perspective on Fetching and Executing Instructions from both Paths

There is a lot of research published in the 80s and 90s proposing and evaluating techniques by which instructions from both paths are not only fetched but also executed, even across multiple conditional branches. This has the potential additional overhead of fetching the data required by both paths. The idea of branch prediction confidence was proposed in a 1996 paper and was used to improve such techniques by being more selective about which paths to fetch and execute. Another paper (Threaded Multiple Path Execution), published in 1998, proposes an architecture that exploits simultaneous multithreading (SMT) to run multiple paths following conditional branches. Another paper (Dual Path Instruction Processing), published in 2002, proposes fetching, decoding, and renaming, but not executing, instructions from both paths.
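The confidence-based selectivity mentioned above can be sketched as a simple fetch policy. This is an illustrative model only (the threshold and names are invented, not taken from the cited papers):

```python
# Confidence-gated dual-path fetch: only branches the predictor is
# unsure about pay the cost of fetching both paths; high-confidence
# branches fetch the predicted path alone, saving fetch bandwidth
# and cache capacity.

def fetch_policy(confidence, threshold=0.9):
    if confidence >= threshold:
        return ["predicted"]            # trust the predictor
    return ["predicted", "alternate"]   # hedge: fetch both paths

assert fetch_policy(0.99) == ["predicted"]
assert fetch_policy(0.55) == ["predicted", "alternate"]
```

The design trade-off is exactly the one the answer describes next: every extra path fetched competes with useful data for cache capacity, so the gate should open only when a misprediction is genuinely likely.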


Fetching instructions from both paths into one or more of the caches reduces the effective capacity of the caches in general, because, typically, one of the paths will be executed much more frequently than the other (in some, potentially highly irregular, pattern). Imagine fetching into the L3 cache, which is practically always shared between all the cores and holds both instructions and data. This can have a negative impact on the ability of the L3 cache to hold useful data. Fetching into the much smaller L2 cache can lead to even worse performance, especially when the L3 is inclusive. Fetching instructions from both paths across multiple conditional branches for all the cores may cause hot data held in the caches to be frequently evicted and brought back. Therefore, extreme variants of the technique you are proposing would reduce the overall performance of modern architectures. However, less aggressive variants can be beneficial.


I'm not aware of any real modern processors that fetch instructions down both paths when they see a conditional branch (perhaps some do, but it's not publicly disclosed). Instruction prefetching, however, has been extensively researched and still is. An important question that needs to be addressed here is: what is the probability that a sufficient number of instructions from the other path are already present in the cache when the predicted path turns out to be the wrong one? If the probability is high, there is little motivation to fetch instructions from both paths; otherwise, there is indeed an opportunity. According to an old paper from Intel (Wrong-Path Instruction Prefetching), on the benchmarks tested, over 50% of instructions accessed on mispredicted paths were later accessed during correct-path execution. The answer to this question certainly depends on the target domain of the processor being designed.
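The metric in that last point can be made concrete. A toy sketch (the addresses are invented; the cited paper measures this over real benchmark traces):

```python
# Estimate wrong-path reuse: the fraction of instruction addresses
# touched on a mispredicted path that are later touched again on the
# correct path. Short if/else "hammocks" reconverge quickly, so their
# wrong-path blocks are often reused -- the effect the Intel paper
# reports at over 50% on its benchmarks.

def wrong_path_reuse(wrong_path_addrs, correct_path_addrs):
    wrong = set(wrong_path_addrs)
    if not wrong:
        return 0.0
    return len(wrong & set(correct_path_addrs)) / len(wrong)

# Paths reconverge at 0x44: two of the three wrong-path blocks are
# reused on the correct path.
ratio = wrong_path_reuse([0x40, 0x44, 0x48], [0x44, 0x48, 0x4C])
print(ratio)
```

A high reuse ratio argues for keeping wrong-path fetches in the cache rather than discarding them, which is the opportunity the answer describes.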
