How to deal with branch prediction when using a switch case in CPU emulation

Question

I recently read the question "Why is it faster to process a sorted array than an unsorted array?" and found the answer absolutely fascinating. It has completely changed my outlook on programming when dealing with branches that depend on data.

I currently have a fairly basic but fully functioning interpreted Intel 8080 emulator written in C. The heart of it is a 256-entry switch-case table that handles each opcode. My initial thought was that this would obviously be the fastest approach: opcode encoding isn't consistent across the 8080 instruction set, and decoding would add a lot of complexity, inconsistency, and one-off cases. A switch-case table full of preprocessor macros is also very neat and easy to maintain.
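For readers unfamiliar with this design, here is a minimal sketch of what such a switch-based interpreter core might look like. It is not the asker's code: the `cpu8080` struct, the `FETCH8` macro, and the `step`/`run` functions are hypothetical, only five opcodes are implemented, and flag updates are omitted. The cycle counts are the standard 8080 T-state counts for those instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU state for illustration only. */
typedef struct {
    uint8_t  a;            /* accumulator */
    uint16_t pc;           /* program counter (wraps at 64 KiB) */
    uint8_t  mem[0x10000]; /* flat 64 KiB address space */
    uint64_t cycles;       /* running total of T-states */
    bool     halted;
} cpu8080;

/* Read the byte at PC and advance PC. */
#define FETCH8(c) ((c)->mem[(c)->pc++])

/* Execute one instruction and return the T-states it took. */
static int step(cpu8080 *c)
{
    uint8_t op = FETCH8(c);
    switch (op) {            /* typically compiled to a jump table */
    case 0x00:               /* NOP */
        return 4;
    case 0x3C:               /* INR A (flag updates omitted here) */
        c->a++;
        return 5;
    case 0x3E:               /* MVI A,d8 */
        c->a = FETCH8(c);
        return 7;
    case 0xC3: {             /* JMP a16, little-endian operand */
        uint8_t lo = FETCH8(c);
        uint8_t hi = FETCH8(c);
        c->pc = (uint16_t)(lo | (hi << 8));
        return 10;
    }
    case 0x76:               /* HLT */
        c->halted = true;
        return 7;
    default:                 /* other opcodes are not implemented in this sketch */
        c->halted = true;
        return 4;
    }
}

/* Run until HLT, accumulating the cycle count. */
static void run(cpu8080 *c)
{
    while (!c->halted)
        c->cycles += (uint64_t)step(c);
}
```

Loading `MVI A,0x41; INR A; NOP; HLT` at address 0 and calling `run` leaves A at 0x42 after 23 T-states. Note that the `switch` executes exactly one indirect jump per emulated instruction, which is the jump the question is worried about.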

Unfortunately, after reading the aforementioned post, it occurred to me that there's absolutely no way the branch predictor in my computer can predict the jump for the switch case. Thus, every time the switch-case is navigated, the pipeline would have to be completely flushed, resulting in a several-cycle delay in what should otherwise be an incredibly quick program (there's not even so much as a multiplication in my code).

I'm sure most of you are thinking, "Oh, the solution here is simple: move to dynamic recompilation." Yes, that would probably cut out most of the switch-case overhead and increase speed considerably. Unfortunately, my primary interest is emulating older 8-bit and 16-bit era consoles (the Intel 8080 here is only an example, as it's my simplest piece of emulated code), where keeping cycle timing exact to the instruction is important because video and sound must be processed based on those exact timings.
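One common way an interpreter keeps this instruction-exact timing is to run the CPU in cycle budgets (for example, one per scanline) and carry any overshoot into the next budget. The sketch below assumes a hypothetical `step_fn` that executes one instruction and returns its cycle count, and a hypothetical per-scanline hook `line_fn` for the video and sound chips; `demo_step` and `demo_line` are stand-ins for demonstration only, not real emulator code.

```c
/* Hypothetical signature: execute one instruction, return cycles taken. */
typedef int (*step_fn)(void *cpu);

/* Hypothetical hook: render one scanline / advance the sound chip. */
typedef void (*line_fn)(void *sys, int line);

/* Run whole instructions until the slice's `budget` is used up.
   `carry` is the overshoot from the previous slice; the new overshoot is
   returned so that, over many slices, no cycles are lost or gained. */
static int run_slice(step_fn step, void *cpu, int budget, int carry)
{
    int elapsed = carry;
    while (elapsed < budget)
        elapsed += step(cpu);
    return elapsed - budget;
}

/* Interleave CPU slices with per-line video/sound work for one frame. */
static int run_frame(step_fn step, void *cpu, line_fn on_line, void *sys,
                     int lines, int cycles_per_line, int carry)
{
    for (int line = 0; line < lines; line++) {
        carry = run_slice(step, cpu, cycles_per_line, carry);
        on_line(sys, line);   /* peripherals catch up to the CPU here */
    }
    return carry;
}

/* Stand-in step for demonstration: every "instruction" costs 7 cycles
   and is counted in *cpu. */
static int demo_step(void *cpu)
{
    int *count = cpu;
    ++*count;
    return 7;
}

/* Stand-in line hook for demonstration: counts lines processed. */
static void demo_line(void *sys, int line)
{
    (void)line;
    ++*(int *)sys;
}
```

With 7-cycle instructions and a 20-cycle budget, the first slice runs 3 instructions (21 cycles) and carries 1 cycle forward. This is why such emulators dispatch once per instruction: the timing bookkeeping happens between instructions, which is hard to preserve in a dynamic recompiler.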

When dealing with this level of accuracy, performance becomes an issue even for older consoles (look at bSnes, for example). Is there any recourse, or is this simply a fact of life when dealing with processors that have long pipelines?

Answer

On the contrary, switch statements are likely to be compiled into jump tables, which means they perform possibly a few ifs (for range checking) and a single indirect jump. The ifs shouldn't cause a problem with branch prediction: a bad opcode is unlikely, so the range check almost always goes the same way. The jump is not so friendly to the pipeline, but in the end it's only one jump for the whole switch statement.

I don't believe you can convert a long switch statement of opcodes into any other form that would perform better. This assumes, of course, that your compiler is smart enough to convert it into a jump table. If it isn't, you can build the table manually.
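One way to build the table manually is an array of 256 function pointers indexed directly by the opcode byte. The sketch below is a hypothetical illustration (the `cpu_t` struct, handler names, and `-1` "unimplemented" sentinel are assumptions): because every slot is filled, no range check is needed at dispatch time, but each instruction still costs one indirect call, so its branch-prediction behaviour is roughly the same as a compiler-generated jump table.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state for this sketch. */
typedef struct {
    uint8_t  a;
    uint16_t pc;
    uint8_t  mem[0x10000];
} cpu_t;

typedef int (*op_handler)(cpu_t *c);   /* returns cycles taken */

static int op_nop(cpu_t *c)    { (void)c; return 4; }
static int op_inr_a(cpu_t *c)  { c->a++; return 5; }   /* flags omitted */
static int op_mvi_a(cpu_t *c)  { c->a = c->mem[c->pc++]; return 7; }
static int op_unimpl(cpu_t *c) { (void)c; return -1; } /* sentinel */

static op_handler op_table[256];

/* Fill every slot so that any opcode byte indexes a valid handler. */
static void init_op_table(void)
{
    for (int i = 0; i < 256; i++)
        op_table[i] = op_unimpl;
    op_table[0x00] = op_nop;
    op_table[0x3C] = op_inr_a;
    op_table[0x3E] = op_mvi_a;
}

/* Fetch, then dispatch with a single indirect call. */
static int step(cpu_t *c)
{
    uint8_t op = c->mem[c->pc++];
    return op_table[op](c);
}
```

Call `init_op_table()` once at startup; after that, `step` behaves like one arm of the switch.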

If in doubt, implement other methods and measure performance.
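A minimal measurement harness might look like the following. It is a toy: the 4-opcode "ISA" and the functions `run_switch`, `run_table`, and `time_run` are hypothetical, and `clock()` measures processor time with coarse resolution. Real conclusions should come from timing the actual emulator on realistic ROMs, built with the same compiler flags as the release build.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Toy "ISA" for benchmarking only: 0 = inc, 1 = dec, 2 = double, 3 = clear. */
static uint32_t run_switch(const uint8_t *prog, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        switch (prog[i] & 3) {
        case 0: acc++;     break;
        case 1: acc--;     break;
        case 2: acc <<= 1; break;
        case 3: acc = 0;   break;
        }
    }
    return acc;
}

static uint32_t op_inc(uint32_t a) { return a + 1; }
static uint32_t op_dec(uint32_t a) { return a - 1; }
static uint32_t op_dbl(uint32_t a) { return a << 1; }
static uint32_t op_clr(uint32_t a) { (void)a; return 0; }
static uint32_t (*const ops[4])(uint32_t) = { op_inc, op_dec, op_dbl, op_clr };

/* Same semantics, dispatched through a function-pointer table. */
static uint32_t run_table(const uint8_t *prog, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = ops[prog[i] & 3](acc);
    return acc;
}

/* Time `reps` runs of `run` in processor seconds. The results are summed
   into *checksum so the optimizer cannot discard the work. */
static double time_run(uint32_t (*run)(const uint8_t *, size_t),
                       const uint8_t *prog, size_t n, int reps,
                       uint32_t *checksum)
{
    uint32_t sum = 0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sum += run(prog, n);
    clock_t t1 = clock();
    *checksum = sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Fill `prog` with a pseudo-random opcode stream (unpredictable dispatch), call `time_run` for each dispatcher, and check that both checksums match before comparing the times.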

First of all, make sure you don't confuse branch prediction and branch target prediction.

Branch prediction works only on conditional branch instructions. It decides whether a branch condition will succeed or fail (taken or not taken). It has nothing to do with the jump instruction itself.

Branch target prediction, on the other hand, tries to guess where the jump will land.

So, your statement "there's no way the branch predictor can predict the jump" should be "there's no way the branch target predictor can predict the jump".
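To make the distinction concrete, here is a small hypothetical C illustration (the function and table names are assumptions, not from the question). The `if` in `jz_next_pc` is a conditional branch with two possible directions, the kind a branch direction predictor guesses; a compiler may also turn it into a branch-free conditional move. The call through `table` in `dispatch` is an indirect call whose destination is loaded from memory, so the hardware must guess the address, which is the job of the branch target predictor.

```c
#include <stdint.h>

/* Conditional branch: taken or not taken. Models the next PC after an
   8080 JZ a16 instruction, which is 3 bytes long. */
static uint16_t jz_next_pc(uint16_t pc, uint16_t target, int zero_flag)
{
    if (zero_flag)
        return target;
    return (uint16_t)(pc + 3);
}

/* Indirect call: the destination comes from a table at run time. */
typedef int (*handler)(int);
static int h_double(int x) { return 2 * x; }
static int h_negate(int x) { return -x; }
static handler const table[2] = { h_double, h_negate };

static int dispatch(unsigned op, int x)
{
    return table[op & 1](x);   /* one indirect call per dispatch */
}
```

An emulator's opcode switch is the second kind: one indirect jump with up to 256 possible targets.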

In your particular case, I don't think you can actually avoid this. If you had a very small set of operations, you might be able to come up with a formula that covers all of them, like those made in logic circuits. However, with an instruction set as big as a CPU's, even a RISC one, the cost of that computation would be much higher than the penalty of a single jump.
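The 8080 does have regular corners where such a formula works. For example, `MOV dst,src` occupies opcodes 0x40 to 0x7F (except 0x76, which is HLT) with the bit pattern `01 ddd sss`, using the register encoding B=0, C=1, D=2, E=3, H=4, L=5, M=6 (memory at HL), A=7. The sketch below (hypothetical `cpu_t` and helper names) decodes that block from its bit fields. Even here, the `M` special case reintroduces data-dependent branches, and the irregular remainder of the instruction set still needs a table, which illustrates why a single formula for the whole ISA is unlikely to beat one jump.

```c
#include <stdint.h>

/* Hypothetical state: regs[] follows the 8080 register encoding
   B=0 C=1 D=2 E=3 H=4 L=5 A=7; index 6 (M) means memory at HL. */
typedef struct {
    uint8_t regs[8];       /* slot 6 unused */
    uint8_t mem[0x10000];
} cpu_t;

static uint16_t hl(const cpu_t *c)
{
    return (uint16_t)((c->regs[4] << 8) | c->regs[5]);
}

static uint8_t get_reg(const cpu_t *c, unsigned r)
{
    return r == 6 ? c->mem[hl(c)] : c->regs[r];
}

static void set_reg(cpu_t *c, unsigned r, uint8_t v)
{
    if (r == 6)
        c->mem[hl(c)] = v;
    else
        c->regs[r] = v;
}

/* Decode MOV dst,src from its bit fields (01 ddd sss).
   Returns 1 if the opcode was a MOV and was executed, 0 otherwise. */
static int try_mov(cpu_t *c, uint8_t op)
{
    if ((op & 0xC0) != 0x40 || op == 0x76)   /* not MOV, or HLT */
        return 0;
    unsigned dst = (op >> 3) & 7;
    unsigned src = op & 7;
    set_reg(c, dst, get_reg(c, src));
    return 1;
}
```

With A = 0x5A, `try_mov(&c, 0x47)` (MOV B,A) copies 0x5A into B, while 0x76 (HLT) and 0x3E (MVI A,d8) are rejected and would fall through to the ordinary dispatch.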