How to deal with branch prediction when using a switch case in CPU emulation

Views: 116. Published: 2020/6/7 18:37:45. Tags: c, performance, compiler-optimization, emulation, branch-prediction

Problem description

I recently read the question "Why is it faster to process a sorted array than an unsorted array?" and found the answer absolutely fascinating. It has completely changed my outlook on programming when dealing with branches that depend on data.

I currently have a fairly basic but fully functioning interpreted Intel 8080 emulator written in C. The heart of the operation is a 256-entry switch-case table that handles each opcode. My initial thought was that this would obviously be the fastest approach, as opcode encoding isn't consistent throughout the 8080 instruction set, and decoding would add a lot of complexity, inconsistency and one-off cases. A switch-case table full of preprocessor macros is very neat and easy to maintain.
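To make the structure concrete, here is a minimal sketch of that kind of dispatch loop. It is not my actual emulator: the `Cpu` struct, the `MVI` macro and the `run` function are names made up for illustration, only five opcodes are handled, and flag updates are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced CPU state; a real 8080 core has more
 * registers and a flags byte. */
typedef struct {
    uint8_t  a, b;    /* accumulator and B register */
    uint16_t pc;      /* program counter */
    int      halted;  /* set once HLT (or an unhandled opcode) is seen */
} Cpu;

/* Illustrative macro in the spirit of a macro-filled switch table:
 * load the immediate byte that follows the opcode into a register. */
#define MVI(reg) do { cpu->reg = mem[cpu->pc++]; } while (0)

/* Execute up to max_steps instructions; returns how many were executed.
 * mem is assumed to point at a full 64 KiB address space. */
static size_t run(Cpu *cpu, const uint8_t *mem, size_t max_steps)
{
    size_t steps = 0;
    assert(cpu != NULL && mem != NULL);

    while (!cpu->halted && steps < max_steps) {
        uint8_t opcode = mem[cpu->pc++];      /* fetch */
        switch (opcode) {                     /* decode + execute */
        case 0x00:                     break; /* NOP */
        case 0x06: MVI(b);             break; /* MVI B,d8 */
        case 0x3E: MVI(a);             break; /* MVI A,d8 */
        case 0x80: cpu->a = (uint8_t)(cpu->a + cpu->b);
                                       break; /* ADD B (flags omitted) */
        case 0x76: cpu->halted = 1;    break; /* HLT */
        default:   cpu->halted = 1;    break; /* unhandled: stop the sketch */
        }
        steps++;
    }
    return steps;
}
```

Every instruction passes through the same `switch`, so the whole interpreter funnels through one dispatch point per opcode fetched.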

Unfortunately, after reading the aforementioned post, it occurred to me that there's absolutely no way the branch predictor in my computer can predict the jump for the switch-case. Thus, every time the switch-case is navigated, the pipeline would have to be completely flushed, resulting in a delay of several cycles in what should otherwise be an incredibly quick program (there's not even so much as a multiplication in my code).

I'm sure most of you are thinking, "Oh, the solution here is simple: move to dynamic recompilation." Yes, this does seem like it would cut out the majority of the switch-case and increase speed considerably. Unfortunately, my primary interest is emulating older 8-bit and 16-bit era consoles (the Intel 8080 here is only an example, as it's my simplest piece of emulated code), where keeping cycles and timing exact down to the individual instruction is important, as the video and sound must be processed based on these exact timings.
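As a rough illustration of what I mean by timing-driven processing, the sketch below counts cycles per instruction and only advances the video side once a scanline's worth of CPU cycles has elapsed. The `Machine` struct, `step_cpu`, `run_scanline` and the `CYCLES_PER_SCANLINE` value are all hypothetical and not taken from any real console; only the 4-cycle NOP and 7-cycle MVI costs are real 8080 figures, and every other opcode is treated as a 4-cycle NOP here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CYCLES_PER_SCANLINE 128u  /* hypothetical budget, not real hardware */

typedef struct {
    uint16_t pc;        /* program counter */
    uint64_t cycles;    /* total CPU cycles elapsed so far */
    unsigned scanline;  /* scanlines completed so far */
} Machine;

/* Execute one instruction and return its cost in cycles.
 * Only MVI B,d8 (7 cycles) is modelled; anything else counts as a
 * 4-cycle NOP for the purposes of this sketch. */
static unsigned step_cpu(Machine *m, const uint8_t *mem)
{
    uint8_t opcode = mem[m->pc++];
    switch (opcode) {
    case 0x06: m->pc++; return 7;  /* MVI B,d8: skip the operand byte */
    default:            return 4;  /* treated as NOP */
    }
}

/* Run the CPU until the end of the current scanline, then advance video.
 * The target is absolute, so cycles overshot on one line are carried
 * into the next instead of being lost. */
static void run_scanline(Machine *m, const uint8_t *mem)
{
    uint64_t target;
    assert(m != NULL && mem != NULL);

    target = (uint64_t)(m->scanline + 1) * CYCLES_PER_SCANLINE;
    while (m->cycles < target)
        m->cycles += step_cpu(m, mem);

    m->scanline++;  /* a real emulator would draw the line / mix audio here */
}
```

Because the video and sound steps are triggered by the cycle counter, every instruction has to report an exact cost, which is what makes per-instruction interpretation attractive for this kind of emulation.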

When dealing with this level of accuracy, performance becomes an issue, even for older consoles (look at bSnes, for example). Is there any recourse, or is this simply a fact of life when dealing with processors with long pipelines?
