x86_64 - Self-modifying code performance

Question

I am reading the Intel architecture documentation, Volume 3, section 8.1.3:

Self-modifying code will execute at a lower level of performance than non-self-modifying or normal code. The degree of the performance deterioration will depend upon the frequency of modification and specific characteristics of the code.

So, if I follow the rules:

(* OPTION 1 *)
Store modified code (as data) into code segment;
Jump to new code or an intermediate location;
Execute new code;

(* OPTION 2 *)
Store modified code (as data) into code segment;
Execute a serializing instruction; (* For example, CPUID instruction *)
Execute new code;

And if I modify the code only once a week, I should pay the penalty only the next time the code is modified and about to be executed. After that, performance should be the same as for unmodified code (plus the cost of a jump to that code).

Is my understanding correct?

Recommended answer

There's a difference between code that simply isn't cached yet and code that modifies instructions that are already speculatively in flight (fetched, maybe decoded, maybe even sitting in the scheduler and reorder buffer of the out-of-order core). Writes to memory that the CPU is already treating as instructions cause it to fall back to very slow operation. This is what's usually meant by self-modifying code. Avoiding this slowdown is not too hard, even when JIT-compiling: just don't jump to your buffer until after it's all written.

Modifying the code once a week means that, if you do it wrong, you might pay a penalty of about one microsecond once a week. It's true that frequently used data is less likely to be evicted from the cache (that's why reading something multiple times makes it more likely to "stick"). But the self-modifying-code pipeline flush should apply only the very first time, if you encounter it at all. After that, the cache lines being executed are probably still hot in L1 I-cache (and the uop cache), as long as the second run happens without much intervening code. They are also no longer in a modified state in L1 D-cache.
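
To put the answer's rough figure of one microsecond per weekly modification in perspective, the amortized fraction of wall-clock time is computed below. This takes that estimate at face value rather than treating it as a measurement.

```latex
\frac{1\,\mu\mathrm{s}}{1\ \text{week}}
  = \frac{10^{-6}\ \mathrm{s}}{7 \times 24 \times 3600\ \mathrm{s}}
  = \frac{10^{-6}}{604\,800}
  \approx 1.65 \times 10^{-12}
```

That is roughly 1.65 parts per trillion of run time, which is negligible.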

I don't remember whether Agner Fog's guides at http://agner.org/optimize/ cover self-modifying code and JIT. Even if they don't, you should read them if you're writing anything in asm. Some of the material in the main "optimizing asm" guide is getting out of date, though, and isn't really relevant to Sandybridge and later Intel CPUs. Alignment and decode issues matter less thanks to the uop cache, and the alignment rules can be different on SnB-family microarchitectures.
