x86 opcode alignment references and guidelines

Question

I'm generating some opcodes dynamically in a JIT compiler and I'm looking for guidelines for opcode alignment.

1) I've read comments that briefly "recommend" alignment by adding nops after calls.

2) I've also read about using nops to optimize instruction sequences for parallelism.

3) I've read that alignment of ops is good for "cache" performance.

Usually these comments don't give any supporting references. It's one thing to read a blog or a comment that says, "it's a good idea to do such and such", but it's another to actually write a compiler that implements specific op sequences and realize that most material online, especially blogs, is not useful for practical application. So I'm a believer in finding things out myself (disassembly, etc., to see what real-world apps do). This is one case where I need some outside info.

I notice that compilers will usually start an instruction at an odd byte address, immediately after whatever instruction sequence preceded it. So the compiler is not taking any special care in most cases. I see a "nop" here or there, but usually it seems nops are used sparingly, if at all. How critical is opcode alignment? Can you provide references for cases that I can actually use for implementation? Thanks.

Recommended answer

I would recommend against inserting nops except for the alignment of branch targets. On some specific CPUs, branch prediction algorithms may penalize control transfers that land on other control transfers, so a nop may be able to act as a flag and invert the prediction, but otherwise it is unlikely to help.

Modern CPUs translate your ISA ops into micro-ops anyway. This may make classical alignment techniques less important, as presumably the micro-op translator leaves out nops and changes both the size and alignment of the hidden, true machine ops.

However, by the same token, optimizations based on first principles should do little or no harm.

The theory is that one makes better use of the cache by starting loops at cache line boundaries. If a loop were to start in the middle of a cache line, then the first half of that cache line would unavoidably be loaded and kept loaded during the loop, and this would be wasted space in the cache if the loop is longer than 1/2 of a cache line.
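As a rough illustration of the argument above, the following C sketch counts how many cache lines a block of code occupies for a given start address. The helper name `cache_lines_touched` is hypothetical, and the line size is a parameter because it varies by CPU (64 bytes is common on current x86 parts, but that is an assumption to verify for your target).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original answer): number of cache
 * lines touched by `len` bytes of code starting at address `start`,
 * for a cache line of `line_size` bytes. */
static size_t cache_lines_touched(uintptr_t start, size_t len, size_t line_size)
{
    assert(line_size > 0 && len > 0);
    uintptr_t first = start / line_size;              /* line holding the first byte */
    uintptr_t last  = (start + len - 1) / line_size;  /* line holding the last byte  */
    return (size_t)(last - first + 1);
}
```

With 64-byte lines, a 64-byte loop starting at offset 0 fits in one line, while the same loop starting at offset 32 straddles two lines, half of each being bytes the loop never executes.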

Also, for branch targets, the initial load of the cache line brings in the largest forward window of the instruction stream when the target is aligned.
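For a JIT that does choose to align branch targets, padding might be emitted along the following lines. This is only a sketch under stated assumptions: `pad_to_align` and `emit_align` are hypothetical names, the code buffer is a plain byte array, and the target CPU is assumed to support the multi-byte `0F 1F` NOP forms listed in the Intel SDM entry for NOP (older processors only have the one-byte `90`). Whether 16, 32, or 64 bytes is the right alignment is something to measure on the target CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Multi-byte NOP encodings, 1 to 9 bytes, as listed in the Intel SDM. */
static const uint8_t NOPS[9][9] = {
    {0x90},
    {0x66, 0x90},
    {0x0F, 0x1F, 0x00},
    {0x0F, 0x1F, 0x40, 0x00},
    {0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Padding bytes needed to round `offset` up to a multiple of `align`.
 * `align` must be a power of two. */
static size_t pad_to_align(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (offset & (align - 1))) & (align - 1);
}

/* Hypothetical JIT helper: fill `buf` from `offset` with NOPs until the
 * offset is aligned. Returns the new offset, or SIZE_MAX if the padding
 * does not fit in a buffer of `cap` bytes. */
static size_t emit_align(uint8_t *buf, size_t cap, size_t offset, size_t align)
{
    size_t pad = pad_to_align(offset, align);
    if (offset > cap || pad > cap - offset)
        return SIZE_MAX;
    while (pad > 0) {
        size_t n = pad > 9 ? 9 : pad;      /* use the longest NOP that fits */
        memcpy(buf + offset, NOPS[n - 1], n);
        offset += n;
        pad -= n;
    }
    return offset;
}
```

Calling `emit_align(buf, cap, offset, 16)` just before emitting a loop head or other frequently taken branch target pads with a few long NOPs rather than many one-byte ones, which keeps the cost low if the padding is ever executed by fall-through.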

As for using nops to separate in-line instructions that are not branch targets, there are few reasons to do this on modern CPUs. (There was a time when RISC machines had delay slots, which often led to inserting nops after control transfers.) Decoding the instruction stream is easy to pipeline, and if an architecture has odd-byte-length ops you can be assured that they are decoded reasonably.
