Modern x86 cost model

Question

I'm writing a JIT compiler with an x86 backend and learning x86 assembler and machine code as I go. I used ARM assembler about 20 years ago and am surprised by the difference in cost models between these architectures.

Specifically, memory accesses and branches are expensive on ARM but the equivalent stack operations and jumps are cheap on x86. I believe modern x86 CPUs do far more dynamic optimizations than ARM cores do and I find it difficult to anticipate their effects.

What is a good cost model to bear in mind when writing x86 assembler? Which combinations of instructions are cheap and which are expensive?

For example, my compiler would be simpler if it always generated the long form for loading integers or jumping to offsets even if the integers were small or the offsets close but would this impact performance?
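To make the "long form" versus "short form" distinction concrete for jumps, here is a minimal sketch of an encoder for an unconditional relative `JMP` in 32-bit or 64-bit mode. The function name `encode_jmp` and its parameters are hypothetical, chosen only for illustration. The short form (`EB` plus an 8-bit displacement) is 2 bytes and the near form (`E9` plus a 32-bit displacement) is 5 bytes, so always emitting the long form costs 3 extra bytes per jump. Whether that size difference matters at run time depends on the micro-architecture, and the sketch makes no claim about it.

```python
import struct


def encode_jmp(source: int, target: int, always_long: bool = False) -> bytes:
    """Encode an unconditional relative JMP placed at `source` that lands on `target`.

    x86 relative displacements are measured from the address of the
    *next* instruction, so the displacement depends on the encoding length.
    """
    # Short form: opcode EB followed by a signed 8-bit displacement (2 bytes total).
    rel8 = target - (source + 2)
    if not always_long and -128 <= rel8 <= 127:
        return b"\xEB" + struct.pack("<b", rel8)

    # Near form: opcode E9 followed by a signed 32-bit displacement (5 bytes total).
    rel32 = target - (source + 5)
    return b"\xE9" + struct.pack("<i", rel32)


if __name__ == "__main__":
    # The same nearby target, encoded both ways.
    print(encode_jmp(0x00, 0x10).hex())                    # short form, 2 bytes
    print(encode_jmp(0x00, 0x10, always_long=True).hex())  # long form, 5 bytes
```

A compiler that always passes `always_long=True` can skip the range check entirely, which is the simplification the question describes. Choosing the short form requires knowing the distance to the target at encoding time, which is harder for forward jumps whose targets have not been emitted yet.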

I haven't done any floating point yet but I'd like to get on to it soon. Is there anything not obvious about the interaction between normal and float code?

I know there are lots of references (e.g. Michael Abrash) on x86 optimization but I have a hunch that anything more than a few years old will not apply to modern x86 CPUs because they have changed so much lately. Am I correct?

Recommended answer

The best reference is the Intel Optimization Manual, which provides fairly detailed information on architectural hazards and instruction latencies for all recent Intel cores, as well as a good number of optimization examples.

Another excellent reference is Agner Fog's optimization resources, which have the virtue of also covering AMD cores.

Note that specific cost models are, by nature, micro-architecture specific. There's no such thing as an "x86 cost model" that has any kind of real validity. At the instruction level, the performance characteristics of Atom are wildly different from i7.

I would also note that memory accesses and branches are not actually "cheap" on x86 cores -- it's just that the out-of-order execution model has become so sophisticated that it can successfully hide the cost of them in many simple scenarios.
