Why is default operand size 32 bits in 64 mode?


Question


I am reading the Intel documentation, vol. 1, section 3.6.1, "Operand Size and Address Size in 64-Bit Mode". It describes three prefixes: REX.W, the operand-size prefix (66) and the address-size prefix (67). It says that operands default to 32 bits in size, and that the only way to make them 64 bits long is the REX.W instruction prefix (placed after any other prefixes).

I do not know why that is. Why can't I use the full 64 bits for, say, an int operand? Does it have something to do with sign? Or why is there this restriction? (So does a C unsigned int use a REX.W prefix for operations on the int? As the manual also mentions, a prefix applies only to one particular instruction, not to the whole segment; the segment's default size, for either addresses or operands, is contained in the segment descriptor.)

Do I understand it correctly?

Answer

TL:DR: you have 2 separate questions. 1 about C type sizes, and another about how x86-64 machine code encodes 32 vs. 64-bit operand-size. The encoding choice is fairly arbitrary and could have been made different. But int is 32-bit because that's what compiler devs chose, nothing to do with machine code.


int is 32-bit because that's still a useful size to use. It uses half the memory bandwidth / cache footprint of int64_t. Most C implementations for 64-bit ISAs have 32-bit int, including both mainstream ABIs for x86-64 (x86-64 System V and Windows). On Windows, even long is a 32-bit type, presumably for source compatibility with code written for 32-bit that made assumptions about type sizes.
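The type sizes chosen by an ABI can be inspected from a script. The following is a minimal sketch using Python's standard ctypes module, which reports the sizes used by the C ABI the interpreter was built for; the dictionary name `sizes` is only illustrative.

```python
import ctypes

# Sizes (in bytes) of a few C types under the C ABI that this Python
# interpreter was compiled for.
sizes = {
    "int": ctypes.sizeof(ctypes.c_int),
    "long": ctypes.sizeof(ctypes.c_long),
    "long long": ctypes.sizeof(ctypes.c_longlong),
    "void *": ctypes.sizeof(ctypes.c_void_p),
}

for name, size in sizes.items():
    print(f"{name:10s} {size * 8} bits")
```

On an x86-64 System V platform long should come out as 64 bits, and on 64-bit Windows as 32 bits, while int is 32 bits on both.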

Also, AMD's integer multiplier at the time was somewhat faster for 32-bit than 64-bit, and this was the case until Ryzen. (First-gen AMD64 silicon was AMD's K8 microarchitecture; see https://agner.org/optimize/ for instruction tables.)

See also: The advantages of using 32bit registers/instructions in x86-64

x86-64 was designed by AMD in ~2000, as AMD64. Intel was committed to Itanium and not involved; all the design decisions for x86-64 were made by AMD architects.

AMD64 is designed with implicit zero-extension when writing a 32-bit register, so 32-bit operand-size can be used efficiently with none of the partial-register shenanigans you get with 8 and 16-bit mode.
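As a rough illustration of that rule, here is a small Python model of what happens to the full 64-bit register when its low part is written. It is only a sketch: the function name `write_register` is made up for this example, and the high-byte registers (AH, BH, ...) are not modelled.

```python
MASK64 = (1 << 64) - 1

def write_register(old_value, new_value, operand_size):
    """Return the full 64-bit register after a write of `operand_size` bits."""
    if operand_size == 64:
        return new_value & MASK64
    if operand_size == 32:
        # A 32-bit write zero-extends into the full 64-bit register,
        # so the result does not depend on the old value at all.
        return new_value & 0xFFFFFFFF
    if operand_size in (8, 16):
        # 8- and 16-bit writes merge into the old value: the upper bits
        # are preserved, which creates a dependency on the old register.
        low_mask = (1 << operand_size) - 1
        return (old_value & ~low_mask & MASK64) | (new_value & low_mask)
    raise ValueError("operand_size must be 8, 16, 32 or 64")

rax = 0xFFFF_FFFF_FFFF_FFFF
print(hex(write_register(rax, 0x1234, 32)))  # upper 32 bits are cleared
print(hex(write_register(rax, 0x1234, 16)))  # upper 48 bits are preserved
```

The merge in the 8/16-bit case is the source of the partial-register trouble: the new register value depends on the old one.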

TL:DR: There's good reason for CPUs to want to make 32-bit operand-size available somehow, and for C type systems to have an easily accessible 32-bit type. Using int for that is natural.

If you want 64-bit operand-size, use it. (And then describe it to a C compiler as long long or [u]int64_t, if you're writing C declarations for your asm globals or function prototypes). Nothing's stopping you (except for somewhat larger code size from needing REX prefixes where you might not have before).


All of that is a totally separate question from how x86-64 machine code encodes 32-bit operand-size.

AMD chose to make 32-bit the default and 64-bit operand-size require a REX prefix.

They could have gone the other way and made 64-bit operand-size the default, requiring REX.W=0 to set it to 32, or 0x66 operand-size to set it to 16. That might have led to smaller machine code for code that mostly manipulates things that have to be 64-bit anyway (usually pointers), if it didn't need r8..r15.

A REX prefix is also required to use r8..r15 at all (even as part of an addressing mode), so code that needs lots of registers often finds itself using a REX prefix on most instructions anyway, even when using the default operand-size.
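To make the encoding rule concrete, here is a simplified Python sketch that works out the effective operand size of an ordinary integer instruction in 64-bit mode from its prefix bytes. The function name is invented for this example, and it deliberately ignores byte-sized opcodes and the opcodes (such as push/pop) whose default is 64-bit.

```python
# Legacy prefixes that may come before a REX prefix: segment overrides,
# operand-size (0x66), address-size (0x67), lock and rep/repne.
LEGACY_PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65,
                   0x66, 0x67, 0xF0, 0xF2, 0xF3}

def operand_size_in_64bit_mode(code):
    """Effective operand size of an instruction like `add r/m, r`.

    Simplified model: not valid for byte-sized opcodes or for opcodes
    with a 64-bit default such as push/pop.
    """
    i = 0
    has_66 = False
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        has_66 = has_66 or code[i] == 0x66
        i += 1
    # A REX prefix is a byte 0x40..0x4F right before the opcode; bit 3 is W.
    rex_w = i < len(code) and (code[i] & 0xF0) == 0x40 and bool(code[i] & 0x08)
    if rex_w:
        return 64  # REX.W takes precedence over 0x66
    if has_66:
        return 16
    return 32      # the default in 64-bit mode

examples = [
    ("01d8", "add eax, ebx"),    # no prefix
    ("4801d8", "add rax, rbx"),  # REX.W
    ("6601d8", "add ax, bx"),    # operand-size prefix
    ("4101c0", "add r8d, eax"),  # REX.B only: extra register, still 32-bit
]
for hex_bytes, asm in examples:
    size = operand_size_in_64bit_mode(bytes.fromhex(hex_bytes))
    print(f"{hex_bytes:8s} {asm:14s} {size}-bit")
```

The last example shows the point made above: using r8d costs a REX prefix even though the operand size stays at the 32-bit default.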

A lot of code does use int for a lot of stuff, so 32-bit operand-size is not rare. And as noted above, it's sometimes faster. So it kind of makes sense to make the fastest instructions the most compact (if you avoid r8d..r15d).

It also maybe lets the decoder hardware be simpler if the same opcode decodes the same way with no prefixes in 32 and 64-bit mode. I think this was AMD's real motivation for this design choice. They certainly could have cleaned up a lot of x86 warts but chose not to, probably also to keep decoding more similar to 32-bit mode.

It might be interesting to see if you'd save overall code size for a version of x86-64 with a default operand-size of 64-bit. e.g. tweak a compiler and compile some existing codebases. You'd want to teach its optimizer to favour the legacy registers RAX..RDI for 64-bit operands instead of 32-bit, though, to try to minimize the number of instructions that need REX prefixes.

(Many instructions like add or imul reg,reg can safely be used at 64-bit operand-size even if you only care about the low 32, although the high garbage will affect the FLAGS result.)
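That claim is just modular arithmetic, and can be checked with a short Python sketch. It models registers as unsigned integers; the helper name `low32_matches` and the sample values are assumptions made for this example.

```python
import random

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def low32_matches(a, b):
    """True if 64-bit add/multiply agree with the 32-bit versions in the low 32 bits."""
    add64 = (a + b) & MASK64
    mul64 = (a * b) & MASK64
    add32 = ((a & MASK32) + (b & MASK32)) & MASK32
    mul32 = ((a & MASK32) * (b & MASK32)) & MASK32
    return (add64 & MASK32) == add32 and (mul64 & MASK32) == mul32

# Random 64-bit inputs, i.e. with arbitrary garbage in the upper halves.
rng = random.Random(0)
all_match = all(
    low32_matches(rng.getrandbits(64), rng.getrandbits(64))
    for _ in range(10_000)
)
print("low 32 bits always match:", all_match)

# FLAGS are a different story: here the low 32 bits of the sum are zero,
# but the full 64-bit sum is not, so the zero flag would differ.
a, b = 0x1_0000_0001, 0xFFFF_FFFF
zf32 = (((a & MASK32) + (b & MASK32)) & MASK32) == 0
zf64 = ((a + b) & MASK64) == 0
print("ZF at 32-bit operand-size:", zf32)
print("ZF at 64-bit operand-size:", zf64)
```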


Re: misinformation in comments: compat with 32-bit machine code has nothing to do with this. 64-bit mode is not binary compatible with existing 32-bit machine code; that's why x86-64 introduced a new mode. 64-bit kernels run 32-bit binaries in compat mode, where decoding works exactly like 32-bit protected mode.

https://en.wikipedia.org/wiki/X86-64#OPMODES has a useful table of modes, including long mode (and 64-bit vs. 32 and 16-bit compat modes) vs. legacy mode (if you boot a kernel that's not x86-64 aware).

In 64-bit mode some opcodes are different, and the operand-size defaults to 64-bit for push/pop and other stack instruction opcodes.

32-bit machine code would decode incorrectly in that mode. e.g. 0x40 is inc eax in compat mode but a REX prefix in 64-bit mode. See "x86-32 / x86-64 polyglot machine-code fragment that detects 64bit mode at run-time?" for an example.
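The reuse of the 0x40..0x4F bytes can be sketched as a tiny lookup in Python. The function name `describe_4x_byte` is made up for this example, and the 32-bit case assumes a code segment with a 32-bit default operand size.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def describe_4x_byte(byte, mode):
    """Describe a byte in 0x40..0x4F for mode 32 (legacy/compat) or 64."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("only bytes 0x40..0x4F are handled")
    if mode == 32:
        # 0x40..0x47 are the one-byte inc r32, 0x48..0x4F the one-byte dec r32.
        op = "inc" if byte < 0x48 else "dec"
        return f"{op} {REG32[byte & 7]}"
    if mode == 64:
        # The same bytes are REX prefixes; the low 4 bits are W, R, X, B.
        bits = [name for name, bit in (("W", 8), ("R", 4), ("X", 2), ("B", 1))
                if byte & bit]
        return "REX" + ("." + "".join(bits) if bits else "") + " prefix"
    raise ValueError("mode must be 32 or 64")

for byte in (0x40, 0x41, 0x48, 0x4C):
    print(f"{byte:#04x}: {describe_4x_byte(byte, 32):8s} | {describe_4x_byte(byte, 64)}")
```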


Having 64-bit mode decode mostly the same way as 32-bit mode is a matter of sharing transistors in the decoders, not binary compatibility. Presumably it's easier for the decoders to have only 2 mode-dependent default operand sizes (16 or 32-bit) for opcodes like 03 add r, r/m, rather than 3, with special-casing only for opcodes like push/pop that warrant it. (Also note that REX.W=0 does not let you encode push r32; the operand-size stays at 64-bit.)

AMD's design decisions seem to have been focused on sharing decoder transistors as much as possible, perhaps in case AMD64 didn't catch on and they were stuck supporting it without people using it.

They could have done lots of subtle things to remove annoying legacy quirks of x86, for example making setcc a 32-bit operand-size instruction in 64-bit mode to avoid needing xor-zeroing first. Or they could have removed CISC annoyances like flags staying unchanged after zero-count shifts (although AMD CPUs handle that more efficiently than Intel, so maybe they intentionally left that in).

Or maybe they thought that subtle tweaks could hurt asm source porting, or in the short term make it harder to get compiler back-ends to support 64-bit code-gen.
