Why can't we move a 64-bit immediate value to memory?
Question
First, I am a little confused about the difference between movq and movabsq. My textbook says:
The regular movq instruction can only have immediate source operands that can be represented as 32-bit two's-complement numbers. This value is then sign extended to produce the 64-bit value for the destination. The movabsq instruction can have an arbitrary 64-bit immediate value as its source operand and can only have a register as a destination.
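The textbook rule can be checked mechanically. The sketch below is only an illustration: the helper name fits_simm32 is made up for this example, and it tests whether a 64-bit pattern survives truncation to 32 bits followed by sign extension, which is the condition for using a regular movq immediate.

```python
def fits_simm32(value: int) -> bool:
    """Return True if the 64-bit pattern equals its low 32 bits sign-extended."""
    value &= 0xFFFFFFFFFFFFFFFF            # treat the input as a 64-bit pattern
    low32 = value & 0xFFFFFFFF
    # Sign-extend the low 32 bits back to 64 bits.
    extended = low32 | (0xFFFFFFFF00000000 if low32 & 0x80000000 else 0)
    return extended == value


print(fits_simm32(0x7FFFFFFF))             # largest positive imm32
print(fits_simm32(0xFFFFFFFFFFFFFFFF))     # -1 as a 64-bit pattern
print(fits_simm32(0x80000000))             # bit 31 set but upper bits clear
print(fits_simm32(0x123456789ABCDEF))      # the constant from the question
```

Only the first two values pass the check; the other two need movabsq (or, for 0x80000000, a 32-bit mov that zero-extends).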
I have two questions about this.
The movq instruction can only have immediate source operands that can be represented as 32-bit two's-complement numbers.
So this means that we can't do:
movq $0x123456789abcdef, %rbp
We have to do this instead:
movabsq $0x123456789abcdef, %rbp
But why is movq designed not to work with 64-bit immediate values? That seems to go against the purpose of the q (quad word) suffix, and needing a separate movabsq just for this purpose seems like a hassle.
Since the destination of movabsq has to be a register, not memory, we can't move a 64-bit immediate value to memory like this:
movabsq $0x123456789abcdef, (%rax)
But there is a workaround:
movabsq $0x123456789abcdef, %rbx
movq %rbx, (%rax)    # the source is a register, not an immediate, so the destination of movq can be memory
So why is the rule designed in a way that makes things harder?
Recommended answer
Yes, for immediates that won't fit in a sign-extended 32-bit value, mov to a register and then to memory. (A value like -1, aka 0xFFFFFFFFFFFFFFFF, does fit, so it can be stored directly.) The "why" part is an interesting question, though:
Remember that asm only lets you do what's possible in machine code. Thus it's really a question about ISA design. Such decisions often involve what's easy for the hardware to decode, as well as encoding efficiency considerations. (Using up opcodes on rarely-used instructions would be bad.)
It's not designed to make things harder; it's designed to not need any new opcodes for mov, and also to limit 64-bit immediates to one special instruction format. mov is the only instruction that can ever use a 64-bit immediate at all (or a 64-bit absolute address, for load/store of AL/AX/EAX/RAX).
Check out Intel's manual for the forms of mov (note that it uses Intel syntax, destination first, and so will my answer). I also summarized the forms (and their instruction lengths) in Difference between movq and movabsq in x86-64, as did @MargaretBloom in an answer to Difference between movq and movabsq in x86-64.
Allowing an imm64 along with a ModR/M addressing mode would also make it possible to run into the 15-byte upper limit on instruction length pretty easily: REX + opcode + imm64 is 10 bytes, and ModRM + SIB + disp32 is 6, for a total of 16. So mov [rdi + rax*8 + 1234], imm64 would not be encodable even if there were an opcode for mov r/m64, imm64.
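The byte count above can be tallied explicitly. This is just the arithmetic from the paragraph written out for a hypothetical mov r/m64, imm64 with a base + index + disp32 addressing mode; no such encoding exists, and the dictionary keys are labels chosen for this sketch.

```python
MAX_INSN_BYTES = 15  # architectural limit on x86 instruction length

# Components of a hypothetical "mov [rdi + rax*8 + 1234], imm64".
parts = {
    "REX prefix": 1,
    "opcode": 1,
    "ModRM": 1,
    "SIB": 1,      # needed for base + scaled index
    "disp32": 4,   # 1234 does not fit in a disp8
    "imm64": 8,
}

total = sum(parts.values())
print(total, "bytes; encodable:", total <= MAX_INSN_BYTES)

# The form that does exist, mov r/m64, imm32, stays well under the limit.
existing_total = total - parts["imm64"] + 4
print(existing_total, "bytes; encodable:", existing_total <= MAX_INSN_BYTES)
```

The hypothetical form comes to 16 bytes, one over the limit, while the existing imm32 form with the same addressing mode is 12 bytes.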
And that's assuming they repurposed one of the 1-byte opcodes that were freed up by making some instructions invalid in 64-bit mode (e.g. aaa), which might be inconvenient for the decoders (and instruction-length pre-decoders), because in other modes those opcodes don't take a ModRM byte or an immediate.
movq is for the forms of mov with a normal ModRM byte, which allows an arbitrary addressing mode as the destination (or as the source, for movq r64, r/m64). AMD chose to keep the immediate for these as 32-bit, the same as with 32-bit operand size (see footnote 1).
These forms of mov use the same instruction format as other instructions like add. For ease of decoding, this means a REX prefix doesn't change the instruction length for these opcodes. Instruction-length decoding is already hard enough when the addressing mode is variable-length.
So movq is 64-bit operand-size but otherwise the same instruction format: mov r/m64, imm32 (becoming the sign-extended-immediate form, same as every other instruction which only has one immediate form), and mov r/m64, r64 or mov r64, r/m64.
movabs is the 64-bit form of the existing no-ModRM short form mov reg, imm32. This one is already a special case (because of the no-ModRM encoding, with the register number taken from the low 3 bits of the opcode byte). Small positive constants can just use 32-bit operand-size for implicit zero-extension to 64-bit with no loss of efficiency (like 5-byte mov eax, 123 / AT&T mov $123, %eax in 32 or 64-bit mode). And having a 64-bit absolute mov is useful, so it makes sense that AMD did that.
Since there's no ModRM byte, it can only encode a register destination. It would take a whole different opcode to add a form that could take a memory operand.
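To make the two formats concrete, here is a rough sketch of how the bytes are laid out. The helper names encode_movabs and encode_mov_simm32 are invented for this example, and it assumes the destination is one of the eight legacy registers (r8 to r15 would also need the REX.B bit) and, for the second form, a register-direct ModRM.

```python
import struct

# Register numbers for the low eight 64-bit registers.
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
         "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def encode_movabs(reg: str, imm64: int) -> bytes:
    """REX.W, then opcode B8+reg (register in the low 3 opcode bits), then imm64."""
    n = REG64[reg]
    return bytes([0x48, 0xB8 + n]) + struct.pack("<Q", imm64 & 0xFFFFFFFFFFFFFFFF)


def encode_mov_simm32(reg: str, imm32: int) -> bytes:
    """REX.W, opcode C7 /0, a ModRM byte, then a sign-extended imm32."""
    n = REG64[reg]
    modrm = 0xC0 | n          # mod=11 (register-direct), reg field=0, rm=n
    return bytes([0x48, 0xC7, modrm]) + struct.pack("<i", imm32)


movabs_bytes = encode_movabs("rbp", 0x123456789ABCDEF)
movq_bytes = encode_mov_simm32("rax", -1)
print(movabs_bytes.hex(), len(movabs_bytes))
print(movq_bytes.hex(), len(movq_bytes))
```

The movabs form has nowhere to describe a memory operand: the only operand field besides the immediate is the 3-bit register number inside the opcode byte. The C7 form has a ModRM byte, so its destination could equally be a memory addressing mode.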
From one point of view, be grateful you get a mov with 64-bit immediates at all; RISC ISAs like AArch64 (with fixed-width 32-bit instructions) need more like 4 instructions just to get a 64-bit value into a register. (Unless it's a repeating bit-pattern; AArch64 is actually pretty cool, unlike earlier RISCs like MIPS64 or PowerPC64.)
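As a rough illustration of the "more like 4 instructions" figure, the sketch below counts instructions for the plain AArch64 approach of one movz followed by a movk per remaining non-zero 16-bit chunk. This is a deliberately simplified model: the function name is made up, and it ignores movn and the bitmask-immediate encodings that let real assemblers and compilers do better for some constants.

```python
def movz_movk_count(value: int) -> int:
    """Instructions needed using only movz/movk, one per non-zero 16-bit chunk."""
    value &= 0xFFFFFFFFFFFFFFFF
    chunks = [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]
    nonzero = sum(1 for chunk in chunks if chunk != 0)
    return max(1, nonzero)    # even zero takes one instruction


print(movz_movk_count(123))                  # fits in a single movz
print(movz_movk_count(0x10000))              # single movz with a shifted chunk
print(movz_movk_count(0x123456789ABCDEF))    # every chunk is non-zero
```

For the constant from the question, all four 16-bit chunks are non-zero, so this simple scheme needs four 4-byte instructions where x86-64 uses one 10-byte movabs.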
If AMD64 was going to introduce a new opcode for mov, mov r/m, sign_extended_imm8 would be vastly more useful to save code-size. It's not at all rare for compilers to emit multiple mov qword ptr [rsp+8], 0 instructions to zero a local array or struct, each one containing a 4-byte 0 immediate. Putting a non-zero small number in a register is fairly common, and this would make mov eax, 123 a 3-byte instruction (down from 5), and mov rax, -123 a 4-byte instruction (down from 7). It would also allow zeroing a register in 3 bytes without clobbering FLAGS.
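The size savings can be tallied the same way. In this sketch, insn_len is a made-up helper that just sums component sizes, and the "hypothetical" entries assume a 1-byte opcode for the imaginary mov r/m, sign_extended_imm8 form, which does not exist in x86-64.

```python
def insn_len(rex=0, opcode=1, modrm=0, sib=0, disp=0, imm=0) -> int:
    """Sum the sizes (in bytes) of the components of one instruction."""
    return rex + opcode + modrm + sib + disp + imm


# Encodings that exist today.
existing = {
    "mov eax, 123": insn_len(imm=4),                       # B8+r imm32
    "mov rax, -123": insn_len(rex=1, modrm=1, imm=4),      # REX.W C7 /0 imm32
    "mov qword ptr [rsp+8], 0": insn_len(rex=1, modrm=1, sib=1, disp=1, imm=4),
}

# The same instructions with an imaginary sign-extended imm8 form.
hypothetical = {
    "mov eax, 123": insn_len(modrm=1, imm=1),
    "mov rax, -123": insn_len(rex=1, modrm=1, imm=1),
    "mov qword ptr [rsp+8], 0": insn_len(rex=1, modrm=1, sib=1, disp=1, imm=1),
}

for name in existing:
    print(f"{name}: {existing[name]} -> {hypothetical[name]} bytes")
```

Each instruction shrinks by the 3 bytes of immediate that an imm8 would save, except mov eax, 123, which gains a ModRM byte and so only drops from 5 to 3.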
Allowing mov imm64 to memory would be useful rarely enough that AMD decided it wasn't worth making the decoders more complex. In this case I agree with them, but AMD was very conservative with adding new opcodes. There were so many missed opportunities to clean up x86 warts; widening setcc, for example, would have been nice. But I think AMD wasn't sure AMD64 would catch on, and didn't want to be stuck needing a lot of extra transistors / power to support a feature if people didn't use it.
Footnote 1:
32-bit immediates in general are pretty obviously a good decision for code-size. It's very rare to want to add an immediate that's outside the +-2GiB range. It could be useful for bitwise stuff like AND, but for setting/clearing/flipping a single bit the bts / btr / btc instructions are good (taking a bit-position as an 8-bit immediate, instead of needing a mask). You don't want sub rsp, 1024 to be an 11-byte instruction; 7 is already bad enough.
At the time AMD64 was designed (early 2000s), CPUs with uop caches weren't a thing. (Intel P4 with a trace cache did exist, but in hindsight it was regarded as a mistake.) Instruction fetch/decode happens in chunks of up to 16 bytes, so having one instruction that's nearly 16 bytes isn't much better for the front-end than movabs $imm64, %reg plus a separate store.
Of course if the back-end isn't keeping up with the front-end, that bubble of only 1 instruction decoded this cycle can be hidden by buffering between stages.
Keeping track of that much data for one instruction would also be a problem. The CPU has to put that data somewhere, and if there's a 64-bit immediate and a 32-bit displacement in the addressing mode, that's a lot of bits. Normally an instruction needs at most 64 bits of space, for an imm32 plus a disp32.
BTW, there are special no-ModRM opcodes for most operations with RAX and an immediate. (x86-64 evolved out of 8086, where AX/AL was more special.) It would have been a plausible design for those add/sub/cmp/and/or/xor/... rax, sign_extended_imm32 forms with no ModRM to instead use a full imm64. The most common case for RAX, immediate uses an 8-bit sign-extended immediate (-128..127), not this form, anyway, and this form only saves 1 byte for instructions that need a 4-byte immediate. If you do need an 8-byte constant, though, putting it in a register or memory for reuse would be better than doing a 10-byte and-imm64 in a loop.