Why can't we move a 64-bit immediate value to memory?


Question

First, I am a little confused about the difference between movq and movabsq. My textbook says:

The regular movq instruction can only have immediate source operands that can be represented as 32-bit two’s-complement numbers. This value is then sign extended to produce the 64-bit value for the destination. The movabsq instruction can have an arbitrary 64-bit immediate value as its source operand and can only have a register as a destination.

I have two questions about this.

The movq instruction can only have immediate source operands that can be represented as 32-bit two’s-complement numbers.

So it means that we can't do:

movq    $0x123456789abcdef, %rbp

We have to do this instead:

movabsq $0x123456789abcdef, %rbp

But why is movq designed not to work with 64-bit immediate values? That seems to go against the purpose of q (quad word), and we need a separate movabsq just for this purpose. Isn't that a hassle?

Since the destination of movabsq has to be a register, not memory, we can't move a 64-bit immediate value to memory like this:

movabsq $0x123456789abcdef, (%rax)

But there is a workaround:

movabsq $0x123456789abcdef, %rbx
movq    %rbx, (%rax)   # the source operand is a register, not an immediate, so the destination of movq can be memory

So why are the rules designed to make things harder?

Recommended answer

Yes, mov to a register and then to memory for immediates that don't fit in a sign-extended 32-bit value (unlike -1, aka 0xFFFFFFFFFFFFFFFF, which does fit). The "why" part is an interesting question, though:
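
To make the "fits in a sign-extended 32-bit value" test concrete, here is a minimal Python sketch. The helper names (sign_extend_32_to_64, fits_simm32) are made up for illustration and don't come from any assembler. The idea is that a 64-bit pattern can use the imm32 form of movq only if truncating it to 32 bits and sign-extending it again reproduces the same 64 bits.

```python
MASK64 = (1 << 64) - 1


def sign_extend_32_to_64(pattern: int) -> int:
    """Take the low 32 bits of `pattern` and sign-extend them to 64 bits.

    Both input and result are treated as unsigned bit patterns.
    """
    low32 = pattern & 0xFFFFFFFF
    if low32 & 0x80000000:  # bit 31 set: the 32-bit value is negative
        return low32 | 0xFFFFFFFF00000000
    return low32


def fits_simm32(value: int) -> bool:
    """True if the 64-bit pattern of `value` survives truncation to 32 bits
    followed by sign extension, i.e. an imm32 can represent it."""
    pattern = value & MASK64  # maps Python negatives to two's complement
    return sign_extend_32_to_64(pattern) == pattern


if __name__ == "__main__":
    for v in (0x123456789ABCDEF, 0xFFFFFFFFFFFFFFFF, -1, 0x7FFFFFFF, 0x80000000):
        print(f"{v & MASK64:#018x} fits imm32: {fits_simm32(v)}")
```

Note that 0x80000000 does not pass this check for a 64-bit destination, because sign extension would turn it into 0xFFFFFFFF80000000. That is the case where a 32-bit operand size (with its implicit zero extension) is the better choice.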

Remember that asm only lets you do what's possible in machine code. Thus it's really a question about ISA design. Such decisions often involve what's easy for the hardware to decode, as well as encoding efficiency considerations. (Using up opcodes on rarely-used instructions would be bad.)

It's not designed to make things harder, it's designed to not need any new opcodes for mov. And also to limit 64-bit immediates to one special instruction format. mov is the only instruction that can ever use a 64-bit immediate at all (or a 64-bit absolute address, for load/store of AL/AX/EAX/RAX).

Check out Intel's manual for the forms of mov (note that it uses Intel syntax, destination first, and so will my answer.) I also summarized the forms (and their instruction lengths) in Difference between movq and movabsq in x86-64, as did @MargaretBloom in answer to Difference between movq and movabsq in x86-64.

Allowing an imm64 along with a ModR/M addressing mode would also make it possible to run into the 15-byte upper limit on instruction length pretty easily, e.g. REX + opcode + imm64 is 10 bytes, and ModRM + SIB + disp32 is 6, for a total of 16. So mov [rdi + rax*8 + 1234], imm64 would not be encodable even if there were an opcode for mov r/m64, imm64.
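
The byte counting above can be written out as a small Python sketch. The field tables and helper names are just illustrative bookkeeping; the hypothetical mov r/m64, imm64 form does not exist in the ISA.

```python
MAX_INSTRUCTION_LENGTH = 15  # architectural limit on x86 instruction length

# Hypothetical (non-existent) mov [rdi + rax*8 + 1234], imm64
HYPOTHETICAL_MOV_MEM_IMM64 = {
    "REX prefix": 1,
    "opcode": 1,
    "ModRM": 1,
    "SIB": 1,
    "disp32": 4,
    "imm64": 8,
}

# The real mov qword ptr [rdi + rax*8 + 1234], imm32 for comparison
REAL_MOV_MEM_IMM32 = {
    "REX prefix": 1,
    "opcode": 1,
    "ModRM": 1,
    "SIB": 1,
    "disp32": 4,
    "imm32": 4,
}


def total_length(fields: dict) -> int:
    """Sum the byte sizes of all fields of one instruction."""
    return sum(fields.values())


def is_encodable(fields: dict) -> bool:
    """True if the instruction stays within the 15-byte limit."""
    return total_length(fields) <= MAX_INSTRUCTION_LENGTH


if __name__ == "__main__":
    for name, fields in (("mem, imm64 (hypothetical)", HYPOTHETICAL_MOV_MEM_IMM64),
                         ("mem, imm32 (real)", REAL_MOV_MEM_IMM32)):
        print(name, total_length(fields), "bytes, encodable:", is_encodable(fields))
```

The hypothetical form comes to 16 bytes, one over the limit, while the existing imm32 form with the same addressing mode is 12 bytes.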

And that's assuming they repurposed one of the 1-byte opcodes that were freed up by making some instructions invalid in 64-bit mode (e.g. aaa), which might be inconvenient for the decoders (and instruction-length pre-decoders) because in other modes those opcodes don't take a ModRM byte or an immediate.

movq is for the forms of mov with a normal ModRM byte, which allows an arbitrary addressing mode as the destination (or as the source, for movq r64, r/m64). AMD chose to keep the immediate for these as 32-bit, the same as with 32-bit operand size (see footnote 1).

These forms of mov are the same instruction format as other instructions like add. For ease of decoding, this means a REX prefix doesn't change the instruction-length for these opcodes. Instruction-length decoding is already hard enough when the addressing mode is variable-length.

So movq is 64-bit operand-size but otherwise the same instruction format mov r/m64, imm32 (becoming the sign-extended-immediate form, same as every other instruction which only has one immediate form), and mov r/m64, r64 or mov r64, r/m64.

movabs is the 64-bit form of the existing no-ModRM short form mov reg, imm32. This one is already a special case (because of the no-modrm encoding, with register number from the low 3 bits of the opcode byte). Small positive constants can just use 32-bit operand-size for implicit zero-extension to 64-bit with no loss of efficiency (like 5-byte mov eax, 123 / AT&T mov $123, %eax in 32 or 64-bit mode). And having a 64-bit absolute mov is useful so it makes sense AMD did that.

Since there's no ModRM byte, it can only encode a register destination. It would take a whole different opcode to add a form that could take a memory operand.
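
As a rough illustration of the three immediate forms discussed here, the following Python sketch hand-assembles them for register destinations only. The function names are invented for this example, and registers are passed as their machine-code numbers (rax = 0, rbp = 5, r8 = 8, and so on). It is a sketch of the encodings, not a substitute for an assembler.

```python
import struct

MASK64 = (1 << 64) - 1


def encode_mov_r32_imm32(reg: int, imm: int) -> bytes:
    """mov r32, imm32 (opcode B8+rd, no ModRM). Only eax..edi, so no REX."""
    if not 0 <= reg <= 7:
        raise ValueError("this sketch only handles registers 0..7 without REX")
    return bytes([0xB8 + reg]) + struct.pack("<I", imm & 0xFFFFFFFF)


def encode_mov_r64_simm32(reg: int, imm: int) -> bytes:
    """mov r/m64, imm32 (REX.W C7 /0) with a register-direct ModRM byte.

    struct.pack("<i", ...) raises struct.error if imm is not a signed 32-bit value.
    """
    rex = 0x48 | (reg >> 3)   # REX.W, plus REX.B for r8..r15
    modrm = 0xC0 | (reg & 7)  # mod=11 (register), reg field=/0, rm=register
    return bytes([rex, 0xC7, modrm]) + struct.pack("<i", imm)


def encode_movabs_r64_imm64(reg: int, imm: int) -> bytes:
    """movabs r64, imm64 (REX.W B8+rd): the register number lives in the low
    3 bits of the opcode byte, so there is nowhere to encode a memory operand."""
    rex = 0x48 | (reg >> 3)
    return bytes([rex, 0xB8 + (reg & 7)]) + struct.pack("<Q", imm & MASK64)


if __name__ == "__main__":
    RAX, RBP = 0, 5
    for label, code in (
        ("mov eax, 123", encode_mov_r32_imm32(RAX, 123)),
        ("mov rax, -123", encode_mov_r64_simm32(RAX, -123)),
        ("movabs rbp, 0x123456789abcdef", encode_movabs_r64_imm64(RBP, 0x123456789ABCDEF)),
    ):
        print(f"{label:32s} {len(code):2d} bytes  {code.hex(' ')}")
```

The lengths come out as 5, 7 and 10 bytes respectively, matching the sizes quoted in this answer. Only the middle form has a ModRM byte, which is why it is the only one that could address memory.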

From one point of view, be grateful you get a mov with 64-bit immediates at all; RISC ISAs like AArch64 (with fixed-width 32-bit instructions) need more like 4 instructions just to get a 64-bit value into a register. (Unless it's a repeating bit-pattern; AArch64 is actually pretty cool in that respect, unlike earlier RISCs like MIPS64 or PowerPC64.)

If AMD64 was going to introduce a new opcode for mov, mov r/m, sign_extended_imm8 would be vastly more useful to save code-size. It's not at all rare for compilers to emit multiple mov qword ptr [rsp+8], 0 instructions to zero a local array or struct, each one containing a 4-byte 0 immediate. Putting a non-zero small number in a register is fairly common, and would make mov eax, 123 a 3-byte instruction (down from 5), and mov rax, -123 a 4-byte instruction (down from 7). It would also make zeroing a register without clobbering FLAGS 3 bytes.

Allowing mov imm64 to memory would be useful rarely enough that AMD decided it wasn't worth making the decoders more complex. In this case I agree with them, but AMD was very conservative with adding new opcodes. There were so many missed opportunities to clean up x86 warts; widening setcc, for example, would have been nice. But I think AMD wasn't sure AMD64 would catch on, and didn't want to be stuck needing a lot of extra transistors / power to support a feature if people didn't use it.

Footnote 1:
32-bit immediates in general are pretty obviously a good decision for code-size. It's very rare to want to add an immediate that's outside the +-2GiB range to something. A wider immediate could be useful for bitwise stuff like AND, but for setting/clearing/flipping a single bit the bts / btr / btc instructions are good (taking a bit-position as an 8-bit immediate, instead of needing a mask). You don't want sub rsp, 1024 to be an 11-byte instruction; 7 is already bad enough.

At the time AMD64 was designed (early 2000s), CPUs with uop caches weren't a thing. (Intel P4 with a trace cache did exist, but in hindsight it was regarded as a mistake.) Instruction fetch/decode happens in chunks of up-to-16 bytes, so having one instruction that's nearly 16 bytes isn't much better for the front-end than movabs $imm64, %reg.

Of course if the back-end isn't keeping up with the front-end, that bubble of only 1 instruction decoded this cycle can be hidden by buffering between stages.

Keeping track of that much data for one instruction would also be a problem. The CPU has to put that data somewhere, and if there's a 64-bit immediate and a 32-bit displacement in the addressing mode, that's 96 bits. Normally an instruction needs at most 64 bits of space for an imm32 + a disp32.

BTW, there are special no-ModRM opcodes for most operations with RAX and an immediate. (x86-64 evolved out of 8086, where AX/AL was more special.) It would have been a plausible design for those add/sub/cmp/and/or/xor/... rax, sign_extended_imm32 forms with no ModRM to instead use a full imm64. The most common case for RAX, immediate uses an 8-bit sign-extended immediate (-128..127), not this form anyway, and it only saves 1 byte for instructions that need a 4-byte immediate. If you do need an 8-byte constant, though, putting it in a register or memory for reuse would be better than doing a 10-byte and-imm64 in a loop.
