Why can't we move a 64-bit immediate value to memory?

Question

First, I am a little confused about the difference between movq and movabsq. My textbook says:

The regular movq instruction can only have immediate source operands that can be represented as 32-bit two’s-complement numbers. This value is then sign extended to produce the 64-bit value for the destination. The movabsq instruction can have an arbitrary 64-bit immediate value as its source operand and can only have a register as a destination.

I have two questions.

The movq instruction can only have immediate source operands that can be represented as 32-bit two’s-complement numbers.

So it means that we can't do:

movq    $0x123456789abcdef, %rbp

Instead, we have to do:

movabsq $0x123456789abcdef, %rbp

But why is movq designed not to work with 64-bit immediate values? That seems to go against the purpose of the q (quad word) suffix, and needing a separate movabsq just for this case seems like a hassle.

Since the destination of movabsq has to be a register, not memory, we can't move a 64-bit immediate value to memory like this:

movabsq $0x123456789abcdef, (%rax)

But there is a workaround:

movabsq $0x123456789abcdef, %rbx
movq    %rbx, (%rax)   # the source operand is a register, not an immediate, so the destination of movq can be memory

So why is the rule designed to make things harder?

Recommended answer

Yes: mov the value into a register and then store the register to memory, for immediates that won't fit in a sign-extended 32-bit value. (Some 64-bit constants do fit, such as -1, aka 0xFFFFFFFFFFFFFFFF.) The why part is an interesting question, though:
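As a rough illustration of which constants need the register detour, the check below tests whether a 64-bit pattern is what you get by sign-extending its own low 32 bits. The function name is made up for this sketch; it is not part of any assembler.

```python
def fits_sign_extended_imm32(value: int) -> bool:
    """Return True if the 64-bit pattern equals its low 32 bits sign-extended to 64 bits."""
    value &= 0xFFFFFFFFFFFFFFFF              # treat the input as a raw 64-bit pattern
    low32 = value & 0xFFFFFFFF
    if low32 & 0x80000000:                   # bit 31 set: sign extension fills the top half with ones
        extended = low32 | 0xFFFFFFFF00000000
    else:                                    # bit 31 clear: the top half must be all zeros
        extended = low32
    return extended == value


print(fits_sign_extended_imm32(0xFFFFFFFFFFFFFFFF))  # -1: movq $imm32 form is enough
print(fits_sign_extended_imm32(0x123456789abcdef))   # needs movabsq into a register first
```

Note that 0x80000000 as a 64-bit value does not fit: sign-extending it would produce 0xFFFFFFFF80000000 instead.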

Remember that asm only lets you do what's possible in machine code. Thus it's really a question about ISA design. Such decisions often involve what's easy for the hardware to decode, as well as encoding efficiency considerations. (Using up opcodes on rarely-used instructions would be bad.)

It's not designed to make things harder, it's designed to not need any new opcodes for mov. And also to limit 64-bit immediates to one special instruction format. mov is the only instruction that can ever use a 64-bit immediate at all (or a 64-bit absolute address, for load/store of AL/AX/EAX/RAX).

Check out Intel's manual for the forms of mov (note that it uses Intel syntax, destination first, and so will my answer.) I also summarized the forms (and their instruction lengths) in Difference between movq and movabsq in x86-64, as did @MargaretBloom in answer to What's the difference between the x86-64 AT&T instructions movq and movabsq?.

Allowing an imm64 along with a ModR/M addressing mode would also make it easy to run into the 15-byte upper limit on instruction length: REX + opcode + imm64 is 10 bytes, and ModRM + SIB + disp32 is 6, for 16 bytes in total. So mov [rdi + rax*8 + 1234], imm64 would not be encodable even if there were an opcode for mov r/m64, imm64.
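The byte count can be spelled out as a small calculation. This is only arithmetic on the component sizes named above; mov r/m64, imm64 is a hypothetical instruction and the function name is invented for this sketch.

```python
MAX_INSN_LEN = 15  # architectural limit on x86 instruction length, in bytes


def hypothetical_mov_mem_imm64_length(has_sib: bool, disp_bytes: int) -> int:
    """Length of an imagined 'mov r/m64, imm64' using a 1-byte opcode."""
    rex, opcode, modrm, imm64 = 1, 1, 1, 8
    return rex + opcode + modrm + int(has_sib) + disp_bytes + imm64


# mov [rdi + rax*8 + 1234], imm64 needs a SIB byte and a 4-byte displacement
worst = hypothetical_mov_mem_imm64_length(has_sib=True, disp_bytes=4)
print(worst, worst > MAX_INSN_LEN)

# mov [rdi], imm64 would need neither, and would stay under the limit
print(hypothetical_mov_mem_imm64_length(has_sib=False, disp_bytes=0))
```

Simple addressing modes would still fit, so such an opcode would have been usable only with some addressing modes, which is part of what makes it awkward.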

And that's assuming they repurposed one of the 1-byte opcodes that were freed up by making some instructions invalid in 64-bit mode (e.g. aaa), which might be inconvenient for the decoders (and instruction-length pre-decoders) because in other modes those opcodes don't take a ModRM byte or an immediate.

movq is for the forms of mov with a normal ModRM byte to allow an arbitrary addressing mode as the destination (or as the source, for movq r64, r/m64). AMD chose to keep the immediate for these as 32-bit, same as with 32-bit operand size (see footnote 1).

These forms of mov are the same instruction format as other instructions like add. For ease of decoding, this means a REX prefix doesn't change the instruction-length for these opcodes. Instruction-length decoding is already hard enough when the addressing mode is variable-length.

So movq has 64-bit operand-size but otherwise uses the same instruction formats: mov r/m64, imm32 (which becomes the sign-extended-immediate form, same as every other instruction that has only one immediate form), and mov r/m64, r64 or mov r64, r/m64.

movabs is the 64-bit form of the existing no-ModRM short form mov reg, imm32. This one is already a special case (because of the no-modrm encoding, with register number from the low 3 bits of the opcode byte). Small positive constants can just use 32-bit operand-size for implicit zero-extension to 64-bit with no loss of efficiency (like 5-byte mov eax, 123 / AT&T mov $123, %eax in 32 or 64-bit mode). And having a 64-bit absolute mov is useful so it makes sense AMD did that.
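To make the relationship between the two short forms concrete, here is a minimal sketch that builds the machine-code bytes by hand. The helper names are invented for this example; the byte layout follows the B8+rd encoding described above, with a REX.W prefix selecting the 64-bit immediate.

```python
import struct


def encode_mov_r32_imm32(reg: int, imm: int) -> bytes:
    """mov r32, imm32: opcode B8+reg followed by a 4-byte immediate (reg 0..7 = eax..edi)."""
    assert 0 <= reg <= 7
    return bytes([0xB8 + reg]) + struct.pack("<I", imm & 0xFFFFFFFF)


def encode_movabs_r64_imm64(reg: int, imm: int) -> bytes:
    """movabs r64, imm64: REX.W prefix, opcode B8+(reg & 7), then an 8-byte immediate."""
    assert 0 <= reg <= 15
    rex = 0x48 | (reg >> 3)  # 0x48 = REX.W; REX.B (bit 0) selects r8..r15
    return bytes([rex, 0xB8 + (reg & 7)]) + struct.pack("<Q", imm & 0xFFFFFFFFFFFFFFFF)


RAX, RBX = 0, 3
print(encode_mov_r32_imm32(RAX, 123).hex())                  # mov eax, 123
print(encode_movabs_r64_imm64(RBX, 0x123456789abcdef).hex()) # movabs rbx, 0x123456789abcdef
```

The register number lives in the opcode byte itself (plus one REX bit), which is why there is nowhere to put a memory operand in this format.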

Since there's no ModRM byte, it can only encode a register destination. It would take a whole different opcode to add a form that could take a memory operand.

From one POV, be grateful you get a mov with 64-bit immediates at all; RISC ISAs like AArch64 (with fixed-width 32-bit instructions) need more like 4 instructions just to get a 64-bit value into a register. (Unless it's a repeating bit-pattern; AArch64 is actually pretty cool. Unlike earlier RISCs like MIPS64 or PowerPC64)

If AMD64 was going to introduce a new opcode for mov, mov r/m, sign_extended_imm8 would be vastly more useful to save code-size. It's not at all rare for compilers to emit multiple mov qword ptr [rsp+8], 0 instructions to zero a local array or struct, each one containing a 4-byte 0 immediate. Putting a non-zero small number in a register is fairly common, and would make mov eax, 123 a 3-byte instruction (down from 5), and mov rax, -123 a 4-byte instruction (down from 7). It would also make zeroing a register without clobbering FLAGS 3 bytes.
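The savings can be tallied byte by byte. The "today" column reflects the existing encodings; the second column is for the hypothetical mov r/m, sign_extended_imm8 opcode, which does not exist. The 9-byte and 6-byte figures for the stack store are a count made for this sketch, not numbers given above.

```python
# Each entry: (bytes with today's encoding, bytes with the hypothetical imm8 form)
cases = {
    # today: B8+rd opcode + imm32; hypothetical: opcode + ModRM + imm8
    "mov eax, 123": (1 + 4, 1 + 1 + 1),
    # today: REX.W + C7 + ModRM + imm32; hypothetical: REX.W + opcode + ModRM + imm8
    "mov rax, -123": (1 + 1 + 1 + 4, 1 + 1 + 1 + 1),
    # today: REX.W + C7 + ModRM + SIB + disp8 + imm32; hypothetical: same with imm8
    "mov qword ptr [rsp+8], 0": (1 + 1 + 1 + 1 + 1 + 4, 1 + 1 + 1 + 1 + 1 + 1),
}

for insn, (today, with_imm8) in cases.items():
    print(f"{insn}: {today} -> {with_imm8} bytes (saves {today - with_imm8})")
```

Every store of a small constant to a 64-bit or 32-bit memory location would save 3 bytes, which adds up when a compiler zeroes a struct field by field.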

Allowing mov imm64 to memory would be useful rarely enough that AMD decided it wasn't worth making the decoders more complex. In this case I agree with them, but AMD was very conservative about adding new opcodes. There were many missed opportunities to clean up x86 warts; widening setcc, for example, would have been nice. But I think AMD wasn't sure AMD64 would catch on, and didn't want to be stuck needing a lot of extra transistors / power to support a feature if people didn't use it.

Footnote 1:
32-bit immediates in general are pretty obviously a good decision for code-size. It's very rare to want to add an immediate to something that's outside the +-2GiB range. It could be useful for bitwise stuff like AND, but for setting/clearing/flipping a single bit the bts / btr / btc instructions are good (taking a bit-position as an 8-bit immediate, instead of needing a mask). You don't want sub rsp, 1024 to be an 11-byte instruction; 7 is already bad enough.
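The sizes in this footnote can be checked with a small hand encoder for sub rsp, imm. It is a sketch with an invented function name; it picks the 83 /5 ib form when the value fits in a signed byte and the 81 /5 id form otherwise, which is the choice assemblers normally make.

```python
import struct


def encode_sub_rsp_imm(imm: int) -> bytes:
    """Encode 'sub rsp, imm' using the shortest existing immediate form."""
    rex_w = 0x48
    modrm = 0xEC  # mod=11 (register), reg=/5 (sub), rm=100 (rsp)
    if -128 <= imm <= 127:
        return bytes([rex_w, 0x83, modrm]) + struct.pack("<b", imm)  # sign-extended imm8
    if -2**31 <= imm <= 2**31 - 1:
        return bytes([rex_w, 0x81, modrm]) + struct.pack("<i", imm)  # sign-extended imm32
    raise ValueError("no 'sub r/m64, imm64' encoding exists; load the constant into a register")


print(encode_sub_rsp_imm(8).hex())     # 4 bytes
print(encode_sub_rsp_imm(1024).hex())  # 7 bytes: REX.W + opcode + ModRM + imm32
print(3 + 8)                           # 11 bytes if the immediate had to be 64-bit
```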

At the time AMD64 was designed (early 2000s), CPUs with uop caches weren't a thing. (Intel P4 with a trace cache did exist, but in hindsight it was regarded as a mistake.) Instruction fetch/decode happens in chunks of up-to-16 bytes, so having one instruction that's nearly 16 bytes isn't much better for the front-end than movabs $imm64, %reg.

Of course if the back-end isn't keeping up with the front-end, that bubble of only 1 instruction decoded this cycle can be hidden by buffering between stages.

Keeping track of that much data for one instruction would also be a problem. The CPU has to put that data somewhere, and if there's a 64-bit immediate and a 32-bit displacement in the addressing mode, that's a lot of bits. Normally an instruction needs at most 64 bits of space, for an imm32 + a disp32.

BTW, there are special no-modrm opcodes for most operations with RAX and an immediate. (x86-64 evolved out of 8086, where AX/AL was more special, see this for more history and explanation). It would have been a plausible design for those add/sub/cmp/and/or/xor/... rax, sign_extended_imm32 forms with no ModRM to instead use a full imm64. The most common case for an operation on RAX with an immediate uses an 8-bit sign-extended immediate (-128..127), not this form, anyway, and this form only saves 1 byte for instructions that need a 4-byte immediate. If you do need an 8-byte constant, though, putting it in a register or memory for reuse would be better than doing a 10-byte and-imm64 in a loop.
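For comparison, the three existing ways to encode and rax, imm can be written out by hand. The function and its prefer_short_form parameter are invented for this sketch; the 10-byte figure at the end is for the imagined imm64 variant of the no-ModRM form, which does not exist.

```python
import struct


def encode_and_rax_imm(imm: int, prefer_short_form: bool = True) -> bytes:
    """Encode 'and rax, imm'; imm must fit in a signed 32-bit value."""
    rex_w = 0x48
    modrm = 0xE0  # mod=11 (register), reg=/4 (and), rm=000 (rax)
    if -128 <= imm <= 127:
        return bytes([rex_w, 0x83, modrm]) + struct.pack("<b", imm)  # imm8 form, 4 bytes
    if prefer_short_form:
        return bytes([rex_w, 0x25]) + struct.pack("<i", imm)         # RAX-only form, no ModRM
    return bytes([rex_w, 0x81, modrm]) + struct.pack("<i", imm)      # generic r/m64, imm32 form


print(len(encode_and_rax_imm(15)))                                # 4
print(len(encode_and_rax_imm(0x12345)))                           # 6
print(len(encode_and_rax_imm(0x12345, prefer_short_form=False)))  # 7
print(1 + 1 + 8)                                                  # 10 for an imagined and rax, imm64
```

The RAX-only form saves a single byte over the generic one, and small constants bypass it entirely, which matches the point that it is rarely the encoding that matters.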
