x86 assembly 16 bit vs 8 bit immediate operand encoding

Question

I'm writing my own assembler and trying to encode the ADC instruction. I have a question about immediate values, especially when adding an 8-bit value to the AX register.

When adding a 16-bit value, adc ax, 0xff33 gets encoded as 15 33 ff, which is correct. But would it matter if adc ax, 0x33 gets encoded as 15 33 00?

NASM encodes this as 83 d0 33, which is obviously correct, but is my approach correct as well?

Solution

It's common for x86 to have more than one valid way of encoding an instruction. For example, most op reg, reg instructions have a choice of encoding via the op r/m, reg or the op reg, r/m opcode.
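As a small illustration of that choice, here is a sketch in Python that builds both encodings of a register-to-register add with 32-bit operand size. The names `REG32`, `modrm` and `add_reg_reg_encodings` are made up for this example and are not taken from any existing assembler.

```python
# Register numbers as they appear in the ModRM byte (32-bit registers).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the mod, reg and r/m fields into one ModRM byte."""
    return (mod << 6) | (reg << 3) | rm


def add_reg_reg_encodings(dst: str, src: str) -> list:
    """Return both valid encodings of `add dst, src` (hypothetical helper)."""
    d, s = REG32[dst], REG32[src]
    # 01 /r is ADD r/m32, r32: the destination sits in the r/m field.
    via_rm_reg = bytes([0x01, modrm(0b11, s, d)])
    # 03 /r is ADD r32, r/m32: the destination sits in the reg field.
    via_reg_rm = bytes([0x03, modrm(0b11, d, s)])
    return [via_rm_reg, via_reg_rm]
```

For `add eax, ecx` this yields `01 c8` and `03 c1`. Both are the same length and do the same thing, so an assembler is free to emit either one.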

And yes, normally you want an assembler to always pick the shortest encoding for an instruction. NASM even optimizes mov rax, 1 (7 bytes for mov r64, sign_extended_imm32) into mov eax, 1 (5 bytes) for x86-64, changing the operand-size to use the zero-extension from writing a 32-bit register instead of explicit sign-extension of a 32-bit immediate.
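A rough sketch of that kind of size selection for mov with a 64-bit register destination might look like the following. It is not NASM's actual implementation. The function name `encode_mov_r64_imm` is hypothetical, and only rax through rdi are handled so that no REX.B bit is needed.

```python
import struct

# Only the low eight registers, so REX.B is never required (assumption).
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
         "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def encode_mov_r64_imm(reg: str, value: int) -> bytes:
    """Pick the shortest encoding of `mov reg, value` (hypothetical helper)."""
    r = REG64[reg]
    if not -(1 << 63) <= value < (1 << 64):
        raise ValueError("immediate does not fit in 64 bits")
    value &= (1 << 64) - 1  # work with the 64-bit two's-complement pattern
    if value <= 0xFFFFFFFF:
        # B8+r id: mov r32, imm32. Writing the 32-bit register
        # zero-extends into the full 64-bit register. 5 bytes.
        return bytes([0xB8 + r]) + struct.pack("<I", value)
    if value >= 0xFFFFFFFF80000000:
        # REX.W C7 /0 id: mov r/m64, sign-extended imm32. 7 bytes.
        return bytes([0x48, 0xC7, 0xC0 | r]) + struct.pack("<I", value & 0xFFFFFFFF)
    # REX.W B8+r io: mov r64, imm64. 10 bytes.
    return bytes([0x48, 0xB8 + r]) + struct.pack("<Q", value)
```

With this sketch, `mov rax, 1` comes out as the 5-byte `b8 01 00 00 00`, while `mov rax, -1` needs the 7-byte sign-extended form.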

Using the sign-extended-imm8 encoding when available is always good

It's equal length for 16-bit, but shorter for 32-bit operand-size, so it simplifies your code to always choose imm8.

With operand-size of 32-bit, op eax, imm32 is 5 bytes, vs. op r/m32, imm8 still being 3 bytes. (Not counting any prefixes needed to set operand-size or other things; those will be the same for both.)
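For the assembler in the question, the selection could be sketched in Python roughly like this. The helper names `fits_sign_extended_imm8` and `encode_adc_acc_imm` are invented for illustration, prefixes are left out, and only the accumulator (ax/eax) destination is handled.

```python
def fits_sign_extended_imm8(value: int, operand_bits: int) -> bool:
    """True if sign-extending the low byte to operand_bits reproduces value."""
    mask = (1 << operand_bits) - 1
    value &= mask
    low = value & 0xFF
    # Sign-extend the low byte by filling the upper bits when bit 7 is set.
    extended = (low | (mask & ~0xFF)) if low & 0x80 else low
    return extended == value


def encode_adc_acc_imm(value: int, operand_bits: int) -> bytes:
    """Encode `adc ax/eax, value` without prefixes, preferring imm8."""
    if operand_bits not in (16, 32):
        raise ValueError("this sketch only handles 16- and 32-bit operand size")
    value &= (1 << operand_bits) - 1
    if fits_sign_extended_imm8(value, operand_bits):
        # 83 /2 ib: ModRM 0xD0 = mod 11, reg 010 (ADC), r/m 000 (AX/EAX).
        return bytes([0x83, 0xD0, value & 0xFF])
    # 15 iw / 15 id: accumulator short form with a full-size immediate.
    return bytes([0x15]) + value.to_bytes(operand_bits // 8, "little")
```

Note that the check is about sign extension, not about the value being below 256: `adc ax, 0xfff0` fits (`83 d0 f0`), but `adc ax, 0x80` does not, because an imm8 of 0x80 would sign-extend to 0xff80.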

Performance advantages of the imm8 encoding

If an operand-size prefix is required (e.g. in 32-bit mode for adc ax, 0x33), using the adc ax/eax/rax, imm16/32/32 encoding with an operand-size prefix will create an LCP stall on Intel CPUs. (Length-Changing Prefix means the prefix changes the length of the rest of the instruction.) This doesn't happen for the imm8 encoding because it's still (prefix) + opcode + modrm + imm8 regardless of the operand-size.
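To make "length-changing" concrete, the following Python sketch only counts bytes. It does not model any real decoder, and the names `adc_body_length` and `is_length_changing` are hypothetical.

```python
def adc_body_length(form: str, operand_bits: int) -> int:
    """Bytes after any prefixes for the two adc-to-accumulator forms."""
    if form == "acc_imm":
        # 15 iw / 15 id: opcode plus an immediate as wide as the operand.
        return 1 + operand_bits // 8
    if form == "rm_imm8":
        # 83 /2 ib: opcode + ModRM + one immediate byte, always.
        return 3
    raise ValueError(form)


def is_length_changing(form: str, default_bits: int = 32) -> bool:
    """True if a 66h prefix changes the length of the rest of the instruction."""
    toggled_bits = 16 if default_bits == 32 else 32
    return adc_body_length(form, default_bits) != adc_body_length(form, toggled_bits)
```

In 32-bit mode the short form goes from 5 bytes (`15 id`) to 3 bytes after the prefix (`66 15 iw`), so the prefix is length-changing. The imm8 form stays at 3 bytes either way (`83 d0 ib` vs. `66 83 d0 ib`).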

See Agner Fog's microarch.pdf and other performance links in the x86 tag wiki. See also "x86 instruction encoding how to choose opcode", which is a duplicate of this, except for the fact that adc is a special case.


In the specific case of adc/sbb, there is another advantage to avoiding the ax, imm16 encoding: see "Which Intel microarchitecture introduced the ADC reg,0 single-uop special case?" On Sandybridge through Haswell, adc ax, 0 is special-cased as a single-uop instruction, instead of the normal 2 uops for an instruction with 3 inputs (ax, flags, immediate).

But this special casing doesn't work for the no-ModRM short form encodings, so the 3-byte adc ax, imm16 still decodes to 2 uops. Only the decoder for the imm8 form checks if the immediate is zero before decoding to a single uop. (And it still doesn't work for adc al, imm8.)

So always choosing the sign-extended-imm8 whenever possible is optimal for this, too, even in 16-bit mode where no operand-size prefix would be required for adc ax,0 and thus the LCP-stall issue wouldn't happen.


Most assemblers don't provide an override to avoid the no-ModRM short form. When they were designed, there wasn't a performance use-case other than intentionally lengthening instructions to get alignment without adding NOPs before the top of a loop or other branch target: see "What methods can be used to efficiently extend instruction length on modern x86?"

If you're designing a new flavour of asm syntax, you might consider allowing more control of the encoding with override keywords. For existing designs, check out NASM's strict and nosplit keywords, and GAS's {vex2}, {vex3}, {disp32} and so on "prefixes".
