How does one opcode byte decode to different instructions depending on the "register/opcode" field? What is that?
Question
How can I determine what an array of bytes will translate into in machine code?
I understand that if I see 0F at the start, it's a 2-byte opcode, but there are other prefixes too. In some disassembly in my x64 debugger I see weird combinations like 48 83 C4 38, and the opcode reference says 48 means the operand is 64 bytes.
But 83 says it can be 8 different instructions depending on a field called the "register/opcode field"... what?
Can someone please explain the logic behind how the processor uses these bytes to determine:
- what instruction is run
- which registers and/or addresses the instruction uses (if any)
Answer
0x48 is a REX prefix with the W field set to 1, implying 64-bit operand size (not 64-byte).
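A REX prefix (only valid as a prefix in 64-bit mode) is laid out as 0100WRXB, so its bits can be pulled apart with shifts and masks. This is a minimal sketch; the function name `decode_rex` is hypothetical, and it only splits the bits without validating how they are used afterwards.

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix (0x40-0x4F) into its W, R, X and B bits.

    Layout: 0100 W R X B (the high nibble is always 0b0100).
    """
    if byte & 0xF0 != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # 4th bit for ModRM.reg
        "X": (byte >> 1) & 1,  # 4th bit for SIB.index
        "B": byte & 1,         # 4th bit for ModRM.rm (or SIB.base)
    }

print(decode_rex(0x48))  # {'W': 1, 'R': 0, 'X': 0, 'B': 0}
```

Only W is set in 0x48, which is why it changes the operand size and nothing else.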
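Many opcodes for immediate versions of instructions, including 83, use the 3-bit field of the ModR/M byte that /r would otherwise use for a register (Intel calls it the "reg/opcode" field) as 3 extra opcode bits. Intel's vol.2 manual documents this as the /digit notation (/0 through /7), and I think the opcode map in an appendix includes a table of these opcode extensions.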
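For 83 (and its siblings 80 and 81), the eight possible values of that field select the "group 1" ALU operations. A minimal lookup sketch, assuming the standard group-1 order ADD, OR, ADC, SBB, AND, SUB, XOR, CMP; the helper name `group1_mnemonic` is hypothetical:

```python
# Group 1 opcode extensions, shared by opcodes 80, 81 and 83:
# ModRM.reg (bits 5..3) picks the operation instead of naming a register.
GROUP1 = ("add", "or", "adc", "sbb", "and", "sub", "xor", "cmp")

def group1_mnemonic(modrm: int) -> str:
    reg = (modrm >> 3) & 0b111  # the "/digit" in Intel's notation
    return GROUP1[reg]

for digit in range(8):
    # Build a ModRM byte with mod=11, reg=digit, rm=000 (register operand)
    modrm = 0b11_000_000 | (digit << 3)
    print(f"83 /{digit}  (ModRM {modrm:#04x})  -> {group1_mnemonic(modrm)}")
```

So 48 83 C4 38 and 48 83 EC 38 differ only in that field: the first is an add, the second a sub.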
This is why most original-8086 immediate instructions, like and r/m, imm, still only allow 2 operands, unlike shrd eax, edx, 4 or imul edx, [rdi], 12345, where both ModRM fields encode operands and the opcode itself implies a third, immediate operand. SHRD/SHLD were added with 386, and imul-immediate was added with 186. It's maybe unfortunate that copy-and-AND (and eax, edx, 0xf) isn't encodable, but at least x86 can use LEA for copy-and-add/sub.
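To make the contrast concrete, here is a minimal sketch comparing 83 (reg is an opcode extension, so only rm is an operand) with 6B, IMUL r64, r/m64, imm8 (reg names the destination register, giving 3 operands). The function name is hypothetical; it assumes REX.W is present, handles only register operands (mod=11) without REX.R/REX.B, and prints imm8 raw rather than sign-extended.

```python
REG64 = ("rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi")
GROUP1 = ("add", "or", "adc", "sbb", "and", "sub", "xor", "cmp")

def decode_reg_reg_imm8(opcode: int, modrm: int, imm8: int) -> str:
    """Decode REX.W + opcode + ModRM(mod=11) + imm8 for opcodes 0x83 and 0x6B only."""
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise ValueError("this sketch only handles register operands")
    if opcode == 0x83:
        # reg is an opcode extension: only rm is an operand -> 2 operands
        return f"{GROUP1[reg]} {REG64[rm]}, {imm8:#x}"
    if opcode == 0x6B:
        # reg is the destination register, rm the source -> 3 operands
        return f"imul {REG64[reg]}, {REG64[rm]}, {imm8:#x}"
    raise NotImplementedError(f"opcode {opcode:#04x}")

# Same ModRM byte (0xC2 = 11 000 010), two different meanings of the reg field:
print(decode_reg_reg_imm8(0x83, 0xC2, 0x05))  # add rdx, 0x5
print(decode_reg_reg_imm8(0x6B, 0xC2, 0x05))  # imul rax, rdx, 0x5
```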
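Each instruction's own docs, e.g. add (an HTML extract of the vol.2 manual), show encodings like REX.W + 83 /0 ib for ADD r/m64, imm8, which is what you have. Here /0 means the reg field of ModR/M must be 0, and ib means a 1-byte immediate follows, which is sign-extended to the operand size.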
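The "sign-extended imm8" part of ADD r/m64, imm8 can be sketched like this: the byte is read as a signed 8-bit value and then widened to a 64-bit two's-complement pattern. The helper names are hypothetical.

```python
def sign_extend_imm8(imm8: int) -> int:
    """Interpret a raw imm8 byte as a signed value (what 83 /digit ib does)."""
    return imm8 - 0x100 if imm8 & 0x80 else imm8

def as_u64(value: int) -> int:
    """The 64-bit two's-complement bit pattern of a signed value."""
    return value & 0xFFFF_FFFF_FFFF_FFFF

for raw in (0x38, 0xF8):
    v = sign_extend_imm8(raw)
    print(f"imm8 {raw:#04x} -> {v} -> 64-bit {as_u64(v):#018x}")
# imm8 0x38 -> 56 -> 64-bit 0x0000000000000038
# imm8 0xf8 -> -8 -> 64-bit 0xfffffffffffffff8
```

That is why one opcode can both add small positive constants and subtract small ones without needing a full 4-byte immediate.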
  7                           0
+---+---+---+---+---+---+---+---+
|  mod  |    reg    |     rm    |
+---+---+---+---+---+---+---+---+
0xc4 = 0b11000100, so the reg field = 0. Thus our opcode is 83 /0 in Intel's notation.
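The same split of 0xC4 into its mod, reg and rm fields, done with shifts and masks (a minimal sketch; `split_modrm` is a hypothetical name):

```python
def split_modrm(modrm: int) -> tuple:
    """Return (mod, reg, rm) from a ModRM byte laid out as mm rrr bbb."""
    mod = (modrm >> 6) & 0b11   # bits 7..6
    reg = (modrm >> 3) & 0b111  # bits 5..3: a register, or a /digit opcode extension
    rm = modrm & 0b111          # bits 2..0
    return mod, reg, rm

mod, reg, rm = split_modrm(0xC4)
print(f"{0xC4:#010b}: mod={mod:02b} reg={reg:03b} rm={rm:03b}")
# 0b11000100: mod=11 reg=000 rm=100
```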
The rest of the ModRM fields are:
- mod = 0b11, so the rm field encodes a register operand, not a base register for an addressing mode.
- rm = 0b100. Register #4 is SPL/SP/ESP/RSP (RSP in this case, because the operand size is 64-bit; REX.B is 0 in 0x48, so rm is not extended to R12). See Intel's manual, or https://wiki.osdev.org/X86-64_Instruction_Encoding#Registers for tables.
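A sketch of the register-number lookup for 64-bit operands, assuming mod = 0b11 so that rm names a register directly. REX.B supplies the 4th, high bit of the register number; the helper name `rm_register` is hypothetical.

```python
# 64-bit general-purpose registers in encoding order (register numbers 0-15).
REG64 = ("rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15")

def rm_register(rm: int, rex_b: int) -> str:
    """Register named by ModRM.rm when mod == 0b11 and the operand size is 64-bit."""
    return REG64[(rex_b << 3) | rm]

print(rm_register(0b100, rex_b=0))  # rsp  (prefix 0x48: B = 0)
print(rm_register(0b100, rex_b=1))  # r12  (prefix 0x49: B = 1)
```

So 49 83 C4 38 would be add r12, 0x38: the only difference from your bytes is the B bit of the REX prefix.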
So the instruction is add rsp, 0x38, and ndisasm -b64 agrees:
$ cat > foo.asm
db 0x48, 0x83, 0xC4, 0x38
$ nasm foo.asm # create a flat binary with those bytes, not an object file
$ ndisasm -b64 foo
00000000 4883C438 add rsp,byte +0x38
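Putting the steps above together for this one instruction form gives a tiny decoder. It is a minimal sketch, not how ndisasm or the CPU is actually implemented: `decode_83_form` is a hypothetical name, and it only accepts exactly 4 bytes of the form REX.W + 83 /digit ib with a register operand (mod = 11).

```python
GROUP1 = ("add", "or", "adc", "sbb", "and", "sub", "xor", "cmp")
REG64 = ("rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15")

def decode_83_form(code: bytes) -> str:
    """Decode only REX.W + 83 /digit ib with a register destination (mod=11)."""
    if len(code) != 4:
        raise ValueError("expected exactly 4 bytes: REX, 83, ModRM, imm8")
    rex, opcode, modrm, imm8 = code
    if rex & 0xF8 != 0x48:  # 0100 1xxx = REX prefix with W=1
        raise ValueError("expected a REX prefix with W=1")
    if opcode != 0x83:
        raise ValueError("only opcode 83 is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    rex_b = rex & 1                                # 4th bit of the rm register number
    imm = imm8 - 0x100 if imm8 & 0x80 else imm8    # sign-extend the ib immediate
    return f"{GROUP1[reg]} {REG64[(rex_b << 3) | rm]}, {imm:#x}"

print(decode_83_form(bytes([0x48, 0x83, 0xC4, 0x38])))  # add rsp, 0x38
```

A real decoder also has to handle legacy prefixes, memory addressing modes (mod 00/01/10, SIB bytes, displacements) and every other opcode, which is why a table-driven approach like Intel's opcode map is the usual design.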