How does one opcode byte decode to different instructions depending on the "register/opcode" field? What is that?


Question

How can I determine what an array of bytes will translate into in machine code?

I understand that a leading 0f means a 2-byte opcode, but I also see other prefixes, and in some disassembly in my x64 debugger I see weird interactions like 48 83 C4 38. The opcode reference says that 48 means the operand is 64 bytes.

But 83 says it can be 7 different instructions depending on a field called the "register/opcode field". What is that?

Can someone please explain the logic behind how the processor uses these bytes to determine:

  1. What instruction is run
  2. Which registers and/or addresses the instruction operates on (if any)

Recommended answer

0x48 is a REX prefix with the W field set to 1, implying 64-bit operand size (not 64-byte).
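As a rough illustration of what "REX prefix with W set" means at the bit level, here is a minimal Python sketch. The function name decode_rex is made up for this example, and it assumes 64-bit mode, where bytes 0x40 to 0x4F are REX prefixes (high nibble 0100, low nibble holding the W, R, X and B bits).

```python
def decode_rex(byte):
    """Split a REX prefix byte (0x40..0x4F in 64-bit mode) into its flag bits.

    Returns None when the byte is not in the REX range.
    decode_rex is a hypothetical helper name, not part of any library.
    """
    if byte & 0xF0 != 0x40:
        return None  # high nibble is not 0100, so this is not a REX prefix
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModRM reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends the ModRM rm field (or SIB base)
    }


print(decode_rex(0x48))  # {'W': 1, 'R': 0, 'X': 0, 'B': 0}
```

For 0x48 = 0b01001000 only the W bit is set, which is why the operand size is 64 bits while none of the register fields are extended.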

Many opcodes for immediate versions of instructions, including 83, use the 3-bit reg field of the ModR/M byte (the /r field, written /0 to /7 when it extends the opcode) as 3 extra opcode bits. Intel's vol.2 manual documents this, and I think the opcode table in an appendix includes it.

This is why most original-8086 immediate instructions, like and r/m, imm, still only allow 2 operands, unlike shrd eax, edx, 4 or imul edx, [rdi], 12345, where both ModRM fields are used to encode operands, in addition to the immediate operand implied by the opcode. SHRD/SHLD were added with 386, and imul-immediate was added with 186. It's maybe unfortunate that copy-and-AND (and eax, edx, 0xf) isn't encodable, but at least x86 can use LEA for copy-and-add/sub.

Each instruction's own docs, e.g. add (html extract of the vol2 manual), show encodings like
REX.W + 83 /0 ib for ADD r/m64, imm8, which is what you have.
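To make the /0 notation concrete, the following Python sketch maps the /digit value found in the ModR/M byte after opcode 83 to a mnemonic. The table follows the usual "group 1" ordering for opcodes 80, 81 and 83 (check it against the per-instruction pages of Intel's vol.2 manual); the names GROUP1_MNEMONICS and group1_mnemonic are invented for this example.

```python
# Mnemonic selected by the ModRM reg field (/0 .. /7) for opcode 0x83.
# Hypothetical table name; the ordering is the standard "group 1" assignment.
GROUP1_MNEMONICS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def group1_mnemonic(modrm):
    """Return the mnemonic chosen by bits 5..3 of the ModRM byte."""
    digit = (modrm >> 3) & 0b111  # the 3 "extra opcode bits"
    return GROUP1_MNEMONICS[digit]


print(group1_mnemonic(0xC4))  # add  (reg field = 0, i.e. 83 /0)
print(group1_mnemonic(0xEC))  # sub  (reg field = 5, i.e. 83 /5)
```

So the same opcode byte 83 followed by EC instead of C4 would be a sub rather than an add, with the same destination register.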

ModRM bit-field diagram from wiki.osdev.org:

  7                           0
+---+---+---+---+---+---+---+---+
|  mod  |    reg    |     rm    |
+---+---+---+---+---+---+---+---+

0xc4 = 0b11000100, so the reg field = 0. Thus our opcode is 83 /0, in Intel's notation.

The rest of the ModRM fields are:

  • mod = 0b11, so the rm field encodes a register operand, not a base register for an addressing mode.
  • rm = 0b100. reg #4 = SPL/SP/ESP/RSP. (In this case RSP because it's 64-bit operand-size). See Intel's manual, or https://wiki.osdev.org/X86-64_Instruction_Encoding#Registers for tables.
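The field extraction described above is just shifts and masks. A minimal Python sketch (split_modrm is a name chosen for this example, not a library function):

```python
def split_modrm(modrm):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    mod = (modrm >> 6) & 0b11   # bits 7..6: addressing mode
    reg = (modrm >> 3) & 0b111  # bits 5..3: register or opcode extension
    rm = modrm & 0b111          # bits 2..0: register or memory operand
    return mod, reg, rm


mod, reg, rm = split_modrm(0xC4)
print(f"mod={mod:02b} reg={reg:03b} rm={rm:03b}")  # mod=11 reg=000 rm=100
```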

So the instruction is add rsp, 0x38

ndisasm -b64 agrees:

$ cat > foo.asm
db 0x48, 0x83, 0xC4, 0x38
$ nasm foo.asm     # create a flat binary with those bytes, not an object file
$ ndisasm -b64 foo
00000000  4883C438          add rsp,byte +0x38
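Putting the steps together, here is a deliberately narrow Python sketch of the same decode. It only handles an optional REX prefix, opcode 83, and a register operand (mod = 0b11) with a sign-extended 8-bit immediate. It ignores every other prefix, opcode, and addressing mode, so it is an illustration of the logic rather than a disassembler. All names in it (REG64, REG32, GROUP1, decode_83_reg_imm8) are invented for this example.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
         "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def decode_83_reg_imm8(code):
    """Decode [REX] 83 ModRM imm8 with a register operand (64-bit mode only)."""
    data = bytes(code)
    pos = 0
    rex_w = rex_b = 0
    if data[pos] & 0xF0 == 0x40:          # optional REX prefix
        rex_w = (data[pos] >> 3) & 1      # W: 64-bit operand size
        rex_b = data[pos] & 1             # B: 4th bit of the rm register number
        pos += 1
    if data[pos] != 0x83:
        raise ValueError("only opcode 0x83 is handled in this sketch")
    if len(data) < pos + 3:
        raise ValueError("truncated instruction")
    modrm = data[pos + 1]
    mod, digit, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    # "ib" in Intel's notation: one immediate byte, sign-extended
    imm = int.from_bytes(data[pos + 2:pos + 3], "little", signed=True)
    regs = REG64 if rex_w else REG32
    return f"{GROUP1[digit]} {regs[rm | (rex_b << 3)]}, {imm:#x}"


print(decode_83_reg_imm8([0x48, 0x83, 0xC4, 0x38]))  # add rsp, 0x38
```

Changing only the ModRM byte from C4 to EC gives sub rsp, 0x38, which shows the reg field acting as part of the opcode.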
