X86 Assembly - How to calculate instruction opcodes length in bytes
Question
I'm trying to learn x86 assembly (in order to learn reverse engineering). I have learned C# and C/C++, and a fair amount of IL.
Probably my main problem is the English language, because I'm Persian, and I also can't find any helpful documentation for learning x86 assembly that is written in Persian. So I decided to do what I did to learn C# and C++: I tried reading x86 samples and hello-world programs, but I failed, because I can't understand which register I have to choose for an instruction, along with other problems that can't be solved by only looking at source code.
So I decided to change my strategy and take on a challenge: write an x86 disassembler. I'm mad, I know. But we can't say it's impossible. The first thing that I need to understand (but not memorize) is these tables: http://ref.x86asm.net/coder32.html
I'm good with opcodes, but I can't understand how to calculate the size of operands. And what about the hex bytes for registers?
Sorry for my bad English.
PS. I want to do it in C#.
Recommended answer
So, since this topic seems to interest you, let me give you an overview. An x86 instruction comprises up to five parts and is up to 15 bytes long:
prefixes opcode operand displacement immediate
It is possible to generate encodings that are longer than 15 bytes, but the CPU rejects them. All five parts except for the opcode are optional. You can find their length as follows:
- An instruction can have any number of legacy prefixes. These are f0 (lock), f2 (repne), f3 (repe), 2e (cs), 36 (ss), 3e (ds), 26 (es), 64 (fs), 65 (gs), 66 (operand size override), and 67 (address size override). However, only one of f0, f2, f3 and only one of 26, 2e, 36, 3e, 64, and 65 is recognized at a time. If more than one prefix from the same group is provided, CPUs behave differently. VEX and EVEX encoded instructions may only have the segment override and address size override legacy prefixes, as the other prefixes are subsumed under the VEX and EVEX prefixes.
- In long mode (and only there), an instruction may have a REX prefix immediately after all legacy prefixes. The REX prefix is one of 40 to 4f. In other modes, these bytes are instructions, not prefixes, and your decoder must account for that. As with legacy prefixes, a VEX or EVEX encoded instruction cannot have a REX prefix.
- The bytes c4 and c5 can introduce a VEX prefix used to encode some modern instructions. In long mode, they always do, but in other modes, you have to check the byte afterwards: interpret it as a modr/m byte; if it encodes an r,r operand pair (its two top bits are both set), it's a VEX prefix, otherwise it's the opcode for les or lds. A VEX prefix beginning with c5 is two bytes long; one beginning with c4 is three bytes long. The VEX prefix also encodes the 0f, 0f 38, and 0f 3a opcode prefixes, which are omitted in a VEX encoded instruction. Note that generally, using a VEX prefix is not optional. For example, pdep is encoded as VEX.NDS.LZ.F2.0F38.W0 F5 /r (e.g. c4 e2 7b f5 c0 for pdep eax,eax,eax), but the corresponding legacy instruction f2 0f 38 f5 r/m32 (e.g. f2 0f 38 f5 c0 for pdep eax,eax) is invalid. Note that the same opcode can exist with a VEX prefix and without, and the two can mean different things. For example, 0f 77 is emms, but VEX.128.0F.WIG 77 (i.e. c5 f8 77) is vzeroupper.
- The byte 62 introduces an EVEX prefix, which is used to encode AVX512 instructions. Similar to the VEX prefix, the next few bytes need to be checked to distinguish an EVEX prefix from the bound instruction. The EVEX prefix is always four bytes long and encodes part of the opcode just as the VEX prefix does.
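The prefix rules in the list above can be sketched in a few lines. This is a minimal sketch in Python rather than C#, purely for brevity; the function name prefix_length and the long_mode flag are made up for illustration. It counts a c5 VEX prefix as two bytes and a c4 VEX prefix as three (matching the c5 f8 77 and c4 e2 7b f5 c0 examples), treats 62 the same way as c4/c5 outside long mode by looking only at the two top bits of the following byte, and does not enforce the 15-byte limit or check which legacy prefixes are legal together.

```python
LEGACY_PREFIXES = {
    0xF0, 0xF2, 0xF3,                    # lock, repne, repe
    0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65,  # cs, ss, ds, es, fs, gs
    0x66, 0x67,                          # operand size, address size override
}

# Assumed prefix sizes: two-byte VEX, three-byte VEX, four-byte EVEX.
MULTI_BYTE_PREFIX = {0xC5: 2, 0xC4: 3, 0x62: 4}


def prefix_length(code: bytes, long_mode: bool) -> int:
    """Return how many leading bytes of `code` are prefix bytes."""
    i = 0
    # Any number of legacy prefixes may come first.
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        i += 1
    if i >= len(code):
        return i
    b = code[i]
    # Outside long mode, c4/c5/62 are les/lds/bound unless the next byte,
    # read as a modr/m byte, has both top bits set (register form).
    next_is_reg = i + 1 < len(code) and (code[i + 1] & 0xC0) == 0xC0
    if b in MULTI_BYTE_PREFIX and (long_mode or next_is_reg):
        return i + MULTI_BYTE_PREFIX[b]
    # REX (40..4f) exists only in long mode, after all legacy prefixes.
    if long_mode and 0x40 <= b <= 0x4F:
        return i + 1
    return i
```

For example, prefix_length(bytes.fromhex("c4e27bf5c0"), long_mode=True) counts the three VEX bytes of pdep eax,eax,eax, while the single byte 40 outside long mode yields 0 because it is an instruction there, not a prefix.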
After the prefixes, the opcode follows. Originally, the opcode was always a single byte, but then they ran out of space, so now it's either a single byte or a single byte prefixed by 0f, 0f 38, or 0f 3a. These prefixes are absent if the instruction is VEX encoded. Note that some prefixes may change what instruction is encoded. For example, opcode 0f b8 is jmpe (Enter IA-64 mode), but f3 0f b8 is not repe jmpe but rather popcnt.
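As a rough illustration of the opcode rule, the following Python sketch returns the number of opcode bytes at a given position. The function name opcode_length and its vex flag are hypothetical; the sketch assumes the prefixes have already been skipped and that the buffer is not truncated.

```python
def opcode_length(code: bytes, pos: int, vex: bool = False) -> int:
    """Number of opcode bytes starting at code[pos].

    `vex` says whether a VEX or EVEX prefix came before; in that case the
    0f / 0f 38 / 0f 3a bytes are folded into the prefix and the opcode
    itself is a single byte.
    """
    if vex or code[pos] != 0x0F:
        return 1                  # plain one-byte opcode
    if code[pos + 1] in (0x38, 0x3A):
        return 3                  # 0f 38 xx or 0f 3a xx
    return 2                      # 0f xx
```

For the popcnt example f3 0f b8, calling opcode_length at position 1 (after the f3 prefix) gives 2. Which instruction those bytes name still depends on the prefixes seen before, so a real decoder would look up the pair (prefixes, opcode) in a table such as the one linked in the question.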
The opcode and the prefixes decide which instruction is encoded. From here on, it's mostly smooth sailing. Depending on the instruction, a modr/m byte may follow. Depending on the modr/m byte and the address override prefix, a sib byte and one, two, or four displacement bytes may follow. Finally, depending on the instruction, the operand size override prefix, and the REX prefix, one, two, four, six, or eight immediate bytes may follow.
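The modr/m, sib, and displacement part is the most mechanical one, so here is a minimal Python sketch of it. The name modrm_length and the addr16 flag are made up for illustration; addr16 stands for "the effective address size is 16 bits", which the caller would derive from the CPU mode and the 67 prefix. Immediate bytes are not covered, since their size depends on the individual instruction.

```python
def modrm_length(code: bytes, pos: int, addr16: bool = False) -> int:
    """Bytes taken by modr/m, sib and displacement, starting at code[pos]."""
    modrm = code[pos]
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        return 1                  # register operand: modr/m byte only
    if addr16:
        # 16-bit addressing has no sib byte.
        if mod == 0:
            return 3 if rm == 6 else 1   # rm == 6 means a bare disp16
        return 2 if mod == 1 else 3      # disp8 or disp16
    length = 1
    if rm == 4:
        length += 1               # a sib byte follows
        base = code[pos + 1] & 7
        if mod == 0 and base == 5:
            length += 4           # no base register, disp32 instead
    elif mod == 0 and rm == 5:
        length += 4               # bare disp32 (rip-relative in long mode)
    if mod == 1:
        length += 1               # disp8
    elif mod == 2:
        length += 4               # disp32
    return length
```

For instance, the operand bytes 44 24 08 (as in mov eax,[esp+8]) give 3: one modr/m byte, one sib byte, and one displacement byte.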
That's about as much of a description as I can give in the scope of a Stack Overflow answer. So TL;DR: It's really complicated.