X86 Assembly - How to calculate the length of instruction opcodes in bytes

Question

I'm trying to learn x86 assembly (to learn reverse engineering). I know C# and C/C++, and a fair amount of IL.

My main problem is probably English, because I'm Persian, and I can't find any helpful documentation for learning x86 assembly written in Persian. So I decided to do what I did to learn C# and C++: I tried reading x86 samples and hello worlds, but I failed, because I can't understand which register I have to choose for an instruction, along with other problems that can't be solved just by looking at source code.

So I decided to change my strategy and take on a challenge: write an x86 disassembler. I'm mad, I know, but we can't say it's impossible. The first thing I need to understand (but not memorize) is these tables: http://ref.x86asm.net/coder32.html

I'm fine with opcodes, but I can't understand how to calculate the size of the operands, or how registers are represented in the hex bytes.

Sorry for my bad English.

PS. I want to do it in C#.

Answer

So, since this topic seems to interest you, let me give you an overview. An x86 instruction comprises up to five parts and is at most 15 bytes long:

prefixes | opcode | operand (modr/m and sib bytes) | displacement | immediate
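As a rough mental model, the five parts can be treated as consecutive byte strings whose lengths add up to the instruction length. The `InstructionParts` type below is a hypothetical helper, written in Python rather than C# only for brevity; it assumes "operand" means the modr/m byte plus the optional sib byte, and it checks only the 15-byte limit, not whether the bytes form a valid instruction.

```python
from dataclasses import dataclass

MAX_INSN_LEN = 15  # the CPU rejects encodings longer than this


@dataclass
class InstructionParts:
    """Raw bytes of each part of one x86 instruction (hypothetical helper type)."""
    opcode: bytes                 # the only mandatory part (including any 0f / 0f 38 / 0f 3a escape)
    prefixes: bytes = b""         # legacy, REX, VEX or EVEX prefix bytes
    operand: bytes = b""          # modr/m byte plus an optional sib byte
    displacement: bytes = b""
    immediate: bytes = b""

    def encode(self) -> bytes:
        # The parts always appear in this fixed order.
        return self.prefixes + self.opcode + self.operand + self.displacement + self.immediate

    def length(self) -> int:
        return len(self.encode())

    def is_valid_length(self) -> bool:
        return 1 <= self.length() <= MAX_INSN_LEN


# mov eax, [esp+8] -> 8b 44 24 08 (opcode, modr/m + sib, disp8)
mov = InstructionParts(opcode=b"\x8b", operand=b"\x44\x24", displacement=b"\x08")
print(mov.encode().hex(), mov.length())  # 8b442408 4

# Fifteen 66 prefixes in front of a one-byte nop: 16 bytes, too long.
padded = InstructionParts(prefixes=b"\x66" * 15, opcode=b"\x90")
print(padded.is_valid_length())  # False
```

Decoding works in the opposite direction: the disassembler has to discover where each part ends, which is what the rules below describe.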

It is possible to construct encodings longer than 15 bytes, but the CPU rejects them. All five parts except the opcode are optional. You can determine their lengths as follows:

  • An instruction can have any number of legacy prefixes. These are: f0 lock, f2 repne, f3 repe, 2e cs, 36 ss, 3e ds, 26 es, 64 fs, 65 gs, 66 operand-size override, and 67 address-size override. However, only one of f0, f2, f3 and only one of 26, 2e, 36, 3e, 64, 65 is recognized at a time; if more than one prefix from the same group is present, the behavior varies between CPUs. VEX- and EVEX-encoded instructions may only carry the segment-override and address-size-override legacy prefixes, because the other prefixes are subsumed by the VEX and EVEX prefixes.
  • In 64-bit mode (and only there), an instruction may have a REX prefix immediately after all legacy prefixes. The REX prefix is one of the bytes 40 to 4f. In other modes, these bytes are instructions (inc and dec), not prefixes, and your decoder must account for that. As with the legacy prefixes, a VEX- or EVEX-encoded instruction cannot have a REX prefix.
  • The bytes c4 and c5 can introduce a VEX prefix, which is used to encode some modern instructions. In 64-bit mode they always do, but in other modes you have to check the byte that follows: interpret it as a modr/m byte; if it encodes an r,r operand pair (mod = 11), it's a VEX prefix, otherwise c4 is the opcode for les and c5 the opcode for lds. A VEX prefix beginning with c5 is two bytes long; one beginning with c4 is three bytes long. The VEX prefix also encodes the 0f, 0f 38, and 0f 3a opcode escape bytes, which are omitted in a VEX-encoded instruction. Note that in general the VEX prefix is not optional. For example, pdep is encoded as VEX.NDS.LZ.F2.0F38.W0 F5 /r (e.g. c4 e2 7b f5 c0 for pdep eax,eax,eax), but the corresponding legacy encoding f2 0f 38 f5 r/m32 (e.g. f2 0f 38 f5 c0 for pdep eax,eax) is invalid. Also note that the same opcode can exist both with and without a VEX prefix, and the two can mean different things. For example, 0f 77 is emms, but VEX.128.0F.WIG 77 (i.e. c5 f8 77) is vzeroupper.
  • The byte 62 introduces an EVEX prefix, which is used to encode AVX-512 instructions. As with the VEX prefix, outside 64-bit mode the next few bytes need to be checked to distinguish an EVEX prefix from the bound instruction. The EVEX prefix is always four bytes long and, like the VEX prefix, encodes part of the opcode.
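The prefix rules in the list above can be sketched as a small scanner. This is a minimal Python sketch, assuming the caller already knows whether the code runs in 64-bit mode; `scan_prefixes` and its result keys are hypothetical names. It tells a VEX/EVEX prefix apart from les/lds/bound by the mod bits of the following byte only, and it leaves out checks a real decoder needs, such as rejecting 66, f2, f3, or f0 in front of a VEX prefix.

```python
LEGACY_PREFIXES = {
    0xF0, 0xF2, 0xF3,                    # lock, repne, repe
    0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65,  # cs, ss, ds, es, fs, gs overrides
    0x66, 0x67,                          # operand-size, address-size override
}
GROUP_LOCK_REP = {0xF0, 0xF2, 0xF3}
GROUP_SEGMENT = {0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65}
VEX_EVEX_LEN = {0xC5: 2, 0xC4: 3, 0x62: 4}  # total prefix length in bytes


def scan_prefixes(code: bytes, mode64: bool) -> dict:
    """Walk the prefix bytes at the start of `code`; report them and where the opcode starts."""
    i = 0
    legacy = []
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        legacy.append(code[i])
        i += 1

    # 40..4f is a REX prefix only in 64-bit mode; elsewhere these bytes are inc/dec.
    rex = None
    if mode64 and i < len(code) and 0x40 <= code[i] <= 0x4F:
        rex = code[i]
        i += 1

    # c4/c5/62 may start a VEX/EVEX prefix. REX may not precede one, so this sketch
    # only looks when no REX was seen. Outside 64-bit mode the byte is les/lds/bound
    # unless the following byte, read as a modr/m byte, has mod == 0b11.
    vex = None
    if rex is None and i < len(code) and code[i] in VEX_EVEX_LEN:
        nxt = code[i + 1] if i + 1 < len(code) else None
        if mode64 or (nxt is not None and nxt >> 6 == 0b11):
            n = VEX_EVEX_LEN[code[i]]
            vex = code[i:i + n]
            i += n

    # Two different prefixes from the same group: CPU-specific behaviour.
    conflict = (len(set(legacy) & GROUP_LOCK_REP) > 1
                or len(set(legacy) & GROUP_SEGMENT) > 1)
    return {"legacy": legacy, "rex": rex, "vex": vex,
            "opcode_offset": i, "group_conflict": conflict}


info = scan_prefixes(bytes.fromhex("f3480fb8c0"), mode64=True)  # popcnt rax, rax
print(info["legacy"], hex(info["rex"]), info["opcode_offset"])  # [243] 0x48 2
```

The same bytes c5 06 decode as a VEX prefix in 64-bit mode but as lds outside it, which is why the mode has to be a parameter rather than something the scanner guesses.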

After the prefixes comes the opcode. Originally the opcode was always a single byte, but the opcode space ran out, so now it is either a single byte or a single byte preceded by 0f, 0f 38, or 0f 3a. These escape bytes are absent if the instruction is VEX-encoded. Note that some prefixes change which instruction is encoded. For example, opcode 0f b8 is jmpe (enter IA-64 mode), but f3 0f b8 is not repe jmpe but popcnt.
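The opcode rules can be sketched as follows, assuming the prefixes have already been consumed. `read_legacy_opcode`, `vex_opcode_map`, `legacy_mnemonic`, and the three-entry `MNEMONICS` table are hypothetical; a real disassembler would replace the table with the full opcode maps (for example from the ref.x86asm.net tables), and the "last 66/f2/f3 is the mandatory prefix" rule is a toy simplification.

```python
def read_legacy_opcode(code: bytes, i: int) -> tuple[str, int, int]:
    """Read a non-VEX opcode at code[i]; return (opcode map, final opcode byte, byte count)."""
    if code[i] != 0x0F:
        return ("1-byte", code[i], 1)
    if code[i + 1] == 0x38:
        return ("0F38", code[i + 2], 3)
    if code[i + 1] == 0x3A:
        return ("0F3A", code[i + 2], 3)
    return ("0F", code[i + 1], 2)


def vex_opcode_map(vex: bytes) -> str:
    """Escape sequence implied by a VEX prefix (the escape bytes themselves are omitted)."""
    if vex[0] == 0xC5:
        return "0F"                          # the two-byte VEX form always implies 0f
    mmmmm = vex[1] & 0x1F                    # low five bits of the byte after c4
    return {1: "0F", 2: "0F38", 3: "0F3A"}.get(mmmmm, "reserved")


# Toy mnemonic table keyed by (mandatory prefix, opcode map, opcode byte).
MNEMONICS = {
    (None, "0F", 0xB8): "jmpe",
    (0xF3, "0F", 0xB8): "popcnt",
    (None, "0F", 0x77): "emms",
}


def legacy_mnemonic(code: bytes) -> str:
    # Toy rule: the last 66/f2/f3 before the opcode acts as the mandatory prefix.
    i, mandatory = 0, None
    while i < len(code) and code[i] in (0x66, 0xF2, 0xF3):
        mandatory = code[i]
        i += 1
    opmap, op, _ = read_legacy_opcode(code, i)
    return MNEMONICS.get((mandatory, opmap, op), "unknown")


print(legacy_mnemonic(bytes.fromhex("0fb8")))      # jmpe
print(legacy_mnemonic(bytes.fromhex("f30fb8c0")))  # popcnt
print(vex_opcode_map(bytes.fromhex("c4e27b")))     # 0F38 (the pdep example)
```

Keying the lookup on the prefix as well as the opcode bytes is what lets f3 0f b8 and 0f b8 resolve to different instructions.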

The opcode and the prefixes determine which instruction is encoded. From here on, it's mostly smooth sailing. Depending on the instruction, a modr/m byte may follow. Depending on the modr/m byte and the address-size override prefix, a sib byte and one, two, or four displacement bytes may follow. Finally, depending on the instruction, the operand-size override prefix, and the REX prefix, one, two, three, four, six, or eight immediate bytes may follow (three for enter, six for a far pointer such as jmp ptr16:32, eight for mov with a 64-bit immediate).
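The modr/m, sib, displacement, and immediate rules boil down to a few size calculations. This sketch assumes the caller has already looked up, in an opcode table, whether the instruction has a modr/m byte and which immediate kind it takes; the `Ib`/`Iw`/`Iz`/`Iv`/`Ap` codes follow the operand notation of the Intel SDM opcode map. Instructions whose operand size defaults to 64 bits in 64-bit mode (such as push) are not handled, and all function names are hypothetical.

```python
def modrm_len(code: bytes, i: int, addr_bits: int) -> int:
    """Bytes used by the modr/m byte at code[i] plus any sib byte and displacement.

    addr_bits is the effective address size (16, 32 or 64) from the mode and the 67 prefix.
    """
    mod, rm = code[i] >> 6, code[i] & 0b111
    if mod == 0b11:
        return 1                              # register operand: nothing follows
    if addr_bits == 16:                       # 16-bit addressing never uses a sib byte
        if mod == 0b00:
            return 3 if rm == 0b110 else 1    # rm=110 with mod=00 means [disp16]
        return 2 if mod == 0b01 else 3        # disp8 or disp16
    length = 1
    if rm == 0b100:                           # a sib byte follows
        length += 1
        if mod == 0b00 and (code[i + 1] & 0b111) == 0b101:
            return length + 4                 # sib base=101 with mod=00: disp32, no base
    elif mod == 0b00 and rm == 0b101:
        return length + 4                     # [disp32] (rip-relative in 64-bit mode)
    if mod == 0b01:
        length += 1                           # disp8
    elif mod == 0b10:
        length += 4                           # disp32
    return length


def operand_bits(mode_bits: int, has_66: bool, rex_w: bool) -> int:
    """Effective operand size for instructions with the usual defaults."""
    if mode_bits == 64 and rex_w:
        return 64                             # REX.W takes precedence over 66
    default = 16 if mode_bits == 16 else 32   # 64-bit mode still defaults to 32
    if has_66:
        return 32 if default == 16 else 16
    return default


def immediate_len(kind: str, op_bits: int) -> int:
    """Immediate size for operand codes in the style of the Intel SDM opcode map."""
    return {
        "Ib": 1,
        "Iw": 2,
        "Iz": 2 if op_bits == 16 else 4,      # capped at 4 bytes (sign-extended with REX.W)
        "Iv": op_bits // 8,                   # e.g. mov reg, imm (b8+r): 8 bytes with REX.W
        "Ap": 4 if op_bits == 16 else 6,      # far pointer ptr16:16 or ptr16:32
    }[kind]


# mov dword [0x12345678], 1 in 32-bit code: c7 05 <disp32> <imm32>
code = bytes.fromhex("c7057856341201000000")
total = 1 + modrm_len(code, 1, 32) + immediate_len("Iz", operand_bits(32, False, False))
print(total, len(code))  # 10 10
```

enter takes two immediates (`Iw` followed by `Ib`), which is where the three-byte case comes from.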

That's about as much of a description as I can give in the scope of a Stack Overflow answer. So TL;DR: It's really complicated.
