How to write a disassembler?


Question

I'm interested in writing an x86 disassembler as an educational project.

The only real resource I have found is Spiral Space's "How to write a disassembler". While this gives a nice high-level description of the various components of a disassembler, I'm interested in some more detailed resources. I've also taken a quick look at NASM's source code, but it is rather heavyweight to learn from.

I realize one of the major challenges of this project is the rather large x86 instruction set I'm going to have to handle. I'm also interested in basic structure, basic disassembler links, etc.

Can anyone point me to any detailed resources on writing an x86 disassembler?

Answer

Take a look at section 17.2 of the 80386 Programmer's Reference Manual. A disassembler is really just a glorified finite-state machine. The steps in disassembly are:

  1. Check if the current byte is an instruction prefix byte (F3, F2, or F0); if so, then you've got a REP/REPE/REPNE/LOCK prefix. Advance to the next byte.
  2. Check to see if the current byte is an address size byte (67). If so, decode addresses in the rest of the instruction in 16-bit mode if currently in 32-bit mode, or decode addresses in 32-bit mode if currently in 16-bit mode.
  3. Check to see if the current byte is an operand size byte (66). If so, decode operands (registers as well as immediates) in 16-bit mode if currently in 32-bit mode, or decode them in 32-bit mode if currently in 16-bit mode.
  4. Check to see if the current byte is a segment override byte (2E, 36, 3E, 26, 64, or 65). If so, use the corresponding segment register for decoding addresses instead of the default segment register.
  5. The next byte is the opcode. If the opcode is 0F, then it is an extended opcode, and read the next byte as the extended opcode.
  6. Depending on the particular opcode, read in and decode a Mod R/M byte, a Scale Index Base (SIB) byte, a displacement (0, 1, 2, or 4 bytes), and/or an immediate value (0, 1, 2, or 4 bytes). The sizes of these fields depend on the opcode and on the address size and operand size overrides previously decoded.
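
Steps 1 through 5 can be sketched roughly as follows. This is only an illustration in Python, not code from the manual: the names `Prefixes` and `read_prefixes_and_opcode` are made up for this example, and the loop accepts the prefix bytes in any order (the processor does not require the order listed above) rather than checking them one at a time.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Prefix tables taken from the byte values listed in steps 1 and 4.
REP_LOCK = {0xF3: "REP/REPE", 0xF2: "REPNE", 0xF0: "LOCK"}
SEGMENT = {0x2E: "CS", 0x36: "SS", 0x3E: "DS", 0x26: "ES", 0x64: "FS", 0x65: "GS"}


@dataclass
class Prefixes:
    """Hypothetical container for the decoded prefix state."""
    rep_lock: Optional[str] = None
    segment: Optional[str] = None
    operand_size: int = 32
    address_size: int = 32


def read_prefixes_and_opcode(
    code: bytes, offset: int = 0, default_bits: int = 32
) -> Tuple[Prefixes, Tuple[int, ...], int]:
    """Consume prefix bytes and the opcode; return (prefixes, opcode, next offset)."""
    toggled = 16 if default_bits == 32 else 32
    prefixes = Prefixes(operand_size=default_bits, address_size=default_bits)

    # Steps 1-4: keep consuming bytes for as long as they are prefixes.
    while offset < len(code):
        byte = code[offset]
        if byte in REP_LOCK:
            prefixes.rep_lock = REP_LOCK[byte]
        elif byte == 0x67:
            prefixes.address_size = toggled
        elif byte == 0x66:
            prefixes.operand_size = toggled
        elif byte in SEGMENT:
            prefixes.segment = SEGMENT[byte]
        else:
            break
        offset += 1

    # Step 5: one opcode byte, or 0F followed by the extended opcode byte.
    if offset >= len(code):
        raise ValueError("truncated instruction: no opcode byte")
    opcode: Tuple[int, ...] = (code[offset],)
    offset += 1
    if opcode[0] == 0x0F:
        if offset >= len(code):
            raise ValueError("truncated instruction: missing extended opcode")
        opcode = (0x0F, code[offset])
        offset += 1
    return prefixes, opcode, offset
```

For example, `read_prefixes_and_opcode(bytes.fromhex("66b83412"))` reports a 16-bit operand size and opcode `B8`, leaving the offset at the two immediate bytes that step 6 would then consume.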

The opcode tells you the operation being performed. The arguments of the opcode can be decoded from the values of the Mod R/M, SIB, displacement, and immediate value. There are a lot of possibilities and a lot of special cases, due to the complex nature of x86. See the manual referenced above for a more thorough explanation.
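
To give a feel for those special cases, here is a minimal sketch of decoding the Mod R/M byte, the optional SIB byte, and the displacement. It assumes 32-bit addressing only (no 67 prefix) and ignores segment overrides; `decode_modrm32` and `REG32` are names invented for this example, and the output syntax is just one possible rendering.

```python
import struct
from typing import Tuple

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_modrm32(code: bytes, offset: int) -> Tuple[int, str, int]:
    """Decode Mod R/M (+SIB, +displacement) at `offset`.

    Returns (reg field, r/m operand as text, offset after the operand bytes).
    Assumes 32-bit addressing.
    """
    modrm = code[offset]
    offset += 1
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7

    if mod == 3:  # r/m names a register, no memory operand
        return reg, REG32[rm], offset

    parts = []
    disp_size = {0: 0, 1: 1, 2: 4}[mod]

    if rm == 4:  # r/m = 100 means a SIB byte follows
        sib = code[offset]
        offset += 1
        scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
        if base == 5 and mod == 0:
            disp_size = 4  # special case: no base register, disp32 instead
        else:
            parts.append(REG32[base])
        if index != 4:  # index = 100 means "no index register"
            parts.append(f"{REG32[index]}*{scale}")
    elif rm == 5 and mod == 0:
        disp_size = 4  # special case: absolute disp32, no register
    else:
        parts.append(REG32[rm])

    text = "+".join(parts)
    if disp_size:
        # Displacements are little-endian and signed.
        (disp,) = struct.unpack_from("<b" if disp_size == 1 else "<i", code, offset)
        offset += disp_size
        if text:
            text += f"{'-' if disp < 0 else '+'}0x{abs(disp):x}"
        else:
            text = f"0x{disp & 0xFFFFFFFF:x}"
    return reg, f"[{text}]", offset
```

With the bytes `45 FC` (the tail of `8B 45 FC`), this yields reg field 0 and the operand `[ebp-0x4]`; the opcode table then decides that reg field 0 means `eax` and that the instruction is a `mov`.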
