Resources for x86 decompilation


Question

I'd like to get a solid understanding of the low-level process for representing and running a program. I've decided to do this by writing a program that parses and displays object file information (headers, sections, etc.), and I've nearly finished this part. A natural extension is to decompile the remaining relevant data into assembly instructions. Initially, I'll focus on x86.

Where can I find resources related to this decompilation (binary -> ASM)? I've read that x86 machine code has a one-to-one correspondence with ASM, but I don't know the best reference from which to pull the translation tables.

Also, while I'm at it, I'd be interested in tracking any supplied debugging information. Are there any references on the format used for this information (let's assume ELF and GCC with the -g option)?

Do any of you have any general advice? The goal here is a hands-on project to increase my understanding.

Recommended answer

x86 has variable-length instructions, which makes it very difficult to disassemble. It is not advisable as the target of your first disassembler.

That said, the approach I take is this: you have to identify which bytes in the binary are the first byte of an opcode, and separate those from bytes that are the second or later bytes of an instruction, or data. Once you know that, you can start at the beginning of the binary and disassemble the opcodes.

How do you tell opcodes from other bytes? You need to walk all possible execution paths. That sounds like a recursion problem, and it could be, but it doesn't have to be. Look at the interrupt vector table and/or all hardware entry points into the code. That gives you a short list of opcode bytes.

A non-recursive approach is to make many passes over the binary, looking at each byte that is marked as an opcode and decoding it just enough to know how many bytes the instruction consumes. You also need to know whether it is an unconditional branch, conditional branch, return, call, etc. If it is not an unconditional branch or a return, you can assume the byte after this instruction is the first byte of the next instruction. Any time you encounter a branch or call of some sort, compute the destination address and add that byte to the list. Keep making passes until a pass adds no new bytes to the list.

You also need to check for conflicts: if, say, you find a byte that starts a 3-byte instruction, but the byte after it is also marked as the start of an instruction, then you have a problem. Typical causes are things like conditional branches preceded by something that ensures they will never branch. You don't see this much, if at all, in high-level code compiled to a binary, but in the good old days of handwritten assembler, or with folks who want to protect their code, you will see things like this.
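The traversal described above can be sketched roughly as follows. This is only an illustration, not a real x86 decoder: the hypothetical `decode` helper knows just seven opcode forms (`nop`, `ret`, `mov r32, imm32`, `jcc rel8`, `jmp rel8`, `call rel32`, `jmp rel32`) and assumes 32-bit code with no prefixes, and it uses a work list instead of repeated passes, which visits the same bytes. The names `decode` and `find_instruction_starts` are made up for this example.

```python
from collections import deque

def decode(code, off):
    """Return (length, kind, target) for a tiny x86 subset, or None if unknown."""
    op = code[off]
    if op == 0x90:
        length, kind = 1, "fallthrough"   # nop
    elif op == 0xC3:
        length, kind = 1, "return"        # ret
    elif 0xB8 <= op <= 0xBF:
        length, kind = 5, "fallthrough"   # mov r32, imm32
    elif 0x70 <= op <= 0x7F:
        length, kind = 2, "cond"          # jcc rel8
    elif op == 0xEB:
        length, kind = 2, "jump"          # jmp rel8
    elif op == 0xE8:
        length, kind = 5, "call"          # call rel32
    elif op == 0xE9:
        length, kind = 5, "jump"          # jmp rel32
    else:
        return None
    if off + length > len(code):
        return None                       # instruction runs past the end
    target = None
    if kind in ("cond", "jump", "call"):
        # Relative displacement is measured from the end of the instruction.
        disp = int.from_bytes(code[off + 1:off + length], "little", signed=True)
        target = off + length + disp
    return length, kind, target

def find_instruction_starts(code, entry_points):
    """Follow control flow from the entry points and mark instruction starts."""
    starts = {}                           # offset -> instruction length
    problems = []
    work = deque(entry_points)
    while work:
        off = work.popleft()
        if off in starts or not 0 <= off < len(code):
            continue
        decoded = decode(code, off)
        if decoded is None:
            problems.append((off, "cannot decode"))
            continue
        length, kind, target = decoded
        starts[off] = length
        if target is not None:
            work.append(target)           # branch or call destination
        if kind not in ("jump", "return"):
            work.append(off + length)     # next instruction follows directly
    # A start that falls inside another instruction signals a conflict.
    for off, length in starts.items():
        for inner in range(off + 1, off + length):
            if inner in starts:
                problems.append((inner, "inside instruction at %d" % off))
    return starts, problems

sample = bytes([0xEB, 0x02, 0xFF, 0xFF, 0xB8, 1, 0, 0, 0, 0x74, 0x01, 0x90, 0xC3])
starts, problems = find_instruction_starts(sample, [0])
print(sorted(starts), problems)
```

In `sample`, the two `0xFF` bytes are skipped by the initial jump and are never marked as instructions, which is the point of following paths instead of decoding every byte in order.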

Unfortunately, if all you have is the binary, then for a variable-length instruction set you won't get a perfect disassembly. Some branch destinations are computed at runtime, and hand-coded assembly will sometimes modify the stack before a return in order to change which code executes next. If that is the only path to that code, you likely won't figure it out programmatically unless you go so far as to simulate the code. And even with simulation you won't cover all execution paths.

With a fixed-length instruction set such as ARM (so long as it is ARM and not a mixture of ARM and Thumb), you can simply start at the beginning of the binary and disassemble until you run out of words. You might disassemble a data word into a valid, invalid, or unlikely-to-be-used instruction, but that is fine.
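For comparison, a linear sweep over fixed-width words is only a few lines. The sketch below assumes 32-bit little-endian instruction words and stops short of decoding them; the function name `linear_sweep` and the `base` load address are hypothetical choices for this example.

```python
import struct

def linear_sweep(code, base=0):
    """Yield (address, word) for each complete 32-bit little-endian word."""
    for off in range(0, len(code) - 3, 4):
        (word,) = struct.unpack_from("<I", code, off)
        yield base + off, word

# Two ARM instruction words stored little-endian, followed by one stray byte.
blob = bytes.fromhex("0100a0e3" "1eff2fe1" "ff")
for address, word in linear_sweep(blob, base=0x8000):
    print("%08x: %08x" % (address, word))
```

Every word would then be handed to the instruction decoder in turn, whether it is really code or data. The trailing incomplete word is simply ignored.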

I wouldn't be surprised if somewhere in the ELF file there is something that indicates which parts of the binary are executable and which parts are data, maybe even to the point that you don't have to walk the paths yourself. I doubt objdump performs a task like that; it probably uses something in the ELF file.
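ELF section headers do carry such a marker: each one has a flags field, and the `SHF_EXECINSTR` bit (0x4) is set on sections that contain machine instructions. As a rough sketch, assuming a little-endian ELF64 file already read into memory and ignoring the extended numbering used for very large section counts, the executable sections can be listed like this (the function name `executable_sections` is made up for this example):

```python
import struct

SHF_EXECINSTR = 0x4  # section flag: contains executable machine instructions

def executable_sections(data):
    """Return (name, file_offset, size) for each executable section."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # ELF64 header: e_shoff at 0x28; e_shentsize, e_shnum, e_shstrndx at 0x3A.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def header(index):
        # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
        # sh_link, sh_info, sh_addralign, sh_entsize
        return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + index * e_shentsize)

    strtab_offset = header(e_shstrndx)[4]  # file offset of the section name table
    result = []
    for i in range(e_shnum):
        sh_name, _type, sh_flags, _addr, sh_offset, sh_size = header(i)[:6]
        if sh_flags & SHF_EXECINSTR:
            start = strtab_offset + sh_name
            end = data.index(b"\0", start)  # names are NUL-terminated
            name = data[start:end].decode("ascii", "replace")
            result.append((name, sh_offset, sh_size))
    return result

# Possible use: executable_sections(open("a.out", "rb").read())
```

A disassembler could restrict itself to the byte ranges this returns. Data can still be embedded inside an executable section, so this narrows the problem without fully solving it.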

The ELF file format is documented in many places. There is a basic structure, and vendors may add specific block types, which would be documented by the vendor.
