Searching for opcode bytes in a PE file

Problem description

Hi,

I have a task of searching for opcode bytes in a PE file and checking whether a specified opcode byte sequence (constant and predefined) is present in the PE file. I have come across numerous examples online, but the solutions are mostly in C# or Python; however, my requirements call for C.

Please tell me how I can check and compare opcode byte values in a PE file by writing a simple program in C. Any help will be greatly appreciated.

Thanks.

Recommended answer

In order to proceed you have to understand that there are 3 different offset types used in conjunction with PE files:
1. "physical" raw offset inside the exe file
2. Relative Virtual Address (RVA)
3. Virtual Address (VA)

I think #1 is obvious. It's really just an offset inside the file. "Virtual" addresses are used to address things in memory after loading the PE. It's very important to mention the "Base Address" here: this is an address (a pointer) inside the memory space of your process, and it points to the first byte of the memory area where your PE file was loaded. Note that this Base Address can be different for each program startup, as DLL files are usually (but not necessarily!) relocatable. When you call LoadLibrary() to load a DLL and it loads the file successfully, the return value is an HMODULE/HINSTANCE handle that stores the value of the Base Address. If you examine the bytes in memory at this location you will indeed find the bytes of your DOS/PE header! The relation between a VA and an RVA that point to the same bytes of your PE file is: VA == (BaseAddress + RVA). If you are just examining PE files without loading them, then VAs are not of interest to you: the header fields that locate things inside a PE file are file offsets and RVAs, because you know the VA only after the PE has been loaded to a particular address. (The exceptions are the preferred ImageBase itself and absolute addresses embedded in code and data; these assume the preferred base and are patched through base relocations if the DLL can't be loaded there.) Mostly they are RVAs, because file offsets are useful only before loading the PE, while all other offsets are used after loading (for example to relocate a DLL if it couldn't be loaded to its preferred base address). Why do file offsets and RVAs differ? The sections inside the PE file are aligned both in the file and in memory after loading, but these alignment values usually differ. In the PE file the beginnings of sections are usually aligned to 512-byte or 4K boundaries (disk sector size) to allow efficient loading, while in memory the same sections are put to start addresses that are usually aligned to 4K boundaries (memory page size). Often 4K is used for both file and memory alignment to make loading even easier. Let's say a linker decided to use 512-byte alignment in the PE file and 4K in memory.
In this case, when you load the sections from the file, it can happen that the gap between your sections is zero inside the file, but after loading this gap increases because the loader must satisfy the 4K memory alignment. Example: your compiler compiles a hello world program that contains a few bytes of code, let's say 100 bytes, and a few bytes of data: "Hello World!". After this the linker combines the output of your compiler into an executable. Note that this can be done in several (infinitely many) ways, but I describe a standard, usual way. The linker will probably put the data and the code into different sections. Please note that the header of your PE file is also a "section" that is always placed at file offset zero and RVA zero, but it doesn't have its own section header. Still, I always treat it as the minus first section :-). This is very important because it implies that your first "real section" that contains code/data cannot reside at offset zero in the file or in memory! Let's say your linker does a great job and assembles a small header for your PE (~384 bytes). In this case the next section must be placed at file offset 512 and RVA 4096 because of the 512-byte file alignment and 4K memory alignment. This means that there will be a gap between the header of your PE and the first section both in the file and in memory after loading, but the gap size is different before and after loading! This is why it's important to use both file offsets and Relative Virtual Addresses (RVAs). Of course the linker could decide to put your first section at higher offsets, but it usually doesn't do that. The RVA of your first section could be anything that is a multiple of the memory alignment (k*4096, where k is an integer and k>0). I mention this because I could forge very strange PE files for you with a hex editor that your program couldn't analyze if it handles only the output of some popular linkers.
So in case of our hello world program one possible output can be the following:
Header (file_offset=0, RVA=0)
Code (file_offset=512, RVA=4096)
Data (file_offset=1024, RVA=8192)
Why on earth do we need to split the whole thing into sections? Because the memory protection flags of each section can be different. The compiler can decide to put exec/readonly flags on your code section and noexec/readonly or noexec/readwrite flags on your data sections. On x86 platforms memory protection can be set only per 4K page. All sections (except the header section) have an IMAGE_SECTION_HEADER entry in the PE header. This IMAGE_SECTION_HEADER contains info about a section: name, file offset, RVA, Characteristics (flags that define memory protection; note that many characteristics flags map to the same memory protection flags). Note that the name of the section is perfectly useless: section names are used mainly by the linker and are ignored by the Windows PE loader. For example, MS linkers use ".text" as the name of the code section, but what if I forge a PE and give ".text" as the name of my data section? (By the way: PE anti-debug programs zero out the section names...) Note that the header section doesn't have a corresponding IMAGE_SECTION_HEADER, so its flags are set by Windows. On some (mostly older) Windows versions the header section is also executable (which is a security hole); for example, the CIH virus infected files by putting itself into the gap between the header section and the first section in the file. This was possible because most linkers used 4K as the file alignment and the header is usually much smaller than 4K, so there is a fairly large gap in the file after the header that is loaded along with the header and gets the same memory protection as the header if you adjust the SizeOfHeaders field in the header!

As a final step, let's make things a bit clearer: here is my brain dump of how I perceive these things in general. The PE file contains sections with small gaps between them (including the PE headers as a special very first section). After loading the PE these gaps MAY increase because of rounding to memory page boundaries. Basically, the only file offsets you will probably use are the IMAGE_SECTION_HEADER.PointerToRawData fields, which are offsets in the file to the beginning of the sections. After loading, file offsets are useless, so most of the other header fields that point to something are expressed as RVAs. A good example is IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint, which specifies where the execution of your program begins. You can use the section headers to convert between file offsets and RVAs like I did in my example program. Note that any section that has the right memory protection flags (executable) can contain code (including the headers on some Windows versions)! And of course you can change the protection flags of 4K memory pages programmatically, or copy some parts of a data section to an executable memory area, or whatever... If we speak of executables put together by nice linkers, then executable code, read-only data and read/write data are well separated, but if you want to be prepared for anything, then you must take the extreme cases into account as well, for example: an entry point pointing into the header section (as in the case of the CIH virus!), or large gaps between the sections after loading (my "manually added" section can reside at file offset 1024 while its RVA is 0x10000 or larger, leaving a big gap in memory between it and the previous section). Unfortunately you cannot exactly tell which bytes are executable and which ones aren't. If you trust the linker and the memory flags, then you can tell...
Good exe examiners (like IDA Pro) start at IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint, follow all possible flows of control through the executable bytes, and disassemble every possibly reachable byte on the fly. Even in this case you won't find all executable bytes. Let's say I compute an address into the eax register and then I say "jmp eax". Even if the address is always the same because I just add two constant numbers, you need a very good static analyzer built into your program to find out the jump target. I can put a small twist on it, for example by reading an int value from a config file that can be either 0 or 1 and factoring it into my jump target address; in this case you have no chance to find out the target. In IDA Pro you can manually add hints and say "Hey IDA, examine these bytes here as code. Examine these bytes as an array of 5 XYZ structs."

It just came to my mind that the size of a section in memory can be larger than its size in the file. For example, some linkers put your initialized read/write data into a section (let's say you have 4096 bytes of initialized global data), but the linker can specify 8192 as IMAGE_SECTION_HEADER.VirtualSize, which means that the file contains just the 4K of initialized data, but after loading 8K of memory is allocated (the part not backed by file data is zero-filled). This is a nice trick for compilers to allocate space for your uninitialized global variables without wasting space in your PE file!

I've never read these things together, but I truly believe that these are the foundations of PE file hacking, and most people starting in the topic progress slowly because they are not clear about these. I would also mention that you've started writing a program that is probably far beyond your current knowledge, so you either have to read and UNDERSTAND hundreds of pages on the related tricks you need, or give up the idea of writing the next IDA Pro. Another important thing is that your question is not specific and involves a lot of difficult topics; this is why no one will really answer it. My long comments form just the intro to the topic (the very beginning), so they can help you decide whether to invest a significant amount of time or not. And definitely check out IDA Pro if you don't know it. That tool has a free version (5.x if I remember right).