PE - Distinguish data from function export

Problem description

I'm trying to find a way to figure out in IDA which exports are data exports and which are real function exports.

For example, consider the export entries of Microsoft's msftedit.dll. CreateTextServices is a real exported function, while IID_IRichEditOle is a data export; IDA fails to realize that and interprets the data as code.

Does anyone know a reliable way to distinguish the two? Any help would be much appreciated.

Thanks in advance.

Solution

There is no perfectly reliable way to do this for every export.

Each export only specifies an offset within the executable file -- logically, it could be treated as code or as data by any other code that references it.
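
To make this concrete: an export entry carries a name (or ordinal) and an RVA, with nothing that tags it as code or data. One weak hint is the section the RVA lands in. The sketch below does not parse a real file; the `Section` tuple, the section layout, and the export names are hypothetical, and only the two flag constants are the standard PE values. Read-only data can live in an executable section, so this check can rule an export out as in-place code but cannot confirm that it is a function.

```python
from typing import NamedTuple, Optional, Sequence

# Standard PE section characteristic flags.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA of the section start
    virtual_size: int
    characteristics: int


def section_for_rva(sections: Sequence[Section], rva: int) -> Optional[Section]:
    """Return the section containing rva, or None if no section covers it."""
    for sec in sections:
        if sec.virtual_address <= rva < sec.virtual_address + sec.virtual_size:
            return sec
    return None


def guess_export_kind(sections: Sequence[Section], rva: int) -> str:
    """Weak hint only: a non-executable section cannot be run in place."""
    sec = section_for_rva(sections, rva)
    if sec is None:
        return "unknown"
    if sec.characteristics & IMAGE_SCN_MEM_EXECUTE:
        return "code-or-data"  # executable sections may still hold constants
    return "data"


# Hypothetical layout and exports, not taken from any real DLL.
SECTIONS = [
    Section(".text", 0x1000, 0x5000, 0x60000020),
    Section(".rdata", 0x6000, 0x2000, 0x40000040),
]
EXPORTS = {"SomeFunction": 0x1234, "SomeGuid": 0x6010}

for name, rva in EXPORTS.items():
    print(name, hex(rva), guess_export_kind(SECTIONS, rva))
```

An export that lands in an executable section stays ambiguous under this check, which is the case the rest of this answer is about.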

As you mentioned, you could come up with heuristics that detect the type of the export in almost all cases, but it is easy to construct counterexamples that defeat any given heuristic. Take, for instance, the rule you proposed:

The exported entry will be considered a valid exported function if there is a ret instruction in the function, and there are more than <min> valid instructions, and IDA recognizes the function's calling convention.

False negatives: You might have a function that uses tail call optimization and ends with a jmp instruction rather than a ret instruction. Any function shorter than <min> instructions would also fail. And there are several ways that IDA can be confused into not treating the code as a function.

False positives: There could be a string in memory followed closely by a C3 or C2 like db 'BACKGAMMON0',0,0C3h -- this could logically disassemble as a valid 11-instruction function with a ret and no arguments.
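
A toy decoder shows why such a string passes as code. In 32-bit x86, the opcodes 0x40-0x4F are one-byte inc/dec instructions on the general registers, and that range covers the ASCII letters '@' through 'O'. The sketch below uses a simplified variant of the string above, `BACKGAMMON` followed directly by C3, because the trailing '0' and NUL bytes need a ModRM byte and this decoder deliberately handles only one-byte opcodes. The name `decode_tiny` is made up for this illustration.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_tiny(code: bytes) -> list:
    """Decode only one-byte inc/dec r32 (0x40-0x4F) and ret (0xC3), 32-bit mode."""
    out = []
    for b in code:
        if 0x40 <= b <= 0x47:
            out.append("inc " + REGS[b - 0x40])
        elif 0x48 <= b <= 0x4F:
            out.append("dec " + REGS[b - 0x48])
        elif b == 0xC3:
            out.append("ret")
            break  # a linear sweep would end the function here
        else:
            raise ValueError(f"opcode {b:#04x} is outside this toy decoder")
    return out


listing = decode_tiny(b"BACKGAMMON\xC3")
for line in listing:
    print(line)
```

Ten letters plus the final C3 yield eleven syntactically valid instructions ending in ret, which would satisfy a rule based on instruction count and the presence of ret.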

The lines are blurred even further when you consider that an export could be logically treated as both code and data: Imagine that a byte sequence at an export is copied into dynamically allocated memory -- potentially even in another process -- where it is later executed as code.

Perhaps a reasonable suggestion would be to just trust IDA and treat the export as code if IDA thinks it's code. A large part of IDA's functionality is automatically guessing the logical types of data, and it's normally pretty good at it. As you've shown, sometimes it's wrong. But you can't get 100% accuracy anyway. The best you can do is balance between false negatives and false positives.


Proof of this problem's undecidability:

Whether or not an export will be executed as code is undecidable. Whether or not an export will be read as data is also undecidable. Since neither property can be decided in general, no algorithm can correctly classify every ambiguous case.

Proof: Assume that we have an oracle A(P,I,E) which returns 1 if program P (including all of its dependencies) executes (or reads from) export E (from any DLL loaded in the course of P's execution) with "input" (external state) I. Otherwise, it returns 0.

Let us construct a minimal program Z(P,I,E) which executes (or reads from) export E (the DLL for which is loaded into the address space) if and only if A(P,I,E) returns 0.

Now consider the result of Z(Z,I,E):

If Z(Z,I,E) executes (or reads from) export E, then A(Z,I,E) would return 1. But Z(Z,I,E) is defined to not access export E unless A(Z,I,E) returns 0. This is a contradiction.

If Z(Z,I,E) does not execute (or read from) export E, then A(Z,I,E) would return 0. But Z(Z,I,E) is defined such that it will access export E when A(Z,I,E) returns 0. This is a contradiction.

Therefore, our initial assumption that oracle A(P,I,E) exists is proven false.
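
The diagonal construction can be played out in a few lines. This is only an illustration of the argument, not a proof: the candidate oracles are simple deterministic Python functions, "accessing the export" is modeled by appending to a list, and `make_z` and `oracle_is_wrong_on_z` are names invented for this sketch.

```python
from typing import Callable

Oracle = Callable[[object, object, str], int]


def make_z(oracle: Oracle):
    """Build Z, which accesses the export iff the oracle predicts it will not."""
    def z(program, inp, export, accessed: list) -> None:
        if oracle(program, inp, export) == 0:
            accessed.append(export)  # stands in for executing/reading export E
    return z


def oracle_is_wrong_on_z(oracle: Oracle, inp="I", export="E") -> bool:
    """Run Z on itself and compare what happened with the oracle's prediction."""
    z = make_z(oracle)
    accessed = []
    z(z, inp, export, accessed)
    actually_accessed = 1 if accessed else 0
    return oracle(z, inp, export) != actually_accessed


candidates = {
    "always 0": lambda p, i, e: 0,
    "always 1": lambda p, i, e: 1,
    "1 only for export 'E'": lambda p, i, e: 1 if e == "E" else 0,
}
for name, candidate in candidates.items():
    print(name, "-> wrong on Z:", oracle_is_wrong_on_z(candidate))
```

Whatever a deterministic candidate answers for Z run on itself, Z does the opposite, so every candidate is reported as wrong.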


But you can do better through instrumentation...

Depending on the exact problem you're trying to solve, you may be able to determine which exports are valid functions at runtime.

For example, you could write an application which debugs the program you wish to analyze and places guard pages (https://msdn.microsoft.com/en-us/library/aa366549(v=vs.85).aspx) on each of the pages that contain exports you wish to hook. This means that whenever such a page is accessed (executed, read, or written to), an exception is raised and the debugger program gains control.

The debugger could check the program context to see what type of access was made and whether it has anything to do with the export. If the access is an attempt to execute an export, it could perform some hooking functionality before returning control to the program. Otherwise, it could just return control to the program.

In either case, the PAGE_GUARD modifier is lifted after each exception, so you'd need to put it back each time.
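
The bookkeeping involved can be modeled without touching the Windows API. The sketch below is a pure-Python simulation, not a debugger: `GuardedPages`, `on_access`, and all addresses are hypothetical. In a real tool the guard would be applied with VirtualProtectEx and PAGE_GUARD, the hit would arrive as a guard-page exception in the debug event loop, and the debugger would probably need to single-step past the faulting access before re-arming the page. The access type is inferred here by comparing the instruction pointer with the touched address, which is one possible way to read the program context.

```python
PAGE_SIZE = 0x1000


class GuardedPages:
    """Simulates one-shot guard pages placed over a set of export addresses."""

    def __init__(self, exports: dict):
        self.by_addr = {addr: name for name, addr in exports.items()}
        self.guarded = {addr // PAGE_SIZE for addr in exports.values()}
        self.log = []  # (export name, access kind)

    def on_access(self, ip: int, target: int, is_write: bool = False) -> bool:
        """Model one memory access; return True if it hit a guarded page.

        ip is the address of the instruction performing the access and
        target is the address being touched.
        """
        page = target // PAGE_SIZE
        if page not in self.guarded:
            return False
        self.guarded.discard(page)  # the guard is lifted by the exception
        if target in self.by_addr:
            if ip == target:
                kind = "execute"  # instruction fetch at the export address
            else:
                kind = "write" if is_write else "read"
            self.log.append((self.by_addr[target], kind))
        self.guarded.add(page)  # put the guard back for the next access
        return True


# Hypothetical addresses; both exports share one page.
pages = GuardedPages({"CreateTextServices": 0x10002340,
                      "IID_IRichEditOle": 0x10002A10})
hit_call = pages.on_access(ip=0x10002340, target=0x10002340)   # call into export
hit_read = pages.on_access(ip=0x00401000, target=0x10002A10)   # caller reads GUID
hit_other = pages.on_access(ip=0x10002344, target=0x10002344)  # unrelated code
print(pages.log)
```

The third access still raises an exception even though it has nothing to do with an export, which is the overhead described next.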

Unsurprisingly, this would make execution of your program very slow, as any R/W/X access to any of the pages containing an export causes an expensive context switch -- this would likely include the execution of most instructions that are a part of your exported functions, along with several others that have nothing to do with them.

You could take a similar approach with other instrumentation tools, such as Pin.

Note that you may not gain information about the usage of every export through instrumentation. This is because you may need to determine what input/external state is required to cause the program to access each export in order to learn if it is used as code or as data (if at all).

Also note that both execute and read (or even write) accesses could potentially occur on the same exports.
