自动 x86 指令混淆 [英] Automated x86 instruction obfuscation

查看:26
本文介绍了自动 x86 指令混淆的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在开发一个 x86 asm 混淆器,它将英特尔语法代码作为字符串并输出一组经过混淆的等效操作码.

I'm working on an x86 asm obfuscator that takes Intel-syntax code as a string and outputs an equivilent set of opcodes that are obfuscated.

这是一个例子:

mov eax, 0x5523
or eax, [ebx]
push eax
call someAPI

变成这样:

mov eax, 0xFFFFFFFF ; mov eax, 0x5523
and eax, 0x5523     ;
push [ebx]          ; xor eax, [ebx]
or [esp], eax       ;
pop eax             ;
push 12345h         ; push eax
mov [esp], eax      ;
call getEIP         ; call someAPI
getEIP:             ;
add [esp], 9        ;
jmp someAPI         ;

这只是一个例子,我没有检查过这不会搞砸标志(可能会).

This is just an example, I've not checked that this doesn't screw up flags (it probably does).

现在我有一个 XML 文档,其中列出了指令模板(例如 push e*x)和可以使用的替换指令列表.

Right now I have an XML document that lists instruction templates (e.g. push e*x) and a list of replacement instructions that can be used.

我正在寻找的是一种自动生成操作码序列的方法,该操作码序列产生与输入相同的结果.我不介意做一个受过教育的蛮力,但我不确定我会如何处理这个.

What I'm looking for is a way to automatically generate opcode sequences that produce the same result as an input. I don't mind doing an educated bruteforce, but I'm not sure how I'd approach this.

推荐答案

您需要的是操作码功能的代数描述,以及一组允许您确定等效操作的代数定律.

What you need is an algebraic descripton of what the opcodes do, and a set of algebraic laws that allow you to determine equivalent operations.

然后对于每条指令,您查找其代数描述(举个例子,一个

Then for each instruction, you look up its algebraic description (for the sake of an example, an

 XOR  eax,mem[ecx]

其代数等价物是

 eax exclusive_or mem[ecx]

使用那些代数等价物枚举代数等价物,例如:

enumerate algebraic equivalences using those algebra equivalents, such as:

 a exclusive_or b ==> (a and not b) or (b and not a)

为您的 XOR 指令生成等效的代数语句

to generate equivalent algebraic statement for your XOR instruction

 eax exclusive_or mem[ecx] ==> (eax and not mem[ecx]) or (mem[ecx] and not eax)

您可以对此应用更多代数定律,例如 de morgans 定理:

You may apply more algebraic laws to this, for instance de morgans' theorem:

 a or b ==> not (not a and not b)

得到

(not (not (eax and not mem[ecx])) and (not (mem[ecx] and not eax)))

此时您有一个代数计算的规范,它将执行和原版一样.这是你的蛮力.

At this point you have a specification of an algebraic computation that will do the same thing as the original. There's your brute force.

现在您必须通过匹配哪些指令将其编译"为机器指令将按照这说的做.像任何编译器一样,您可能想要优化生成的代码(获取 mem[ecx] 两次没有意义).(所有这些都很难......它是一个代码生成器!)生成的代码序列类似于:

Now you have to "compile" this to machine instructions by matching what instructions will do with what this says. Like any compiler, you likely want to optimize the generated code (no point in fetching mem[ecx] twice). (All of this hard... its a code generator!) The resulting code sequence would be something like:

mov ebx, mem[ecx]
mov edx, ebx
not edx
and edx, eax
not eax
and eax, ebx
not eax
or eax, edx

这是很多手动构建的机器.

This is a lot of machinery to build manually.

另一种方法是利用程序转换系统,该系统允许您将源到源转换应用于代码.然后,您可以直接在代码上将等效项"编码为重写.

Another way to do this is to take advantage of a program transformation system that allows you to apply source-to-source transformations to code. Then you can encode "equivalences" as rewrites directly on the code.

其中一种工具是我们的 DMS 软件再造工具包.

One of these tools is our DMS Software Reengineering Toolkit.

DMS 采用语言定义(本质上作为 EBNF),自动实现解析器、AST 构建器和漂亮打印机(反解析器,将 AST 转回有效的源文本).[DMS 目前没有 ASM86 的 EBNF,但有数十种 EBNF 用于各种已经为 DMS 构建了复杂的语言,包括一些用于杂项非 x86 汇编程序的语言因此,您必须将 ASM86 EBNF 定义为 DMS.这非常简单;数据管理系统有一个非常强大的解析器生成器].

DMS takes a langauge definition (essentially as an EBNF), automatically implements a parser, AST builder, and prettyprinter (anti parser, turning AST back into valid source text). [DMS doesn't presently have an EBNF for ASM86, but dozens of EBNFs for various complex langauges have been build for DMS including several for miscellaneous non-x86 assemblers So you'd have to define the ASM86 EBNF to DMS. This is pretty straightforward; DMS has a really strong parser generator].

使用它,DMS 可以让您直接在代码上编写源转换.您可以编写以下直接实现 XOR 等价和 DeMorgan 定律的转换:

Using that, DMS will let you write source transformations directly on the code. You could write the following transformations that implement the XOR equivalant and DeMorgan's law directly:

  domain ASM86;

  rule obfuscate_XOR(r: register, m: memory_access):instruction:instruction
  =  " XOR 
, m " 
      rewrites to
     " MOV free_register(),m
       NOT free_register()
       AND free_register(),
 
       NOT 
       AND 
,m
       OR 
,free_register()";

 rule obfuscate_OR(r1: register, r2: register):instruction:instruction
 = " OR 
1, 
2"
     rewrites to
    " MOV free_register(),
1
      NOT free_register()
      AND free_register(),
2
      NOT 
2
      AND 
1,
2
      NOT 
1";

在一个名为free_register"的元过程中添加了一些额外的魔法,它决定了哪些寄存器在代码中的那个点(AST 匹配)是空闲的.(如果您不想这样做,请使用堆栈顶部就像您在示例中所做的那样在任何地方都是临时的).

with some additional magic in a meta-procedure called "free_register" that determines what registers are free at that point (of the AST match) in the code. (If you don't want to do that, use the top of the stack as temporary everywhere as you did in your example).

您需要进行大量重写,以涵盖您想要混淆的所有情况,以及寄存器和内存操作数的组合.

You'd need a bunch of rewrites to cover all the cases that you want to obfuscate, with thier combinatorics with registers and memory operands.

然后可以要求转换引擎在代码中的每个点随机应用这些转换一次或多次以对其进行打乱.

Then the transformation engine can be asked to apply these transformations randomly once or more than once at each point in the code to scramble it.

您可以看到 一些代数变换与 DMS 一起应用的完整示例.

这篇关于自动 x86 指令混淆的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆