Automated x86 instruction obfuscation

Question

I'm working on an x86 asm obfuscator that takes Intel-syntax code as a string and outputs an equivalent, obfuscated set of opcodes.

Here's an example:

mov eax, 0x5523
or eax, [ebx]
push eax
call someAPI

becomes something like:

mov eax, 0xFFFFFFFF ; mov eax, 0x5523
and eax, 0x5523     ;
push [ebx]          ; or eax, [ebx]
or [esp], eax       ;
pop eax             ;
push 12345h         ; push eax
mov [esp], eax      ;
call getEIP         ; call someAPI
getEIP:             ;
add [esp], 9        ;
jmp someAPI         ;

This is just an example; I haven't checked that it doesn't screw up the flags (it probably does).

Right now I have an XML document that lists instruction templates (e.g. push e*x) and, for each, a list of replacement instructions that can be used.

What I'm looking for is a way to automatically generate opcode sequences that produce the same result as a given input. I don't mind doing an educated brute force, but I'm not sure how I'd approach this.

Recommended answer

What you need is an algebraic description of what the opcodes do, and a set of algebraic laws that allow you to determine equivalent operations.

Then for each instruction, you look up its algebraic description. For the sake of an example, take an

 XOR  eax,mem[ecx]

whose algebraic equivalent is

 eax exclusive_or mem[ecx]

Next, enumerate equivalences of that algebraic description using algebraic laws, such as:

 a exclusive_or b ==> (a and not b) or (b and not a)

to generate an equivalent algebraic statement for your XOR instruction:

 eax exclusive_or mem[ecx] ==> (eax and not mem[ecx]) or (mem[ecx] and not eax)
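Before relying on a law like this, it may be worth checking it on 32-bit values. The Python sketch below is not part of the original answer. The helper names are hypothetical, and masking with 0xFFFFFFFF is simply how it models a 32-bit register.

```python
import random

MASK = 0xFFFFFFFF  # model a 32-bit register


def xor_direct(a, b):
    return (a ^ b) & MASK


def xor_expanded(a, b):
    # (a and not b) or (b and not a)
    return ((a & ~b) | (b & ~a)) & MASK


def check_equivalent(f, g, trials=10_000, seed=0):
    """Compare f and g on a few edge cases plus random 32-bit operands."""
    rng = random.Random(seed)
    edge = [0, 1, MASK, 0x80000000, 0x5523]
    samples = [(x, y) for x in edge for y in edge]
    samples += [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(trials)]
    return all(f(a, b) == g(a, b) for a, b in samples)
```

Random testing does not prove a law. For purely bitwise operations, though, each bit position is independent, so agreement on operands covering all four (0/1, 0/1) bit combinations is already strong evidence.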

You may apply more algebraic laws to this, for instance De Morgan's theorem:

 a or b ==> not (not a and not b)

to get

 not ((not (eax and not mem[ecx])) and (not (mem[ecx] and not eax)))

At this point you have a specification of an algebraic computation that will do the same thing as the original. There's your brute force.
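The enumeration step can be sketched in Python as a small rewriting loop over expression trees. This is an illustration under stated assumptions, not the answer's own implementation: expressions are nested tuples, operands are plain strings, and all function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def rule_xor(e):
    # a xor b ==> (a and not b) or (b and not a)
    if e[0] == "xor":
        _, a, b = e
        return ("or", ("and", a, ("not", b)), ("and", b, ("not", a)))
    return None


def rule_de_morgan(e):
    # a or b ==> not (not a and not b)
    if e[0] == "or":
        _, a, b = e
        return ("not", ("and", ("not", a), ("not", b)))
    return None


RULES = [rule_xor, rule_de_morgan]


def rewrites(e):
    """Yield every expression obtained by applying one rule at one position."""
    if isinstance(e, str):  # leaf: a register or memory operand
        return
    for rule in RULES:
        new = rule(e)
        if new is not None:
            yield new
    for i in range(1, len(e)):
        for sub in rewrites(e[i]):
            yield e[:i] + (sub,) + e[i + 1:]


def enumerate_equivalents(e, steps):
    """Breadth-first closure of e under RULES, up to `steps` rewrites deep."""
    seen, frontier = {e}, [e]
    for _ in range(steps):
        nxt = []
        for x in frontier:
            for y in rewrites(x):
                if y not in seen:
                    seen.add(y)
                    nxt.append(y)
        frontier = nxt
    return seen


def evaluate(e, env):
    """Evaluate an expression on 32-bit values, to sanity-check the rules."""
    if isinstance(e, str):
        return env[e] & MASK
    if e[0] == "not":
        return ~evaluate(e[1], env) & MASK
    a, b = evaluate(e[1], env), evaluate(e[2], env)
    return {"and": a & b, "or": a | b, "xor": a ^ b}[e[0]]
```

Calling `enumerate_equivalents(("xor", "eax", "mem"), 2)` yields the original expression together with the two rewritten forms derived above. With more laws in `RULES`, the set grows quickly, which is where the "educated" part of the brute force comes in.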

Now you have to "compile" this to machine instructions by matching what the available instructions do against what the specification says. Like any compiler, you likely want to optimize the generated code (there is no point in fetching mem[ecx] twice). (All of this is hard... it's a code generator!) The resulting code sequence would be something like:

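mov ebx, mem[ecx]   ; ebx = mem (fetched once; ebx and edx are assumed free)
mov edx, ebx        ; edx = mem
not edx             ; edx = not mem
and edx, eax        ; edx = eax and not mem
not edx             ; edx = not (eax and not mem)
not eax             ; eax = not eax
and eax, ebx        ; eax = mem and not eax
not eax             ; eax = not (mem and not eax)
and eax, edx        ; eax = (not (eax and not mem)) and (not (mem and not eax))
not eax             ; eax = eax xor mem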
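Hand-derived sequences like this are easy to get wrong, so it can help to run a candidate through a tiny simulator and compare it with the original instruction. The Python sketch below is an assumption-laden illustration: it models only mov/not/and/or/xor on 32-bit values, treats the memory operand as a pseudo-register named `mem`, compares only the final value of `eax`, and ignores flags and clobbered scratch registers. The `CANDIDATE` program is a hypothetical example built from the un-De-Morganed form `(eax and not mem) or (mem and not eax)`.

```python
import random

MASK = 0xFFFFFFFF


def run(program, regs):
    """Execute a list of (op, dst, src) tuples on a dict of 32-bit registers."""
    regs = dict(regs)
    for op, dst, src in program:
        if op == "mov":
            regs[dst] = regs[src]
        elif op == "not":
            regs[dst] = ~regs[dst] & MASK
        elif op == "and":
            regs[dst] &= regs[src]
        elif op == "or":
            regs[dst] |= regs[src]
        elif op == "xor":
            regs[dst] ^= regs[src]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs


REFERENCE = [("xor", "eax", "mem")]

CANDIDATE = [
    ("mov", "ebx", "mem"),
    ("mov", "edx", "ebx"),
    ("not", "edx", None),
    ("and", "edx", "eax"),  # edx = eax and not mem
    ("not", "eax", None),
    ("and", "eax", "ebx"),  # eax = mem and not eax
    ("or", "eax", "edx"),   # eax = eax xor mem
]


def same_result(p, q, out="eax", trials=1000, seed=0):
    """Return True if p and q leave the same value in `out` on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        regs = {r: rng.getrandbits(32) for r in ("eax", "ebx", "edx", "mem")}
        if run(p, regs)[out] != run(q, regs)[out]:
            return False
    return True
```

A checker of this kind also fits the "educated brute force" idea from the question: generate candidate sequences, then keep only those for which `same_result` holds.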

This is a lot of machinery to build manually.

Another way to do this is to take advantage of a program transformation system that allows you to apply source-to-source transformations to code. Then you can encode "equivalences" as rewrites directly on the code.

One of these tools is our DMS Software Reengineering Toolkit.

DMS takes a language definition (essentially as an EBNF) and automatically implements a parser, AST builder, and prettyprinter (an anti-parser that turns the AST back into valid source text). [DMS doesn't presently have an EBNF for ASM86, but dozens of EBNFs for various complex languages have been built for DMS, including several for miscellaneous non-x86 assemblers. So you'd have to define the ASM86 EBNF to DMS. This is pretty straightforward; DMS has a really strong parser generator.]

Using that, DMS will let you write source transformations directly on the code. You could write the following transformations, which implement the XOR equivalence and De Morgan's law directly:

  domain ASM86;

  rule obfuscate_XOR(r: register, m: memory_access):instruction:instruction
  =  " XOR \r, \m " 
      rewrites to
     " MOV \free_register\(\),\m
       NOT \free_register\(\)
       AND \free_register\(\),\r 
       NOT \r
       AND \r,\m
       OR \r,\free_register\(\)";

 rule obfuscate_OR(r1: register, r2: register):instruction:instruction
 = " OR \r1, \r2"
     rewrites to
    " MOV \free_register\(\),\r2
      NOT \free_register\(\)
      NOT \r1
      AND \r1,\free_register\(\)
      NOT \r1";

These rules rely on some additional magic in a meta-procedure called "free_register" that determines which registers are free at that point (of the AST match) in the code. (If you don't want to do that, use the top of the stack as a temporary everywhere, as you did in your example.)
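Outside DMS, the same idea can be sketched in Python. This is a rough illustration, not DMS code: `free_register` here simply picks from a fixed register list, and the set of live registers is assumed to be supplied by the caller (a real tool would compute it with liveness analysis, which is not shown). All names are hypothetical.

```python
import random

GENERAL_REGS = ("eax", "ebx", "ecx", "edx", "esi", "edi")


def free_register(busy, rng=random):
    """Pick a register that is not in `busy`; return None if none is free."""
    candidates = [r for r in GENERAL_REGS if r not in busy]
    return rng.choice(candidates) if candidates else None


def obfuscate_xor(r, m, live, rng=random):
    """Expand `xor r, m` using a scratch register, or leave it unchanged."""
    # Registers used to address the memory operand must not be clobbered.
    used_by_operand = {g for g in GENERAL_REGS if g in m}
    t = free_register(set(live) | {r} | used_by_operand, rng)
    if t is None:
        return [f"xor {r}, {m}"]
    return [
        f"mov {t}, {m}",
        f"not {t}",
        f"and {t}, {r}",  # t = r and not m
        f"not {r}",
        f"and {r}, {m}",  # r = m and not r
        f"or {r}, {t}",
    ]
```

For example, `obfuscate_xor("eax", "[ecx]", live={"ebx"})` returns a six-instruction sequence whose scratch register is one of `edx`, `esi`, or `edi`. When every register is live it falls back to the original instruction, where a fuller version might use the stack instead.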

You'd need a bunch of rewrites to cover all the cases that you want to obfuscate, along with their combinations of register and memory operands.

Then the transformation engine can be asked to apply these transformations randomly, once or more than once, at each point in the code to scramble it.
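Random, repeated application can be sketched in Python as follows. This is a simplified stand-in for a real transformation engine: instructions are tuples, the rules are hypothetical, and the name `tmp` is a placeholder for whatever scratch register a free-register lookup would provide.

```python
import random


def expand_xor(ins):
    # xor r, m ==> (r and not m) or (m and not r)
    if ins[0] == "xor":
        _, r, m = ins
        return [("mov", "tmp", m), ("not", "tmp"), ("and", "tmp", r),
                ("not", r), ("and", r, m), ("or", r, "tmp")]
    return None


def expand_or(ins):
    # or r, m ==> not (not r and not m), by De Morgan
    if ins[0] == "or":
        _, r, m = ins
        return [("mov", "tmp", m), ("not", "tmp"), ("not", r),
                ("and", r, "tmp"), ("not", r)]
    return None


RULES = [expand_xor, expand_or]


def scramble(program, passes=2, probability=0.5, seed=None):
    """Run `passes` sweeps; at each instruction, maybe apply a matching rule."""
    rng = random.Random(seed)
    for _ in range(passes):
        out = []
        for ins in program:
            options = [new for rule in RULES if (new := rule(ins)) is not None]
            if options and rng.random() < probability:
                out.extend(rng.choice(options))
            else:
                out.append(ins)
        program = out
    return program
```

Because a later pass sees the output of an earlier one, an `or` produced by `expand_xor` may itself be expanded, so the result differs from run to run unless `seed` is fixed.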

You can see a complete example of some algebraic transformations being applied with DMS.
