Metamorphic Code Examples

Question

I understand the concept of polymorphic and metamorphic code, but I recently read the Wikipedia pages on both (for whatever reason I hadn't done this previously!). Now I really want to have a go at writing some metamorphic code for myself.

I am a master of no language, dabbler of many. I know some PHP, MySQL, c/c++, Java, Bash scripting, Visual Basic 6, VBScripting, Perl, JavaScript.

Can anyone provide an example of metamorphic code in any of these languages? I would like to see a working example, even where the output of the program is just "Hello World", to understand through example how this is happening (I am struggling to theorise how these techniques can be achieved through mental thought alone). Any language would do really; those are just the preferred ones.

Additionally, searching the Internet has only returned a limited number of examples in c/c++ (not even complete working examples, more partial snippets of code). Is that because the other languages I have suggested aren't low level enough to have the power/flexibility required to make metamorphic code?

Recommended answer

Below is an example of what I believe would classify as metamorphic code written in C. I'm afraid I don't have a great deal of experience writing portable C code, so it may require some modification to compile on other platforms (I'm using an old version of Borland on Windows). Also, it relies on the target platform being x86 since it involves some machine code generation. In theory it should compile on any x86 OS though.

How it works

Each time the program is run, it generates a randomly modified copy of itself, with a different filename. It also prints out a list of offsets that have been modified so you can see it actually doing something.

The modification process is very simplistic. The source code is just interspersed with sequences of assembly instructions that effectively do nothing. When the program is run, it finds these sequences and randomly replaces them with different code (which obviously also does nothing).

Hardcoding a list of offsets obviously isn't realistic for something that other people need to be able to compile, so the sequences are generated in a way that makes them easy to identify in a search through the object code, hopefully without matching any false positives.

Each sequence consists of a push operation on a certain register, a set of instructions that modify that register, and then a pop operation to restore the register to its initial value. To keep things simple, in the original source all of the sequences are just PUSH EAX, eight NOPs, and POP EAX. In all subsequent generations of the app, though, the sequences will be entirely random.

Explaining the code

I've split the code up into multiple parts so I can try to explain it step by step. If you want to compile it yourself, you'll just need to join all the parts together.

First some fairly standard includes:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <time.h>

Next we have defines for various x86 opcodes. These will typically be combined with other values to generate a full instruction. For example, the PUSH define (0x50) by itself is PUSH EAX, but you can derive the values for other registers by adding an offset in the range 0 to 7. Same thing for POP and MOV.

#define PUSH 0x50
#define POP  0x58
#define MOV  0xB8
#define NOP  0x90
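
The register offsets described above can be tabulated with a small sketch. The helper names `regname`, `opcode_for` and `print_opcode_table` are made up for illustration and are not part of the answer's program; the register order (EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI) is the standard x86 numbering.

```c
#include <assert.h>
#include <stdio.h>

#define PUSH 0x50
#define POP  0x58
#define MOV  0xB8

/* Hypothetical helper: name of the 32-bit register for an index 0..7. */
const char *regname(unsigned reg) {
  static const char *names[8] = {
    "EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"
  };
  assert(reg < 8);
  return names[reg];
}

/* Hypothetical helper: add the register index to a base opcode. */
unsigned char opcode_for(unsigned char base, unsigned reg) {
  assert(reg < 8);
  return (unsigned char)(base + reg);
}

/* Prints the PUSH, POP and MOV opcode byte for each of the 8 registers. */
void print_opcode_table(void) {
  unsigned reg;
  for (reg = 0; reg < 8; reg++)
    printf("%s: PUSH=0x%02X POP=0x%02X MOV=0x%02X\n", regname(reg),
           opcode_for(PUSH, reg), opcode_for(POP, reg), opcode_for(MOV, reg));
}
```

For instance, `opcode_for(PUSH, 3)` gives 0x53, which is PUSH EBX, and `opcode_for(MOV, 1)` gives 0xB9, the first byte of MOV ECX, imm32.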

The next six are the first bytes of several two-byte instructions (the code calls them prefixes). The second byte encodes the operands and will be explained in more detail later.

#define ADD  0x01
#define AND  0x21
#define XOR  0x31
#define OR   0x09
#define SBB  0x19
#define SUB  0x29

const unsigned char prefixes[] = { ADD,AND,XOR,OR,SBB,SUB,0 };

JUNK is a macro that inserts our sequence of junk operations anywhere we want in the code. As I explained before, it's initially just writing out PUSH EAX, eight NOPs, and POP EAX. JUNKLEN is the number of NOPs in that sequence, not the full length of the sequence.

And in case you're not aware, __emit__ is a pseudo-function that injects literal values directly into the object code. I suspect it may be something you need to port if you're using a different compiler.

#define JUNK __emit__(PUSH,NOP,NOP,NOP,NOP,NOP,NOP,NOP,NOP,POP)
#define JUNKLEN 8

Some global variables where our code will be loaded. Global variables are bad, but I'm not a particularly good coder.

unsigned char *code;
int codelen;

Next we have a simple function that will read our object code into memory. I never free that memory because I just don't care.

Notice the JUNK macro calls inserted at random points. You're going to see a lot more of these throughout the code. You can insert them almost anywhere, but if you're using a real C compiler (as opposed to C++) it'll complain if you try to put them before or in-between variable declarations.

void readcode(const char *filename) {
  FILE *fp = fopen(filename, "rb");    JUNK;
  fseek(fp, 0L, SEEK_END);             JUNK;
  codelen = ftell(fp);
  code = malloc(codelen);              JUNK;
  fseek(fp, 0L, SEEK_SET);
  fread(code, codelen, 1, fp);         JUNK;
  fclose(fp);
}

Another simple function to write the application out again after it has been modified. For the new filename we just replace the last character of the original filename with a digit that is incremented each time. No attempt is made to check whether the file already exists and that we're not overwriting a crucial piece of the operating system.

void writecode(const char *filename) {
  FILE *fp;
  int lastoffset = strlen(filename)-1;
  char lastchar = filename[lastoffset];
  char *newfilename = strdup(filename);  JUNK;
  lastchar = '0'+(isdigit(lastchar)?(lastchar-'0'+1)%10:0);
  newfilename[lastoffset] = lastchar;
  fp = fopen(newfilename, "wb");         JUNK;
  fwrite(code, codelen, 1, fp);          JUNK;
  fclose(fp);
  free(newfilename);
}
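
The filename rule used above (replace the last character with a digit that advances on each generation) can be isolated into a small sketch. The helper name `next_generation_name` is hypothetical; it follows the same rule as writecode but uses malloc rather than strdup, which is not part of standard C before C23.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of filename whose last
   character is replaced by the next generation digit. The caller frees it. */
char *next_generation_name(const char *filename) {
  size_t len = strlen(filename);
  char *copy;
  char last;
  assert(len > 0);
  copy = malloc(len + 1);
  if (copy == NULL) return NULL;
  memcpy(copy, filename, len + 1);
  last = copy[len - 1];
  /* A digit advances by one and wraps from 9 to 0; anything else becomes 0. */
  copy[len - 1] = isdigit((unsigned char)last)
                    ? (char)('0' + (last - '0' + 1) % 10)
                    : '0';
  return copy;
}
```

With this rule a file named meta.exe produces meta.ex0, which in turn produces meta.ex1, and so on, wrapping back to meta.ex0 after meta.ex9.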

This next function writes out a random instruction for our junk sequence. The reg parameter represents the register we're working with - what will be pushed and popped at either end of the sequence. The offset is the offset in the code where the instruction will be written. And space gives the number of bytes we have left in our sequence.

Depending on how much space we have, we may be restricted in which instructions we can write out; otherwise we choose at random whether it's a NOP, a MOV or one of the others. NOP is just a single byte. MOV is five bytes: our MOV opcode (with the reg parameter added), and 4 random bytes representing the number moved into the register.

For the two byte sequences, the first is just one of our prefixes chosen at random. The second is a byte in the range 0xC0 to 0xFF where the least significant 3 bits represent the primary register - i.e. that must be set to the value of our reg parameter.

int writeinstruction(unsigned reg, int offset, int space) {
  if (space < 2) {
    code[offset] = NOP;                         JUNK;
    return 1;
  }
  else if (space < 5 || rand()%2 == 0) {
    code[offset] = prefixes[rand()%6];          JUNK;
    code[offset+1] = 0xC0 + rand()%8*8 + reg;   JUNK;
    return 2;
  }
  else {
    code[offset] = MOV+reg;                     JUNK;
    *(short*)(code+offset+1) = rand();
    *(short*)(code+offset+3) = rand();          JUNK;
    return 5;
  }
}
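
The two encodings used by writeinstruction can be spelled out in a short sketch. The helper names `modrm_reg_reg` and `emit_mov_imm32` are hypothetical. Unlike the function above, the MOV helper writes the four immediate bytes one at a time instead of casting to `short *`, which is an assumption about what is more portable, not the author's method.

```c
#include <assert.h>

/* Second byte of a register-to-register instruction (ModRM with mod = 11):
   bits 7-6 are 11, bits 5-3 hold the source register and bits 2-0 the
   destination register. Hypothetical helper name. */
unsigned char modrm_reg_reg(unsigned src, unsigned dst) {
  assert(src < 8 && dst < 8);
  return (unsigned char)(0xC0 | (src << 3) | dst);
}

/* Writes MOV reg, imm32 as five bytes: the opcode 0xB8+reg followed by the
   value in little-endian order. Returns the number of bytes written. */
int emit_mov_imm32(unsigned char *out, unsigned reg, unsigned long value) {
  int i;
  assert(reg < 8);
  out[0] = (unsigned char)(0xB8 + reg);
  for (i = 0; i < 4; i++)
    out[1 + i] = (unsigned char)((value >> (8 * i)) & 0xFF);
  return 5;
}
```

As an example, the XOR opcode 0x31 followed by `modrm_reg_reg(1, 0)`, which is 0xC8, encodes XOR EAX, ECX: EAX is the register being modified and ECX is the source.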

Now we have the equivalent function for reading back one of these instructions. Assuming we've already identified the reg from the PUSH and POP operations at either end of the sequence, this function can attempt to validate whether the instruction at the given offset is one of our junk operations and that the primary register matches the given reg parameter.

If it finds a valid match, it returns the instruction length, otherwise it returns zero.

int readinstruction(unsigned reg, int offset) {
  unsigned c1 = code[offset];
  if (c1 == NOP)
    return 1;                     JUNK;
  if (c1 == MOV+reg)
    return 5;                     JUNK;
  if (c1 != 0 && strchr((const char *)prefixes, c1)) { /* strchr would also match the terminating 0 */
    unsigned c2 = code[offset+1]; JUNK;
    if (c2 >= 0xC0 && c2 <= 0xFF && (c2&7) == reg)
      return 2;                   JUNK;
  }                               JUNK;
  return 0;
}

This next function is the main loop that searches for and replaces the junk sequences. It starts by looking for a PUSH opcode followed by a POP opcode on the same register eight bytes later (or whatever JUNKLEN was set to).

void replacejunk(void) {
  int i, j, inc, space;
  srand(time(NULL));                                 JUNK;

  for (i = 0; i < codelen-JUNKLEN-2; i++) {
    unsigned start = code[i];
    unsigned end = code[i+JUNKLEN+1];
    unsigned reg = start-PUSH;

    if (start < PUSH || start >= PUSH+8) continue;   JUNK;
    if (end != POP+reg) continue;                    JUNK;

If the register turns out to be ESP, we can safely skip it because we'll never use ESP in our generated code (stack operations on ESP need special consideration that isn't worth the effort).

    if (reg == 4) continue; /* register 4 is ESP */

Once we've matched a likely looking PUSH and POP combination, we then try to read the instructions in-between. If we successfully match the length of bytes we're expecting, we consider that a match that can be replaced.

    j = 0;                                           JUNK;
    while ((inc = readinstruction(reg,i+1+j)) != 0) j += inc;
    if (j != JUNKLEN) continue;                      JUNK;

We then pick one of 7 registers at random (as explained before we don't consider ESP), and write out the PUSH and POP operations for that register at either end of the sequence.

    reg = rand()%7;                                  JUNK;
    reg += (reg >= 4);
    code[i] = PUSH+reg;                              JUNK;
    code[i+JUNKLEN+1] = POP+reg;                     JUNK;

Then all we need to do is fill in the space in-between using our writeinstruction function.

    space = JUNKLEN;
    j = 0;                                           JUNK;
    while (space) {
      inc = writeinstruction(reg,i+1+j,space);       JUNK;
      j += inc;
      space -= inc;                                  JUNK;
    }

And here's where we display the offset that we just patched.

    printf("%d\n",i);                                JUNK;
  }
}                                                                             
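
The search step of replacejunk can also be tried on an in-memory buffer, away from the self-modifying program. The sketch below is a simplified illustration with hypothetical helper names (`find_junk_candidate`, `pick_register`). It only matches the PUSH and POP pair and does not validate the bytes in between, which the function above additionally does with readinstruction.

```c
#include <assert.h>
#include <stddef.h>

#define PUSH 0x50
#define POP  0x58

/* Hypothetical helper: returns the offset of the first PUSH reg ... POP reg
   pair at or after `from` whose two opcodes are junklen+1 bytes apart and
   name the same register (ESP excluded), or -1 if there is none. */
long find_junk_candidate(const unsigned char *buf, size_t len,
                         size_t junklen, size_t from) {
  size_t i;
  assert(buf != NULL);
  for (i = from; i + junklen + 2 <= len; i++) {
    unsigned start = buf[i];
    unsigned reg;
    if (start < PUSH || start >= PUSH + 8) continue;
    reg = start - PUSH;
    if (reg == 4) continue;                      /* register 4 is ESP */
    if (buf[i + junklen + 1] != POP + reg) continue;
    return (long)i;
  }
  return -1;
}

/* Maps a value 0..6 onto the seven usable registers by skipping index 4,
   the same trick as reg += (reg >= 4) in the code above. */
unsigned pick_register(unsigned r) {
  assert(r < 7);
  return r + (r >= 4);
}
```

Calling `pick_register(rand() % 7)` therefore yields one of 0, 1, 2, 3, 5, 6 or 7, never ESP.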

Finally we have the main function. This just calls the functions previously described. We read in the code, replace the junk, then write it out again. The argv[0] argument contains the application filename.

int main(int argc, char* argv[]) {

  readcode(argv[0]);     JUNK;
  replacejunk();         JUNK;
  writecode(argv[0]);    JUNK;

  return 0;
}

And that's all there is to it.

Some final notes

When running this code, obviously you need to make sure the user has the appropriate permissions to write out a file in the same location as the original code. Then once the new file has been generated, you'll typically need to rename it if you're on a system where the file extension is important, or set its execute attributes if that is needed.

Finally, I suspect you may want to run the generated code through a debugger rather than just executing it directly and hoping for the best. I found that if I copied the generated file over the original executable, the debugger was happy to let me step through it while still viewing the original source code. Then whenever you get to a point in the code that says JUNK, you can pop into the assembly view and look at the code that has been generated.

Anyway, I hope my explanations have been reasonably clear, and this was the kind of example you were looking for. If you have any questions, feel free to ask in the comments.

Bonus update

As a bonus, I thought I'd also include an example of metamorphic code in a scripting language. This is quite different from the C example, since in this case we need to mutate the source code, rather than the binary executable, which is a little easier I think.

For this example, I've made extensive use of PHP's goto statement. Every line starts with a label, and ends with a goto pointing to the label of the following line. That way each line is essentially self-contained, and we can happily shuffle them and still have the program work exactly as before.

Conditions and loop structures are a little more complicated, but they just need to be rewritten in the form of a condition that jumps to one of two different labels. I've included comment markers in the code where the loops would be to try and make it easier to follow.
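
The shuffling idea can be sketched in C, to stay in the same language as the first example. This is not the author's PHP script, which is not reproduced here. The lines in `example_lines` are made-up illustrations of the label-plus-goto style, and `shuffle_lines` is a hypothetical helper that performs a Fisher-Yates shuffle while keeping the first line (the entry jump) in place.

```c
#include <assert.h>
#include <stdlib.h>

/* Made-up example input in the style the answer describes: every line
   carries its own label and ends by jumping to the next one. */
const char *example_lines[] = {
  "goto a;",
  "a: $greeting = 'Hello'; goto b;",
  "b: $greeting .= ' World'; goto c;",
  "c: echo $greeting; goto d;",
  "d: exit;",
};

/* Shuffles lines[1..n-1] in place with a Fisher-Yates shuffle, leaving
   lines[0] (the entry point) where it is. Hypothetical helper name. */
void shuffle_lines(const char **lines, size_t n) {
  size_t i;
  assert(lines != NULL);
  for (i = n; i > 2; i--) {
    /* pick j in 1..i-1 and swap it with the last unshuffled slot */
    size_t j = 1 + (size_t)rand() % (i - 1);
    const char *tmp = lines[i - 1];
    lines[i - 1] = lines[j];
    lines[j] = tmp;
  }
}
```

Because every line names its successor explicitly, printing the array after `shuffle_lines(example_lines, 5)` gives a program with the same control flow but a different textual order.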

Example code on ideone.com

All the code does is echo the shuffled copy of itself, so you can easily test it on ideone just by cutting and pasting the output back into the source field and running it again.

If you wanted it to mutate even more, it would be fairly easy to do something like replace all the labels and variables with a different set of random strings every time the code was run. But I thought it best to try and keep things as simple as possible. These examples are just meant to demonstrate the concept - we're not actually trying to avoid detection. :)
