Obfuscating Python bytecode through interpreter mutation


Problem description

Dropbox has done this well: they managed to protect their desktop application, which is written in Python. I have researched this a lot and found no solution better than obfuscation, which is not a very secure way to go, since you will eventually see your code uploaded somewhere.

I listened to a talk by Giovanni Bajo (the PyInstaller founder), in which he said Dropbox does this:

  1. Bytecode scrambling by recompiling the CPython interpreter, so that the standard CPython interpreter cannot run the resulting bytecode; only the recompiled CPython interpreter can.
  2. All you need to do is shuffle the numbers below the define loadup 8.

I've never gone through Python's source code, so I won't claim that I fully understand the above.

I need to hear from the experts: how is such a thing done? And after recompiling, will I still be able to package my application with the available tools, such as PyInstaller?

Update:

I did some research on how Dropbox does this type of obfuscation/mutation, and I found this:

According to Hagen Fritsch, they do it in two stages:

  1. They use the TEA cipher along with an RNG seeded by some values in the code object of each Python module. They adjusted the interpreter accordingly so that it

    a) decrypts the modules and

    b) prevents access to the decrypted code objects.

    Otherwise, the straightforward path would have been to just let Dropbox decrypt everything and then dump the modules using the built-in marshaller.

  2. Another trick used is manual scrambling of the opcodes. Unfortunately, this could only be fixed semi-automatically, so their monoalphabetic substitution cipher proved quite effective at buying some time.
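
For reference, the TEA block cipher named in stage 1 is small enough to sketch in Python. This is only the textbook algorithm (64-bit block, 128-bit key, 32 rounds). How Dropbox derives the key or seeds its RNG from the code object is not described here, so the key and the big-endian byte order below are assumptions made purely for illustration.

```python
import struct

MASK = 0xFFFFFFFF      # keep arithmetic within 32 bits, as C's uint32_t would
DELTA = 0x9E3779B9     # TEA's round constant
ROUNDS = 32

def tea_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Encrypt one 8-byte block with a 16-byte key (textbook TEA)."""
    v0, v1 = struct.unpack(">2I", block)
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return struct.pack(">2I", v0, v1)

def tea_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Undo tea_encrypt_block by running the rounds in reverse order."""
    v0, v1 = struct.unpack(">2I", block)
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = (DELTA * ROUNDS) & MASK
    for _ in range(ROUNDS):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        total = (total - DELTA) & MASK
    return struct.pack(">2I", v0, v1)

# Hypothetical key; a real scheme would derive it from somewhere less obvious.
key = bytes(range(16))
plaintext = b"PYTHONBC"
ciphertext = tea_encrypt_block(plaintext, key)
recovered = tea_decrypt_block(ciphertext, key)
```

A module body longer than eight bytes would need padding and a block mode on top of this, which the description above does not specify.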

I still want more insight into how this could be done. Moreover, I don't know how the decryption happens in this process. I'd like to hear from all the experts here. Come on, guys, where are you?

Solution

I suppose this is about shuffling the numbers in Include/opcode.h. I don't see a #define loadup there, but maybe that refers to some old Python version. I have not tried this.
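
To make the idea concrete, here is a minimal Python sketch of what shuffling the opcode numbers amounts to, applied to a function's bytecode from the outside rather than by editing the header and recompiling. It assumes Python 3.6 or later, where every instruction is two bytes (opcode, argument). The helper names and the seed are made up for the example, and keeping opcodes on their original side of opcode.HAVE_ARGUMENT is my assumption about what a modified interpreter would want, not something I have verified.

```python
import opcode
import random

def make_opcode_table(seed: int) -> list:
    """Return a permutation of 0..255 that preserves the HAVE_ARGUMENT split."""
    rng = random.Random(seed)
    low = list(range(opcode.HAVE_ARGUMENT))
    high = list(range(opcode.HAVE_ARGUMENT, 256))
    rng.shuffle(low)
    rng.shuffle(high)
    return low + high  # table[standard_opcode] -> scrambled_opcode

def invert(table: list) -> list:
    """Build the reverse table: scrambled_opcode -> standard_opcode."""
    inverse = [0] * 256
    for standard, scrambled in enumerate(table):
        inverse[scrambled] = standard
    return inverse

def remap(code: bytes, table: list) -> bytes:
    """Rewrite the opcode byte of every 2-byte instruction; leave arguments alone."""
    out = bytearray(code)
    for i in range(0, len(out), 2):
        out[i] = table[out[i]]
    return bytes(out)

def sample(a, b):
    return a + b

table = make_opcode_table(seed=2024)          # arbitrary, hypothetical seed
original = sample.__code__.co_code
scrambled = remap(original, table)            # what a "mutated" .pyc would hold
restored = remap(scrambled, invert(table))    # what the matching interpreter sees
```

A stock interpreter handed the scrambled bytes would misinterpret them, which is the whole point; only something that knows the table can map them back.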

This will obfuscate your .pyc files so that they cannot be inspected by any tools that recognize normal .pyc files. This may help you hide some security measures inside your program. However, an attacker might be able (for example) to extract your custom Python interpreter from your app bundle and leverage that to inspect the files. (Just launch the interactive interpreter and start investigating by importing a module and calling dir on it.)
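
As an illustration of that kind of investigation, the following sketch shows what could be run inside the extracted interpreter. The standard json module stands in for a protected application module, and dump_functions is a hypothetical helper name. An interpreter that blocks access to decrypted code objects, as described in the updated question, is aimed at exactly this, so the __code__ lookup might fail there.

```python
import inspect
import json      # stand-in for a protected application module
import marshal

def dump_functions(module) -> dict:
    """Map each Python-level function in a module to its marshalled code object."""
    dumped = {}
    for name in dir(module):
        obj = getattr(module, name)
        if inspect.isfunction(obj):
            dumped[name] = marshal.dumps(obj.__code__)
    return dumped

names = dir(json)               # first look: what does the module expose?
blobs = dump_functions(json)    # then: pull out the code behind each function
```

Even if the opcode numbers inside those blobs are scrambled, the names, constants, and string literals stored alongside them are already informative.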

Note also that your package will surely contain some modules from the Python standard library. If an attacker guesses that you have shuffled the opcodes, they could do a byte-for-byte comparison between your version and the normal version of a standard module and discover your opcodes that way. To prevent this simple attack, one can protect the modules with proper encryption and try to hide the decryption step in the interpreter, as mentioned in the updated question. This forces the attacker to use machine code debugging to look for the decryption code.
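
A minimal sketch of that comparison, assuming Python 3.6+ two-byte instructions and that the attacker has the same function as compiled by both interpreters. Since no modified interpreter is at hand, the scrambled side is simulated with a made-up permutation (secret_table); recover_mapping is a hypothetical name.

```python
import random

def recover_mapping(standard: bytes, scrambled: bytes) -> dict:
    """Pair up opcode bytes at matching offsets: scrambled -> standard."""
    if len(standard) != len(scrambled):
        raise ValueError("bytecode lengths differ; probably not the same source")
    mapping = {}
    for i in range(0, len(standard), 2):
        known = mapping.setdefault(scrambled[i], standard[i])
        if known != standard[i]:
            raise ValueError("inconsistent pairing; not a plain substitution")
    return mapping

# Simulated vendor side: a secret permutation of the 256 opcode values.
secret_table = list(range(256))
random.Random(7).shuffle(secret_table)

def known_function(items):
    # Plays the role of a standard-library function the attacker also has.
    total = 0
    for item in items:
        total += item
    return total

standard = known_function.__code__.co_code
scrambled = bytes(
    secret_table[b] if i % 2 == 0 else b   # opcode bytes sit at even offsets
    for i, b in enumerate(standard)
)
mapping = recover_mapping(standard, scrambled)
```

One function only reveals the opcodes it happens to use; running the same comparison over the whole bundled standard library would fill in most of the table.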


I don't know how the decryption happens in this process...

You would modify the part of the interpreter that imports modules and insert your decryption C code there.
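
The real change would be C code inside the interpreter's import machinery, which I am not reproducing here. As a rough Python-level analogy only, the sketch below decrypts a marshalled code object and executes it into a fresh module. The XOR routine is a deliberately trivial placeholder for a real cipher such as TEA, and the key, module name, and helper names are all invented for the example.

```python
import marshal
import types

KEY = b"not-a-real-key"   # hypothetical; a real build would hide this in C

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Placeholder 'cipher': XOR with a repeating key (its own inverse)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt_module(source: str, name: str) -> bytes:
    """Build step: compile, marshal, then encrypt the code object."""
    code = compile(source, name, "exec")
    return xor_bytes(marshal.dumps(code), KEY)

def load_encrypted_module(blob: bytes, name: str) -> types.ModuleType:
    """Import step: decrypt, unmarshal, and execute into a new module."""
    code = marshal.loads(xor_bytes(blob, KEY))
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module

blob = encrypt_module("def greet(who):\n    return 'hello ' + who\n", "secret_mod")
secret_mod = load_encrypted_module(blob, "secret_mod")
```

Done in pure Python like this, the key and the decryption routine are trivially visible, which is why the step belongs in the compiled interpreter where finding it takes machine code debugging.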
