Modifying Python bytecode

Question

I was wondering how to modify bytecode and then recompile that code so I can use it in Python as a function. I've been trying:

from dis import dis
a = """
def fact():
    a = 8
    a = 0
"""
c = compile(a, '<string>', 'exec')
w = c.co_consts[0].co_code
dis(w)

which decompiles to:

      0 LOAD_CONST          1 (1)
      3 STORE_FAST          1 (1)
      6 LOAD_CONST          2 (2)
      9 STORE_FAST          1 (1)
     12 LOAD_CONST          0 (0)
     15 RETURN_VALUE   

Supposing I want to get rid of the instructions at offsets 0 and 3, I call:

x = c.co_consts[0].co_code[6:16]
dis(x)

which results in:

      0 LOAD_CONST          2 (2)
      3 STORE_FAST          1 (1)
      6 LOAD_CONST          0 (0)
      9 RETURN_VALUE   

My problem is what to do with x. If I try exec x, I get an "expected string without null bytes" error, and I get the same for exec w. Trying to compile x results in: compile() expected string without null bytes.

I'm not sure of the best way to proceed. Maybe I need to create some kind of code object, but I'm not sure how. I assume it must be possible, since tools like byteplay and various Python assemblers exist.

I'm using Python 2.7.10, but I'd like it to be future-compatible (e.g. with Python 3) if possible.

Solution

Update: For sundry reasons I have started writing a Cross-Python-version assembler. See https://github.com/rocky/python-xasm It is still in very early beta.

As far as I know, there is no other currently maintained Python assembler. PEAK's BytecodeAssembler was developed for Python 2.6 and was later modified to support early Python 2.7.

Judging from its documentation, it is pretty cool. But it relies on other PEAK libraries, which might be problematic.

I'll go through the whole example to give you a feel for what you'd have to do. It is not pretty, but then you should expect that.

Basically, after modifying the bytecode you need to create a new types.CodeType object. You need a new one because, for good reason, the attributes of an existing code object are read-only. For example, the interpreter may have cached some of these values.
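Python 3.8 and later also give code objects a replace() method. It builds the new code object for you, so you don't have to pass every field positionally to the CodeType constructor, whose signature keeps changing between versions. Here is a minimal sketch, assuming Python 3.8+, that shows the read-only attributes and uses replace() to swap one constant. String constants are used only for illustration, because newer CPython versions may encode small integers directly in the instruction instead of in co_consts.

```python
def fact():
    a = "eight"
    a = "zero"
    return a

code = fact.__code__

# Code object attributes are read-only: assigning to one raises AttributeError.
try:
    code.co_consts = ()
    assignment_failed = False
except AttributeError as exc:
    assignment_failed = True
    print("cannot assign:", exc)

# replace() (Python 3.8+) builds a new code object, overriding only the named fields.
new_consts = tuple("ten" if c == "zero" else c for c in code.co_consts)
new_code = code.replace(co_consts=new_consts)

print(new_code is code)             # False: a brand-new object
print("zero" in code.co_consts)     # True: the original is untouched
print("ten" in new_code.co_consts)  # True
```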

Once you have created the code object, you can run it with exec or eval, or wrap it in a function object (types.FunctionType).
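To get the function the question asks for, you can pull the nested code object for fact() out of the module-level code returned by compile() and wrap it with types.FunctionType(code, globals, name). The sketch below assumes Python 3 and uses an empty globals dict, because this particular body references no global names.

```python
import types

source = '''
def fact():
    a = 8
    a = 0
    return a
'''

module_code = compile(source, "<string>", "exec")

# The body of fact() is a nested code object stored among the module's constants.
fn_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

# Wrap the code object in a callable function: (code, globals, name).
fact = types.FunctionType(fn_code, {}, "fact")
print(fact())  # 0
```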

Or you can write it to a bytecode file. Alas, the code format has changed between Python versions 1.3, 1.5, 2.0, 3.0, and 3.8. And, by the way, so have the optimizations and the bytecodes themselves. In fact, starting with Python 3.6 they are wordcodes rather than bytecodes: every instruction occupies two bytes.
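A quick way to see the wordcode format, assuming Python 3.6+: every instruction offset is even, and in CPython 3.11+ the inline cache entries that follow some instructions are also 2-byte code units. The magic number written at the start of a .pyc identifies which bytecode format the running interpreter uses.

```python
import dis
import importlib.util

def fact():
    a = 8
    a = 0
    return a

# Each instruction starts on a 2-byte boundary in the wordcode format.
offsets = [ins.offset for ins in dis.get_instructions(fact)]
print(all(off % 2 == 0 for off in offsets))  # True
print(len(fact.__code__.co_code) % 2 == 0)   # True

# The .pyc magic number for this interpreter's bytecode format (ends in b"\r\n").
print(importlib.util.MAGIC_NUMBER)
```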

So here is what you'd have to do for your example:

a = """
def fact():
    a = 8
    a = 0
    return a
"""
c = compile(a, '<string>', 'exec')
fn_code = c.co_consts[0] # Pick up the function code from the main code
from dis import dis
dis(fn_code)
print("=" * 30)

x = fn_code.co_code[6:16] # modify bytecode

import types
opt_fn_code = types.CodeType(fn_code.co_argcount,
                             # fn_code.co_posonlyargcount, Add this in Python 3.8+
                             # fn_code.co_kwonlyargcount,  Add this in Python 3
                             fn_code.co_nlocals,
                             fn_code.co_stacksize,
                             fn_code.co_flags,
                             x,  # fn_code.co_code: this is what you changed
                             fn_code.co_consts,
                             fn_code.co_names,
                             fn_code.co_varnames,
                             fn_code.co_filename,
                             fn_code.co_name,
                             fn_code.co_firstlineno,
                             fn_code.co_lnotab,   # In general, you should adjust this
                             fn_code.co_freevars,
                             fn_code.co_cellvars)
dis(opt_fn_code)
print("=" * 30)
print("Result is", eval(opt_fn_code))

# Now let's change the value of what's returned
co_consts = list(opt_fn_code.co_consts)
co_consts[-1] = 10

opt_fn_code = types.CodeType(fn_code.co_argcount,
                             # fn_code.co_posonlyargcount, Add this in Python 3.8+
                             # fn_code.co_kwonlyargcount,  Add this in Python 3
                             fn_code.co_nlocals,
                             fn_code.co_stacksize,
                             fn_code.co_flags,
                             x,  # fn_code.co_code: this is what you changed
                             tuple(co_consts), # this is now changed too
                             fn_code.co_names,
                             fn_code.co_varnames,
                             fn_code.co_filename,
                             fn_code.co_name,
                             fn_code.co_firstlineno,
                             fn_code.co_lnotab,   # In general, you should adjust this
                             fn_code.co_freevars,
                             fn_code.co_cellvars)

dis(opt_fn_code)
print("=" * 30)
print("Result is now", eval(opt_fn_code))

When I ran this, here is what I got:

  3           0 LOAD_CONST               1 (8)
              3 STORE_FAST               0 (a)

  4           6 LOAD_CONST               2 (0)
              9 STORE_FAST               0 (a)

  5          12 LOAD_FAST                0 (a)
             15 RETURN_VALUE
==============================
  3           0 LOAD_CONST               2 (0)
              3 STORE_FAST               0 (a)

  4           6 LOAD_FAST                0 (a)
              9 RETURN_VALUE
==============================
('Result is', 0)
  3           0 LOAD_CONST               2 (10)
              3 STORE_FAST               0 (a)

  4           6 LOAD_FAST                0 (a)
              9 RETURN_VALUE
==============================
('Result is now', 10)

Notice that the line numbers haven't changed even though I removed a couple of lines of code. That is because I didn't update fn_code.co_lnotab.
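Line information lives in its own table, co_lnotab (replaced by co_linetable in Python 3.10+), and is stored relative to co_firstlineno, separately from co_code. The sketch below assumes Python 3.8+ and uses a hypothetical helper, line_numbers(). It changes only co_firstlineno and shows that every reported line shifts while the bytecode itself stays identical.

```python
import dis

def fact():
    a = 8
    a = 0
    return a

def line_numbers(co):
    # Hypothetical helper: the line numbers dis reports, taken from the line table.
    return [line for _, line in dis.findlinestarts(co) if line is not None]

code = fact.__code__
shifted = code.replace(co_firstlineno=code.co_firstlineno + 100)  # Python 3.8+

print(line_numbers(code))
print(line_numbers(shifted))            # every line moved by 100
print(shifted.co_code == code.co_code)  # True: bytecode untouched
```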

If you now want to write a Python bytecode file from this, here is what you'd do:

co_consts = list(c.co_consts)
co_consts[0] = opt_fn_code
c1 = types.CodeType(c.co_argcount,
                    # c.co_posonlyargcount, Add this in Python 3.8+
                    # c.co_kwonlyargcount,  Add this in Python3
                    c.co_nlocals,
                    c.co_stacksize,
                    c.co_flags,
                    c.co_code,
                    tuple(co_consts),
                    c.co_names,
                    c.co_varnames,
                    c.co_filename,
                    c.co_name,
                    c.co_firstlineno,
                    c.co_lnotab,   # In general, you should adjust this
                    c.co_freevars,
                    c.co_cellvars)

from struct import pack
import marshal
import time

with open('/tmp/testing.pyc', 'wb') as fp:  # .pyc files are binary
    fp.write(pack('<Hcc', 62211, '\r', '\n'))  # Python 2.7 magic number; use importlib.util.MAGIC_NUMBER in Python 3
    # In Python 3.7+ the 4-byte PEP 552 flags field goes here, before the timestamp
    fp.write(pack('<I', int(time.time())))  # modification time, little-endian
    # In Python 3.3+ you also need to write the source size mod 2**32 here
    fp.write(marshal.dumps(c1))
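For Python 3.7+, the header is four little-endian 4-byte fields: the magic number, the PEP 552 flags (0 means timestamp-based), the source mtime, and the source size mod 2**32. The sketch below is a hypothetical helper, write_pyc(), written under that assumption. It is not xasm's write_pycfile(), whose signature is not shown here. It writes the module code and then reads it back by skipping the 16-byte header.

```python
import importlib.util
import marshal
import os
import struct
import tempfile
import time

def write_pyc(path, code, source_size=0):
    """Hypothetical helper: write `code` as a timestamp-based .pyc (Python 3.7+ layout)."""
    with open(path, "wb") as fp:
        fp.write(importlib.util.MAGIC_NUMBER)                       # 4-byte magic
        fp.write(struct.pack("<I", 0))                              # PEP 552 flags
        fp.write(struct.pack("<I", int(time.time()) & 0xFFFFFFFF))  # mtime
        fp.write(struct.pack("<I", source_size & 0xFFFFFFFF))       # source size
        fp.write(marshal.dumps(code))

source = "def fact():\n    a = 0\n    return a\n"
code = compile(source, "<string>", "exec")
path = os.path.join(tempfile.mkdtemp(), "testing.pyc")
write_pyc(path, code, len(source))

# Read it back: skip the 16-byte header, unmarshal, and run the module code.
with open(path, "rb") as fp:
    data = fp.read()
namespace = {}
exec(marshal.loads(data[16:]), namespace)
print(namespace["fact"]())  # 0
```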

To simplify writing the boilerplate code above, I've added a routine called write_pycfile() to xasm.

Now to check the results:

$ uncompyle6 /tmp/testing.pyc
# uncompyle6 version 2.9.2
# Python bytecode 2.7 (62211)
# Disassembled from: Python 2.7.12 (default, Jul 26 2016, 22:53:31)
# [GCC 5.4.0 20160609]
# Embedded file name: <string>
# Compiled at: 2016-10-18 05:52:13


def fact():
    a = 0
# okay decompiling /tmp/testing.pyc
$ pydisasm /tmp/testing.pyc
# pydisasm version 3.1.0
# Python bytecode 2.7 (62211) disassembled from Python 2.7
# Timestamp in code: 2016-10-18 05:52:13
# Method Name:       <module>
# Filename:          <string>
# Argument count:    0
# Number of locals:  0
# Stack size:        1
# Flags:             0x00000040 (NOFREE)
# Constants:
#    0: <code object fact at 0x7f815843e4b0, file "<string>", line 2>
#    1: None
# Names:
#    0: fact
  2           0 LOAD_CONST               0 (<code object fact at 0x7f815843e4b0, file "<string>", line 2>)
              3 MAKE_FUNCTION            0
              6 STORE_NAME               0 (fact)
              9 LOAD_CONST               1 (None)
             12 RETURN_VALUE


# Method Name:       fact
# Filename:          <string>
# Argument count:    0
# Number of locals:  1
# Stack size:        1
# Flags:             0x00000043 (NOFREE | NEWLOCALS | OPTIMIZED)
# Constants:
#    0: None
#    1: 8
#    2: 10
# Local variables:
#    0: a
  3           0 LOAD_CONST               2 (10)
              3 STORE_FAST               0 (a)

  4           6 LOAD_CONST               0 (None)
              9 RETURN_VALUE
$
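If uncompyle6 or pydisasm isn't installed, you can do a rough version of the same check with the standard library, assuming Python 3.7+. Decode the 16-byte header the way pydisasm reports it (magic, flags, timestamp, size), then unmarshal and disassemble the code. Here the .pyc is produced by py_compile in timestamp mode, so the flags field is 0. The file names are illustrative.

```python
import dis
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile
import time

src = os.path.join(tempfile.mkdtemp(), "testing.py")
with open(src, "w") as fp:
    fp.write("def fact():\n    a = 10\n    return a\n")

pyc = py_compile.compile(src, cfile=src + "c", doraise=True,
                         invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)

# Header: magic, PEP 552 flags, source mtime, source size (all little-endian).
with open(pyc, "rb") as fp:
    magic, flags, mtime, size = struct.unpack("<4sIII", fp.read(16))
    code = marshal.loads(fp.read())

print("magic matches this interpreter:", magic == importlib.util.MAGIC_NUMBER)
print("flags:", flags)
print("timestamp in code:", time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)))
print("source size:", size)
dis.dis(code)  # disassembles <module> and the nested fact()

namespace = {}
exec(code, namespace)
```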

An alternative approach is to optimize at the abstract syntax tree (AST) level. The compile() function accepts an AST as well as source text, and the resulting code object can be run with exec or eval; you can also dump the AST with ast.dump(). You could also write the AST back out as Python source using the Python module astor.
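Here is a sketch of the same two edits done on the AST instead of the bytecode: drop "a = 8" and turn the constant 0 into 10. It assumes Python 3.9+ for ast.unparse(), which covers what astor does on older versions. The transformer class Patch is purely illustrative. Because compile() regenerates the bytecode and line table, nothing needs manual fixing.

```python
import ast

source = '''
def fact():
    a = 8
    a = 0
    return a
'''

class Patch(ast.NodeTransformer):
    """Illustrative transformer: drop the first statement of fact() and turn 0 into 10."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)       # rewrite constants inside the body first
        if node.name == "fact":
            node.body = node.body[1:]  # remove "a = 8"
        return node

    def visit_Constant(self, node):
        if type(node.value) is int and node.value == 0:
            return ast.copy_location(ast.Constant(value=10), node)
        return node

tree = ast.fix_missing_locations(Patch().visit(ast.parse(source)))

namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)  # compile() accepts an AST directly
print(namespace["fact"]())  # 10
print(ast.unparse(tree))    # Python 3.9+: back to source text
```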

Note, however, that some kinds of optimization, like tail-recursion elimination, might leave the bytecode in a form that can't be transformed back into source code in a truly faithful way. See my PyCon Colombia 2018 lightning talk for a video I made that eliminates tail recursion in bytecode, to get an idea of what I'm talking about here.
