How should I understand the output of dis.dis?

Problem description

I would like to understand how to use dis (the disassembler for Python bytecode). Specifically, how should one interpret the output of dis.dis (or dis.disassemble)?

Here is a very specific example (in Python 2.7.3):

dis.dis("heapq.nsmallest(d,3)")

      0 BUILD_SET             24933
      3 JUMP_IF_TRUE_OR_POP   11889
      6 JUMP_FORWARD          28019 (to 28028)
      9 STORE_GLOBAL          27756 (27756)
     12 LOAD_NAME             29811 (29811)
     15 STORE_SLICE+0  
     16 LOAD_CONST            13100 (13100)
     19 STORE_SLICE+1

I see that JUMP_IF_TRUE_OR_POP etc. are bytecode instructions (although interestingly, BUILD_SET does not appear in this list, though I expect it works as BUILD_TUPLE). I think the numbers on the right-hand-side are memory allocations, and the numbers on the left are goto numbers... I notice they almost increment by 3 each time (but not quite).

If I wrap dis.dis("heapq.nsmallest(d,3)") inside a function:

def f_heapq_nsmallest(d,n):
    return heapq.nsmallest(d,n)

dis.dis("f_heapq(d,3)")

      0 BUILD_TUPLE            26719
      3 LOAD_NAME              28769 (28769)
      6 JUMP_ABSOLUTE          25640
      9 <44>                                      # what is <44> ?  
     10 DELETE_SLICE+1 
     11 STORE_SLICE+1 

Solution

You are trying to disassemble a string containing source code, but that's not supported by dis.dis in Python 2. With a string argument, it treats the string as if it contained byte code (see the function disassemble_string in dis.py). So you are seeing nonsensical output based on misinterpreting source code as byte code.
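To make the misreading concrete, here is a minimal sketch that treats the first three characters of each source string as one opcode byte followed by a 2-byte little-endian oparg, which is the Python 2.7 layout described later in this answer. The helper name `as_misread_bytecode` is made up for illustration. The numbers it produces match the first line of each listing in the question, and the mysterious `<44>` turns out to be the character code of the comma.

```python
import struct

def as_misread_bytecode(source):
    """Read the first 3 characters the way a Python 2.7 disassembler would:
    one opcode byte, then a 2-byte little-endian oparg."""
    raw = source.encode("ascii")
    opcode = raw[0]
    (oparg,) = struct.unpack_from("<H", raw, 1)
    return opcode, oparg

# 'h' is byte 104 (shown as BUILD_SET in the question), 'ea' is 24933
print(as_misread_bytecode("heapq.nsmallest(d,3)"))  # (104, 24933)
# 'f' is byte 102 (shown as BUILD_TUPLE in the question), '_h' is 26719
print(as_misread_bytecode("f_heapq(d,3)"))          # (102, 26719)
# The comma has character code 44, which has no opcode name, hence <44>
print(ord(","))                                     # 44
```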

Things are different in Python 3, where dis.dis compiles a string argument before disassembling it:

Python 3.2.3 (default, Aug 13 2012, 22:28:10) 
>>> import dis
>>> dis.dis('heapq.nlargest(d,3)')
  1           0 LOAD_NAME                0 (heapq) 
              3 LOAD_ATTR                1 (nlargest) 
              6 LOAD_NAME                2 (d) 
              9 LOAD_CONST               0 (3) 
             12 CALL_FUNCTION            2 
             15 RETURN_VALUE         

In Python 2 you need to compile the code yourself before passing it to dis.dis:

Python 2.7.3 (default, Aug 13 2012, 18:25:43) 
>>> import dis
>>> dis.dis(compile('heapq.nlargest(d,3)', '<none>', 'eval'))
  1           0 LOAD_NAME                0 (heapq)
              3 LOAD_ATTR                1 (nlargest)
              6 LOAD_NAME                2 (d)
              9 LOAD_CONST               0 (3)
             12 CALL_FUNCTION            2
             15 RETURN_VALUE        
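Compiling explicitly also works in Python 3, so it is a reasonable habit whichever version you use. The following is a small sketch for Python 3.4 or later (it relies on the `file` argument of `dis.dis`) that captures the listing in a string instead of printing it directly. The exact opcodes and offsets vary between interpreter versions, so the listing will not necessarily match the ones shown above.

```python
import dis
import io

# Compile the expression to a code object; the names in it need not exist yet.
co = compile("heapq.nlargest(d,3)", "<none>", "eval")

# Send the disassembly to an in-memory buffer rather than stdout.
buf = io.StringIO()
dis.dis(co, file=buf)
listing = buf.getvalue()

print(listing)
```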

What do the numbers mean? The number 1 on the far left is the line number in the source code from which this byte code was compiled. The numbers in the column on the left are the offset of the instruction within the bytecode, and the numbers on the right are the opargs. Let's look at the actual byte code:

>>> co = compile('heapq.nlargest(d,3)', '<none>', 'eval')
>>> co.co_code.encode('hex')
'6500006a010065020064000083020053'

At offset 0 in the byte code we find 65, the opcode for LOAD_NAME, with the oparg 0000; then (at offset 3) 6a is the opcode LOAD_ATTR, with 0100 the oparg, and so on. Note that the opargs are in little-endian order, so that 0100 is the number 1. The undocumented opcode module contains tables opname giving you the name for each opcode, and opmap giving you the opcode for each name:

>>> import opcode
>>> opcode.opname[0x65]
'LOAD_NAME'
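The decoding described above can be sketched by hand. The code below walks the hex string from the Python 2.7 session, assuming the 2.7 layout in which opcodes numbered 90 or higher are followed by a 2-byte little-endian oparg and all others occupy a single byte. The name table is a hand-written subset covering only the opcodes in this example, and `decode_py27` is a hypothetical helper, not part of the standard library. Later Python versions use a different bytecode format, so this applies only to the bytes shown here.

```python
import struct

# Hand-written subset of the Python 2.7 opcode table (illustration only).
PY27_OPNAMES = {
    0x53: "RETURN_VALUE",
    0x64: "LOAD_CONST",
    0x65: "LOAD_NAME",
    0x6A: "LOAD_ATTR",
    0x83: "CALL_FUNCTION",
}
PY27_HAVE_ARGUMENT = 90  # opcodes >= 90 carry a 2-byte oparg in Python 2.7

def decode_py27(code_hex):
    """Return a list of (offset, name, oparg) tuples for a 2.7 bytecode string."""
    code = bytes.fromhex(code_hex)
    result = []
    offset = 0
    while offset < len(code):
        op = code[offset]
        # Unknown opcodes are shown as <N>, like the <44> in the question.
        name = PY27_OPNAMES.get(op, "<%d>" % op)
        if op >= PY27_HAVE_ARGUMENT:
            (arg,) = struct.unpack_from("<H", code, offset + 1)
            result.append((offset, name, arg))
            offset += 3  # opcode byte plus two oparg bytes
        else:
            result.append((offset, name, None))
            offset += 1  # opcode byte only
    return result

decoded = decode_py27("6500006a010065020064000083020053")
for offset, name, arg in decoded:
    print(offset, name, "" if arg is None else arg)
```

This also shows why the offsets in the left column usually step by 3 but sometimes by 1: instructions without an oparg take up only one byte.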

The meaning of the oparg depends on the opcode, and for the full story you need to read the implementation of the CPython virtual machine in ceval.c. For LOAD_NAME and LOAD_ATTR the oparg is an index into the co_names property of the code object:

>>> co.co_names
('heapq', 'nlargest', 'd')

For LOAD_CONST it is an index into the co_consts property of the code object:

>>> co.co_consts
(3,)
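As a rough Python 3 illustration of the same lookups, the sketch below uses `dis.get_instructions` (available since Python 3.4) and resolves each LOAD_NAME oparg through `co_names` by hand, comparing it with the `argval` that `dis` computes. It deliberately looks only at LOAD_NAME, because the oparg encoding of other opcodes such as LOAD_ATTR has changed in newer interpreters.

```python
import dis

co = compile("heapq.nlargest(d,3)", "<none>", "eval")

print(co.co_names)   # the names referenced by the code object
print(co.co_consts)  # the constants referenced by the code object

loaded_names = []
for instr in dis.get_instructions(co):
    if instr.opname == "LOAD_NAME":
        # instr.arg is the raw oparg; instr.argval is the name dis resolved.
        loaded_names.append((instr.arg, co.co_names[instr.arg], instr.argval))

for arg, by_hand, by_dis in loaded_names:
    print(arg, by_hand, by_dis)
```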

For CALL_FUNCTION, it is the number of arguments to pass to the function, encoded in 16 bits with the number of ordinary arguments in the low byte, and the number of keyword arguments in the high byte.
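The byte split described here is simple arithmetic. The following sketch assumes the encoding used by the Python versions shown in this answer (newer interpreters encode calls differently), and `split_call_function_oparg` is a hypothetical helper name.

```python
def split_call_function_oparg(oparg):
    """Split a CALL_FUNCTION oparg into (positional, keyword) argument counts,
    using the low-byte/high-byte encoding described above."""
    positional = oparg & 0xFF       # low byte
    keyword = (oparg >> 8) & 0xFF   # high byte
    return positional, keyword

# The example above has oparg 2: two positional arguments, no keywords.
print(split_call_function_oparg(2))       # (2, 0)
# An oparg of 0x0102 would mean two positional and one keyword argument.
print(split_call_function_oparg(0x0102))  # (2, 1)
```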
