Inspecting a pickle dump for dependencies


Question

Suppose I have written the following code:

import pickle
def foo():
    return 'foo'

def bar():
    return 'bar' + foo()

pickle.dump(bar, open('bar.bin', 'wb'))

At this point I have a binary dump (which, of course, does not include the foo dependency from the global scope). Now, if I run the following line

temp = pickle.load(open('bar.bin', 'rb'))

I get the following error, which makes perfect sense after having read this.

Error: AttributeError: Can't get attribute 'bar' on <module '__main__' (built-in)>

This is of course a minimal example, but I'm curious whether there is a generalized way to inspect the dependencies needed to properly unpickle a pickle dump. A trivial solution would be to handle attribute errors (as in the case above), but can I do it programmatically?

Recommended answer

You can use the pickletools module to produce a stream of disassembled operations, which lets you collect information about which modules and names the pickled data would need to access. I'd use the pickletools.genops() function here.
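As a rough illustration of what pickletools.genops() yields, the sketch below disassembles the pickle of a standard-library class. The choice of collections.OrderedDict and of protocol 4 is only an assumption made for the example; any object pickled by reference would do.

```python
import pickle
import pickletools
from collections import OrderedDict

# Classes and functions are pickled by reference: only their module and
# qualified name end up in the stream, not their code.
data = pickle.dumps(OrderedDict, protocol=4)

# genops() yields (opcode_info, argument, byte_position) triples.
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(data)]

for name, arg in ops:
    print(name, repr(arg))
```

With protocol 4 the module and name show up as string arguments pushed just before a STACK_GLOBAL opcode, which is why that opcode needs the stack to be tracked.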

Now, the module is aimed at core developers working on the pickle library, so documentation for the opcodes it emits is found only in the module source code, and many opcodes are tied to specific versions of the protocol. The GLOBAL and STACK_GLOBAL opcodes are the interesting ones here. For GLOBAL, the name loaded is the opcode argument; for STACK_GLOBAL, you need to look at the stack. The stack is a little more complex than plain push and pop operations, however: variable-length items (lists, dicts, etc.) use a marker object so the unpickler can detect when such an object is complete, and there is a memo that avoids having to repeat items in the stream.
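The difference between the two opcodes can be seen by pickling the same object under two protocols. This is a small sketch, assuming protocol 3 (which emits GLOBAL) and protocol 4 (which emits STACK_GLOBAL); global_ops is a hypothetical helper name used only here.

```python
import pickle
import pickletools
from collections import OrderedDict

def global_ops(data):
    """Return (opcode name, argument) for the name-loading opcodes only."""
    return [
        (op.name, arg)
        for op, arg, _pos in pickletools.genops(data)
        if op.name in {"GLOBAL", "STACK_GLOBAL"}
    ]

# Protocol 3: the module and name are carried as the opcode argument.
proto3 = global_ops(pickle.dumps(OrderedDict, protocol=3))
# Protocol 4: the opcode has no argument; both strings come from the stack.
proto4 = global_ops(pickle.dumps(OrderedDict, protocol=4))

print(proto3)
print(proto4)
```

For GLOBAL, pickletools reports the argument as a single string with the module and name separated by a space, so it has to be split before use.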

The module code details how the stack, the memo and the various opcodes work, but you can generally ignore most of this if all you need to know is which names are referenced.
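To make the mark and memo mechanics a little more concrete, here is a small sketch that pickles a list holding the same string object twice. The sample data and protocol 4 are assumptions chosen for the illustration.

```python
import pickle
import pickletools

shared = "x" * 10
# The same object appears twice, so the second occurrence can be
# fetched from the memo instead of being written out again.
data = pickle.dumps([shared, shared], protocol=4)

names = [op.name for op, arg, pos in pickletools.genops(data)]
print(names)
```

The MARK opcode records where the list items start, APPENDS consumes everything back to that mark, MEMOIZE stores an object in the memo, and BINGET retrieves it again by index.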

So for your stream, and assuming that the stream is always well-formed, the following simplification of the pickletools.dis() function lets you extract all names referenced by GLOBAL and STACK_GLOBAL opcodes:

import pickletools

def get_names(stream):
    """Generates (module, qualname) tuples from a pickle stream"""

    # the memo maps memo index -> object, so PUT-style opcodes can use any index
    stack, markstack, memo = [], [], {}
    mo = pickletools.markobject

    for op, arg, pos in pickletools.genops(stream):
        # simulate the pickle stack and marking scheme, insofar
        # necessary to allow us to retrieve the names used by STACK_GLOBAL

        before, after = op.stack_before, op.stack_after
        numtopop = len(before)

        if op.name == "GLOBAL":
            # the argument is "module qualname", separated by a single space
            yield tuple(arg.split(" ", 1))
        elif op.name == "STACK_GLOBAL":
            yield (stack[-2], stack[-1])

        elif mo in before or (op.name == "POP" and stack and stack[-1] is mo):
            # drop everything up to and including the most recent mark
            markstack.pop()
            while stack[-1] is not mo:
                stack.pop()
            stack.pop()
            try:
                numtopop = before.index(mo)
            except ValueError:
                numtopop = 0
        elif op.name in {"PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"}:
            if op.name == "MEMOIZE":
                # MEMOIZE stores at the next free memo index
                memo[len(memo)] = stack[-1]
            else:
                memo[arg] = stack[-1]
            numtopop, after = 0, []  # memoize and put do not pop the stack
        elif op.name in {"GET", "BINGET", "LONG_BINGET"}:
            arg = memo[arg]

        if numtopop:
            del stack[-numtopop:]
        if mo in after:
            markstack.append(pos)

        if len(after) == 1 and op.arg is not None:
            stack.append(arg)
        else:
            stack.extend(after)

And a short demo for your example input:

>>> pickled_bar = pickle.dumps(bar)
>>> for mod, qualname in get_names(pickled_bar):
...     print(f"module: {mod}, name: {qualname}")
...
module: __main__, name: bar

Or a slightly more involved example, with an inspect.Signature() instance for the same function:

>>> import inspect
>>> pickled_sig_set = pickle.dumps({inspect.signature(bar)})
>>> for mod, qualname in get_names(pickled_sig_set):
...     print(f"module: {mod}, name: {qualname}")
...
module: inspect, name: Signature
module: inspect, name: _empty

The latter makes use of the memoization to re-use the inspect name for the inspect.Signature.empty reference, as well as a marker to track where the set elements started.
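Once you have (module, qualname) pairs like the ones printed above, one possible next step is to check which of them cannot be resolved in the current process. The sketch below is not part of the approach described so far; missing_names is a hypothetical helper, and it assumes that importing the referenced modules is acceptable (imports can have side effects).

```python
import importlib

def missing_names(names):
    """Return the (module, qualname) pairs that cannot be resolved here."""
    missing = []
    for module, qualname in names:
        try:
            obj = importlib.import_module(module)
            # A qualified name may be dotted, e.g. "Outer.Inner".
            for part in qualname.split("."):
                obj = getattr(obj, part)
        except (ImportError, AttributeError):
            missing.append((module, qualname))
    return missing

# Hand-written pairs stand in for the output of a name extractor.
pairs = [
    ("collections", "OrderedDict"),
    ("collections", "NoSuchThing"),
    ("no_such_module_xyz", "f"),
]
unresolved = missing_names(pairs)
print(unresolved)
```

An empty result suggests that the names are importable, but it does not guarantee that unpickling will succeed, since constructors and state-restoring methods can still fail.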
