什么时候可以腌制Python对象 [英] When can a Python object be pickled

查看:123
本文介绍了什么时候可以腌制Python对象的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用multiprocessing模块在Python中进行大量并行处理.我知道某些对象可以是泡菜(因此在multi-p中作为参数传递)而其他对象则不能.例如

I'm doing a fair amount of parallel processing in Python using the multiprocessing module. I know certain objects CAN be pickle (thus passed as arguments in multi-p) and others can't. E.g.

class abc():
    pass

a=abc()
pickle.dumps(a)
'ccopy_reg\n_reconstructor\np1\n(c__main__\nabc\np2\nc__builtin__\nobject\np3\nNtRp4\n.'

但是我的代码中有一些较大的类(大约有十二种方法),并且会发生这种情况:

But I have some larger classes in my code (a dozen methods, or so), and this happens:

a=myBigClass()
pickle.dumps(a)
Traceback (innermost last):
 File "<stdin>", line 1, in <module>
 File "/usr/apps/Python279/python-2.7.9-rhel5-x86_64/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects

它不是文件对象,但是在其他时候,我还会收到其他消息,它们基本上说:我不能腌制它".

It's not a file object, but at other times, I'll get other messages that say basically: "I can't pickle this".

那是什么规则?字节数?层次深度?月相?

So what's the rule? Number of bytes? Depth of hierarchy? Phase of the moon?

推荐答案

我是dill的作者.在dill中有一个相当全面的清单,列出了哪些酱菜和哪些不包括.它可以在Python 2.5–3.4的每个版本上运行,并可以通过更改一个标志来调整使用dill的酱菜或使用pickle的酱菜.请参见此处

I'm the dill author. There's a fairly comprehensive list of what pickles and what doesn't as part of dill. It can be run per version of Python 2.5–3.4, and adjusted for what pickles with dill or what pickles with pickle by changing one flag. See here and here.

什么是泡菜规则的根源(不在我头上):

The root of the rules for what pickles is (off the top of my head):

  1. 您能否通过引用捕获对象的状态(即 __main__中定义的函数与导入的函数)? [然后,是]
  2. 对于给定的对象类型,是否存在通用的__getstate____setstate__规则? [然后,是]
  3. 它是否依赖于Frame对象(即依赖于GIL和全局执行堆栈)?现在,迭代器是一个例外,它通过在重定位时重播"迭代器来实现. [然后,没有]
  4. 对象实例是否指向错误的类路径(即由于在闭包中定义,在C绑定中或其他__init__路径操作中)? [然后,没有]
  5. Python认为这样做危险吗? [然后,没有]
  1. Can you capture the state of the object by reference (i.e. a function defined in __main__ versus an imported function)? [Then, yes]
  2. Does a generic __getstate__ and __setstate__ rule exist for the given object type? [Then, yes]
  3. Does it depend on a Frame object (i.e. rely on the GIL and global execution stack)? Iterators are now an exception to this, by "replaying" the iterator on unpickling. [Then, no]
  4. Does the object instance point to the wrong class path (i.e. due to being defined in a closure, in C-bindings, or other __init__ path manipulations)? [Then, no]
  5. Is it considered dangerous by Python to allow this? [Then, no]

因此,(5)现在不像以前那么流行了,但是在pickle语言中仍然具有一些持久的影响. dill在大多数情况下会删除(1),(2)和(5)–但仍然受(3)和(4)的影响.

So, (5) is less prevalent now than it used to be, but still has some lasting effects in the language for pickle. dill, for the most part, removes (1), (2), and (5) – but is still fairly effected by (3) and (4).

我可能会忘记别的东西,但我认为总体而言,这些是基本规则.

I might be forgetting something else, but I think in general those are the underlying rules.

某些模块,例如multiprocessing,会注册一些对其功能很重要的对象. dill使用该语言注册大多数对象.

Certain modules like multiprocessing register some objects that are important for their functioning. dill registers the majority of objects in the language.

multiprocessingdill分支是必需的,因为multiprocessing使用cPickle,而dill只能扩展纯Python酸洗注册表.如果有足够的耐心,可以遍历dill中的所有相关copy_reg函数,并将它们应用于cPickle模块,您将获得更多具有酱菜功能的multiprocessing.我找到了一种简单的方法(阅读:一种衬里),用于pickle而不是cPickle.

The dill fork of multiprocessing is required because multiprocessing uses cPickle, and dill can only augment the pure-Python pickling registry. You could, if you have the patience, go through all the relevant copy_reg functions in dill, and apply them to the cPickle module and you'd get a much more pickle-capable multiprocessing. I've found a simple (read: one liner) way to do this for pickle, but not cPickle.

这篇关于什么时候可以腌制Python对象的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆