Preserve custom attributes when pickling subclass of numpy array

Question

I've created a subclass of numpy ndarray following the numpy documentation. In particular, I have added a custom attribute by modifying the code provided.

I'm manipulating instances of this class within a parallel loop, using Python multiprocessing. As I understand it, the scope is essentially 'copied' to the worker processes using pickle.

The problem I am now coming up against relates to the way that numpy arrays are pickled. I can't find any comprehensive documentation about this, but some discussions between the dill developers suggest that I should be focusing on the __reduce__ method, which is called when an object is pickled.

Can anyone shed any more light on this? The minimal working example is really just the numpy example code I linked to above, copied here for completeness:

import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)

Now here is the problem:

import pickle

obj = RealisticInfoArray([1, 2, 3], info='foo')
print(obj.info)  # 'foo'

pickle_str = pickle.dumps(obj)
new_obj = pickle.loads(pickle_str)
print(new_obj.info)  # raises AttributeError

Thanks.

Recommended answer

np.ndarray uses __reduce__ to pickle itself. We can look at what that method actually returns to get an idea of what's going on:

>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> obj.__reduce__()
(<built-in function _reconstruct>, (<class 'pick.RealisticInfoArray'>, (0,), 'b'), (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'))

So, we get a 3-tuple back. The docs for __reduce__ describe what each element is doing:

When a tuple is returned, it must be between two and five elements long. Optional elements can either be omitted, or None can be provided as their value. The contents of this tuple are pickled as normal and used to reconstruct the object at unpickling time. The semantics of each element are:

  • A callable object that will be called to create the initial version of the object. The next element of the tuple will provide arguments for this callable, and later elements provide additional state information that will subsequently be used to fully reconstruct the pickled data.

    In the unpickling environment this object must be either a class, a callable registered as a "safe constructor" (see below), or it must have an attribute __safe_for_unpickling__ with a true value. Otherwise, an UnpicklingError will be raised in the unpickling environment. Note that as usual, the callable itself is pickled by name.

  • A tuple of arguments for the callable object.

  • Optionally, the object's state, which will be passed to the object's __setstate__() method as described in section Pickling and unpickling normal class instances. If the object has no __setstate__() method, then, as above, the value must be a dictionary and it will be added to the object's __dict__.
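
To make the three elements concrete, here is a minimal sketch using a plain Python class instead of an ndarray. The class name Tagged and its attributes are hypothetical and exist only for illustration. The sketch is written for Python 3.

```python
import pickle


class Tagged:
    """Hypothetical class, used only to illustrate the 3-tuple form."""

    def __init__(self, values):
        self.values = list(values)
        self.tag = None

    def __reduce__(self):
        # (callable, arguments for the callable, state for __setstate__)
        return (Tagged, (self.values,), {"tag": self.tag})

    def __setstate__(self, state):
        # Runs after Tagged(*args) has rebuilt the object.
        self.tag = state["tag"]


original = Tagged([1, 2, 3])
original.tag = "foo"
restored = pickle.loads(pickle.dumps(original))
print(restored.values, restored.tag)
```

On unpickling, pickle first calls Tagged([1, 2, 3]) and then passes the third element to __setstate__, which is the same hook the ndarray state tuple goes through.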

So, _reconstruct is the function called to rebuild the object, (<class 'pick.RealisticInfoArray'>, (0,), 'b') are the arguments passed to that function, and (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00') gets passed to the class's __setstate__. This gives us an opportunity: we can override __reduce__ to append our custom attribute to the state tuple, and then override __setstate__ to set that attribute when we unpickle. We just need to make sure we preserve all the data the parent class needs, and call the parent's __setstate__, too:

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Get the parent's __reduce__ tuple
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # Create our own tuple to pass to __setstate__
        new_state = pickled_state[2] + (self.info,)
        # Return a tuple that replaces the parent's __setstate__ tuple with our own
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.info = state[-1]  # Set the info attribute
        # Call the parent's __setstate__ with the other tuple elements.
        super(RealisticInfoArray, self).__setstate__(state[0:-1])

Usage:

>>> obj = pick.RealisticInfoArray([1, 2, 3], info='foo')
>>> pickle_str = pickle.dumps(obj)
>>> pickle_str
"cnumpy.core.multiarray\n_reconstruct\np0\n(cpick\nRealisticInfoArray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\ntp6\ncnumpy\ndtype\np7\n(S'i8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\np13\nS'foo'\np14\ntp15\nb."
>>> new_obj = pickle.loads(pickle_str)
>>> new_obj.info
'foo'
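
If the subclass carries more than one custom attribute, the same idea can be extended by appending the instance's whole __dict__ to the state instead of a single value. This is a variation on the answer above, not part of it. The class name AttrArray and the units attribute are hypothetical, and the sketch assumes Python 3 and that ndarray.__reduce__ returns a 3-tuple whose last element is a tuple, as shown earlier.

```python
import pickle

import numpy as np


class AttrArray(np.ndarray):
    """Hypothetical subclass that carries arbitrary instance attributes."""

    def __new__(cls, input_array, **attrs):
        obj = np.asarray(input_array).view(cls)
        obj.__dict__.update(attrs)
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        # Copy attributes from the array this one was derived from;
        # a plain ndarray has no __dict__, hence the default.
        self.__dict__.update(getattr(obj, "__dict__", {}))

    def __reduce__(self):
        reconstruct, args, state = super().__reduce__()
        # Append the whole attribute dict as one extra state element.
        return (reconstruct, args, state + (dict(self.__dict__),))

    def __setstate__(self, state):
        # The last element is our dict; the rest belongs to ndarray.
        self.__dict__.update(state[-1])
        super().__setstate__(state[:-1])


original = AttrArray([1, 2, 3], info="foo", units="m")
restored = pickle.loads(pickle.dumps(original))
print(type(restored).__name__, restored.info, restored.units)
```

Because the dict travels as a single trailing element, attributes can be added later without touching __reduce__ or __setstate__. Every attribute value must itself be picklable.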
