Preserve custom attributes when pickling subclass of numpy array


Problem description

I've created a subclass of numpy ndarray following the numpy documentation. In particular, I have added a custom attribute by modifying the code provided.

I'm manipulating instances of this class within a parallel loop, using Python's multiprocessing module. As I understand it, the objects in scope are essentially 'copied' to the worker processes by pickling them.

The problem I'm now running into relates to the way numpy arrays are pickled. I can't find any comprehensive documentation about this, but some discussions among the dill developers suggest that I should focus on the __reduce__ method, which is called when an object is pickled.

Can anyone shed any more light on this? The minimal working example is really just the numpy example code I linked to above, copied here for completeness:

import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)

Now here is the problem:

import pickle

obj = RealisticInfoArray([1, 2, 3], info='foo')
print(obj.info)  # 'foo'

pickle_str = pickle.dumps(obj)
new_obj = pickle.loads(pickle_str)
print(new_obj.info)  # raises AttributeError

Thanks.

Solution

np.ndarray uses __reduce__ to pickle itself. We can look at what it actually returns when we call that method, to get an idea of what's going on:

>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> obj.__reduce__()
(<built-in function _reconstruct>, (<class 'pick.RealisticInfoArray'>, (0,), 'b'), (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'))

So, we get a 3-tuple back. The docs for __reduce__ describe what each element does (a minimal standalone sketch of the protocol follows the quoted list):

When a tuple is returned, it must be between two and five elements long. Optional elements can either be omitted, or None can be provided as their value. The contents of this tuple are pickled as normal and used to reconstruct the object at unpickling time. The semantics of each element are:

  • A callable object that will be called to create the initial version of the object. The next element of the tuple will provide arguments for this callable, and later elements provide additional state information that will subsequently be used to fully reconstruct the pickled data.

    In the unpickling environment this object must be either a class, a callable registered as a "safe constructor" (see below), or it must have an attribute __safe_for_unpickling__ with a true value. Otherwise, an UnpicklingError will be raised in the unpickling environment. Note that as usual, the callable itself is pickled by name.

  • A tuple of arguments for the callable object.

  • Optionally, the object’s state, which will be passed to the object’s __setstate__() method as described in section Pickling and unpickling normal class instances. If the object has no __setstate__() method, then, as above, the value must be a dictionary and it will be added to the object’s __dict__.
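
To make those three elements concrete before returning to ndarray, here is a minimal standalone sketch of the protocol on an ordinary class. The Box class, its attributes, and the use of a dict as the state object are purely illustrative:

import pickle

class Box(object):
    def __init__(self, payload):
        self.payload = payload
        self.label = 'unlabeled'

    def __reduce__(self):
        # Element 1: the callable used to recreate the object.
        # Element 2: the arguments passed to that callable.
        # Element 3: extra state, handed to __setstate__ after reconstruction.
        return (Box, (self.payload,), {'label': self.label})

    def __setstate__(self, state):
        # Receives the third element of the __reduce__ tuple.
        self.label = state['label']

b = Box([1, 2, 3])
b.label = 'important'
b2 = pickle.loads(pickle.dumps(b))
print(b2.label)  # 'important' -- restored via __setstate__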

So, _reconstruct is the function called to rebuild the object, (<class 'pick.RealisticInfoArray'>, (0,), 'b') are the arguments passed to that function, and (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00') gets passed to the class's __setstate__. This gives us an opportunity: we can override __reduce__ to provide our own state tuple for __setstate__, and additionally override __setstate__ to set our custom attribute when we unpickle. We just need to make sure we preserve all the data the parent class needs, and call the parent's __setstate__ too:

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Get the parent's __reduce__ tuple
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # Create our own tuple to pass to __setstate__
        new_state = pickled_state[2] + (self.info,)
        # Return a tuple that replaces the parent's __setstate__ tuple with our own
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.info = state[-1]  # Set the info attribute
        # Call the parent's __setstate__ with the other tuple elements.
        super(RealisticInfoArray, self).__setstate__(state[0:-1])

Usage:

>>> obj = pick.RealisticInfoArray([1, 2, 3], info='foo')
>>> pickle_str = pickle.dumps(obj)
>>> pickle_str
"cnumpy.core.multiarray\n_reconstruct\np0\n(cpick\nRealisticInfoArray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\ntp6\ncnumpy\ndtype\np7\n(S'i8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\np13\nS'foo'\np14\ntp15\nb."
>>> new_obj = pickle.loads(pickle_str)
>>> new_obj.info
'foo'
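
For completeness, here is a sketch of the original multiprocessing use case, showing that the attribute now survives the pickling round trip to a worker process. The get_info helper, the pool size, and the data are illustrative assumptions; only the fixed RealisticInfoArray class above is taken from the answer, and it is assumed to be defined at module level (or imported) so pickle can find it by name in the workers:

import numpy as np
from multiprocessing import Pool

def get_info(arr):
    # Runs in a worker process; arr arrived there via pickle/unpickle.
    return arr.info

if __name__ == '__main__':
    arrays = [RealisticInfoArray(np.arange(3), info='item-%d' % i)
              for i in range(4)]
    pool = Pool(2)
    # Each array is pickled on its way to a worker, so info is preserved.
    print(pool.map(get_info, arrays))  # ['item-0', 'item-1', 'item-2', 'item-3']
    pool.close()
    pool.join()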
