Preserve custom attributes when pickling subclass of numpy array
Question
I've created a subclass of numpy ndarray following the numpy documentation. In particular, I have added a custom attribute by modifying the code provided.
I'm manipulating instances of this class within a parallel loop, using Python multiprocessing. As I understand it, the way that the scope is essentially 'copied' to the multiple worker processes is by using pickle.
The problem I am now coming up against relates to the way that numpy arrays are pickled. I can't find any comprehensive documentation about this, but some discussions between the dill developers suggest that I should be focusing on the __reduce__ method, which is being called upon pickling.
Can anyone shed any more light on this? The minimal working example is really just the numpy example code I linked to above, copied here for completeness:
import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)
Now here is the problem:
import pickle

obj = RealisticInfoArray([1, 2, 3], info='foo')
print(obj.info)  # 'foo'

pickle_str = pickle.dumps(obj)
new_obj = pickle.loads(pickle_str)
print(new_obj.info)  # raises AttributeError
Thanks.
Recommended answer
np.ndarray uses __reduce__ to pickle itself. We can take a look at what it actually returns when you call that function to get an idea of what's going on:
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> obj.__reduce__()
(<built-in function _reconstruct>, (<class 'pick.RealisticInfoArray'>, (0,), 'b'), (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'))
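To make the pieces of that tuple easier to see, here is a minimal sketch that unpacks the result for a plain np.ndarray. It assumes Python 3 and a reasonably recent numpy; the variable names (reconstruct, args, state and so on) are only illustrative labels and do not come from numpy itself.

```python
import numpy as np

arr = np.array([1, 2, 3])

# __reduce__ returns (callable, args_for_callable, state_for_setstate)
reconstruct, args, state = arr.__reduce__()

# The state tuple holds a version number, the shape, the dtype,
# a Fortran-order flag and the raw bytes of the array data.
version, shape, dtype, is_fortran, raw = state

print(reconstruct.__name__)  # name of the rebuilding function
print(args)                  # arguments used to create an empty array
print(shape, dtype, is_fortran, len(raw))
```

Note that nothing in this state tuple refers to attributes stored on the instance, which is why info goes missing after a round trip.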
So, we get a 3-tuple back. The docs for __reduce__ describe what each element is doing:
When a tuple is returned, it must be between two and five elements long. Optional elements can either be omitted, or None can be provided as their value. The contents of this tuple are pickled as normal and used to reconstruct the object at unpickling time. The semantics of each element are:
- A callable object that will be called to create the initial version of the object. The next element of the tuple will provide arguments for this callable, and later elements provide additional state information that will subsequently be used to fully reconstruct the pickled data.
  In the unpickling environment this object must be either a class, a callable registered as a "safe constructor" (see below), or it must have an attribute __safe_for_unpickling__ with a true value. Otherwise, an UnpicklingError will be raised in the unpickling environment. Note that as usual, the callable itself is pickled by name.
- A tuple of arguments for the callable object.
- Optionally, the object's state, which will be passed to the object's __setstate__() method as described in section Pickling and unpickling normal class instances. If the object has no __setstate__() method, then, as above, the value must be a dictionary and it will be added to the object's __dict__.
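The three elements described above can be illustrated without numpy at all. The following is a minimal sketch, assuming Python 3; the Point class and its label attribute are hypothetical and exist only for this example.

```python
import pickle


class Point(object):
    """Hypothetical class used only to illustrate the __reduce__ protocol."""

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.label = None

    def __reduce__(self):
        # 1. callable that creates the initial object
        # 2. tuple of arguments for that callable
        # 3. extra state, handed to __setstate__ after construction
        return (Point, (self.x, self.y), {"label": self.label})

    def __setstate__(self, state):
        # Receives the third element of the tuple returned by __reduce__
        self.label = state["label"]


p = Point(1, 2)
p.label = "corner"

q = pickle.loads(pickle.dumps(p))
print(q.x, q.y, q.label)
```

On unpickling, pickle calls Point(1, 2) first and then passes the dictionary to __setstate__, which restores label.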
So, _reconstruct is the function called to rebuild the object, (<class 'pick.RealisticInfoArray'>, (0,), 'b') are the arguments passed to that function, and (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00') gets passed to the class's __setstate__. This gives us an opportunity: we can override __reduce__ to append our own data to the tuple that is passed to __setstate__, and then additionally override __setstate__ to set our custom attribute when we unpickle. We just need to make sure we preserve all the data the parent class needs, and call the parent's __setstate__, too:
class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Get the parent's __reduce__ tuple
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # Create our own tuple to pass to __setstate__
        new_state = pickled_state[2] + (self.info,)
        # Return a tuple that replaces the parent's __setstate__ tuple with our own
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.info = state[-1]  # Set the info attribute
        # Call the parent's __setstate__ with the other tuple elements.
        super(RealisticInfoArray, self).__setstate__(state[0:-1])
Usage:
>>> obj = pick.RealisticInfoArray([1, 2, 3], info='foo')
>>> pickle_str = pickle.dumps(obj)
>>> pickle_str
"cnumpy.core.multiarray\n_reconstruct\np0\n(cpick\nRealisticInfoArray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\ntp6\ncnumpy\ndtype\np7\n(S'i8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\np13\nS'foo'\np14\ntp15\nb."
>>> new_obj = pickle.loads(pickle_str)
>>> new_obj.info
'foo'
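If a subclass carries more than one custom attribute, the same idea can be stretched by appending a copy of the instance __dict__ to the state instead of a single value. This is only a sketch under assumptions, not part of the answer above: it assumes Python 3, and the AttrArray class name and its keyword-argument constructor are hypothetical.

```python
import pickle

import numpy as np


class AttrArray(np.ndarray):
    """Hypothetical ndarray subclass that pickles all instance attributes."""

    def __new__(cls, input_array, **attrs):
        obj = np.asarray(input_array).view(cls)
        obj.__dict__.update(attrs)
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        # Plain ndarrays have no __dict__, hence the default of {}
        self.__dict__.update(getattr(obj, "__dict__", {}))

    def __reduce__(self):
        reconstruct, args, state = super().__reduce__()
        # Append a copy of the custom attributes to the parent's state tuple
        return (reconstruct, args, state + (dict(self.__dict__),))

    def __setstate__(self, state):
        # The last element is ours; the rest belongs to np.ndarray
        super().__setstate__(state[:-1])
        self.__dict__.update(state[-1])


original = AttrArray([1, 2, 3], info="foo", units="m")
restored = pickle.loads(pickle.dumps(original))
print(restored.info, restored.units)
```

As in the answer's version, the parent's state elements are passed through untouched, so the array data, shape and dtype are rebuilt by np.ndarray itself.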