腌制时保留numpy视图 [英] Preserving numpy view when pickling

查看:119
本文介绍了腌制时保留numpy视图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

默认情况下,即使对数组基础也进行了腌制,腌制一个numpy视图数组也会失去视图关系.我的情况是,我有一些腌制的复杂容器对象.在某些情况下,某些包含的数据是另一些视图.保存每个视图的独立数组不仅会浪费空间,而且,重新加载的数据也会丢失视图关系.

By default, pickling a numpy view array loses the view relationship, even if the array base is pickled too. My situation is that I have some complex container objects which are pickled. And in some cases, some contained data are views in some others. Saving a independent array of each view is not only a loss of space but also, the reloaded data have lost the view relationship.

一个简单的例子是(但在我的情况下,容器比字典还复杂):

A simple example would be (but in my case the container are more complex than a dictionary):

import numpy as np
import cPickle

tmp = np.zeros(2)
d1 = dict(a=tmp,b=tmp[:])    # d1 to be saved: b is a view on a

pickled = cPickle.dumps(d1)
d2 = cPickle.loads(pickled)  # d2 reloaded copy of d1 container

print 'd1 before:', d1
d1['b'][:] = 1
print 'd1 after: ', d1

print 'd2 before:', d2
d2['b'][:] = 1
print 'd2 after: ', d2

将打印:

d1 before: {'a': array([ 0.,  0.]), 'b': array([ 0.,  0.])}
d1 after:  {'a': array([ 1.,  1.]), 'b': array([ 1.,  1.])}
d2 before: {'a': array([ 0.,  0.]), 'b': array([ 0.,  0.])}
d2 after:  {'a': array([ 0.,  0.]), 'b': array([ 1.,  1.])} # not a view anymore

我的问题:

(1)有没有办法保存它? (2)(甚至更好)是否只有在腌制了基料的情况下才可以这样做

(1) Is there a way to preserve it? (2) (even better) is there a way to do it only if the base is pickled

对于(1),我认为可以通过更改视图数组的__setstate____reduce_ex_等来实现某种方式.但是我暂时对这些没有信心.对于(2),我一无所知.

For the (1) I think there may be some way by changing the __setstate__, __reduce_ex_, etc... of the view array. But I don't fill confident with these for now. For the (2) I have no idea.

推荐答案

这不是在NumPy中正确完成的,因为对基数组进行腌制并不总是很有意义,而且pickle不能公开检查是否另一个对象也作为其API的一部分被腌制.

This isn't done in NumPy proper, because it doesn't always make sense to pickle the base array, and pickle does not expose the ability to check if another object is also being pickled as part of its API.

但是这种检查可以在NumPy数组的自定义容器中完成.例如:

But this sort of check can be done in a custom container for NumPy arrays. For example:

import numpy as np
import pickle

def byte_offset(array, source):
    return array.__array_interface__['data'][0] - np.byte_bounds(source)[0]

class SharedPickleList(object):
    def __init__(self, arrays):
        self.arrays = list(arrays)

    def __getstate__(self):
        unique_ids = {id(array) for array in self.arrays}
        source_arrays = {}
        view_tuples = {}
        for array in self.arrays:
            if array.base is None or id(array.base) not in unique_ids:
                # only use views if the base is also being pickled
                source_arrays[id(array)] = array
            else:
                view_tuples[id(array)] = (array.shape,
                                          array.dtype,
                                          id(array.base),
                                          byte_offset(array, array.base),
                                          array.strides)
        order = [id(array) for array in self.arrays]
        return (source_arrays, view_tuples, order)

    def __setstate__(self, state):
        source_arrays, view_tuples, order = state
        view_arrays = {}
        for k, view_state in view_tuples.items():
            (shape, dtype, source_id, offset, strides) = view_state
            buffer = source_arrays[source_id].data
            array = np.ndarray(shape, dtype, buffer, offset, strides)
            view_arrays[k] = array
        self.arrays = [source_arrays[i]
                       if i in source_arrays
                       else view_arrays[i]
                       for i in order]

# unit tests
def check_roundtrip(arrays):
    unpickled_arrays = pickle.loads(pickle.dumps(
        SharedPickleList(arrays))).arrays
    assert all(a.shape == b.shape and (a == b).all()
               for a, b in zip(arrays, unpickled_arrays))

indexers = [0, None, slice(None), slice(2), slice(None, -1),
            slice(None, None, -1), slice(None, 6, 2)]

source0 = np.random.randint(100, size=10)
arrays0 = [np.asarray(source0[k1]) for k1 in indexers]
check_roundtrip([source0] + arrays0)

source1 = np.random.randint(100, size=(8, 10))
arrays1 = [np.asarray(source1[k1, k2]) for k1 in indexers for k2 in indexers]
check_roundtrip([source1] + arrays1)

这可以节省大量空间:

source = np.random.rand(1000)
arrays = [source] + [source[n:] for n in range(99)]
print(len(pickle.dumps(arrays, protocol=-1)))
# 766372
print(len(pickle.dumps(SharedPickleList(arrays), protocol=-1)))
# 11833

这篇关于腌制时保留numpy视图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆