numpy -- Transform non-contiguous data to contiguous data in place

Question

Consider the following code:

import numpy as np
a = np.zeros(50)
a[10:20:2] = 1
b = c = a[10:40:4]
print(b.flags)  # You'll see that b and c are neither C_CONTIGUOUS nor F_CONTIGUOUS

My question:

Is there a way (with only a reference to b) to make both b and c contiguous? It is completely fine if np.may_share_memory(b, a) returns False after this operation.

Two functions come close but don't quite work: np.ascontiguousarray and np.asfortranarray, because they return a new array instead of changing the existing one.
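To illustrate why these fall short, here is a minimal sketch (the name b_new is only for illustration): the contiguous result is a separate object, so the other name bound to the original view, c, is left untouched.

```python
import numpy as np

a = np.zeros(50)
a[10:20:2] = 1
b = c = a[10:40:4]                 # b and c are the same strided view of a

b_new = np.ascontiguousarray(b)    # allocates a fresh C-contiguous array

print(b_new.flags['C_CONTIGUOUS'])     # True
print(b_new is c)                      # False: c still names the old view
print(c.flags['C_CONTIGUOUS'])         # False
print(np.may_share_memory(c, a))       # True: c still points into a's buffer
print(np.may_share_memory(b_new, a))   # False
```

Rebinding b to b_new would fix b only; c keeps the strided view and, with it, the reference to a.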

My use case is that I have very large 3D fields stored in a subclass of numpy.ndarray. In order to save memory, I would like to chop those fields down to the portion of the domain that I am actually interested in processing:

a = a[ix1:ix2,iy1:iy2,iz1:iz2]

Slicing for the subclass is somewhat more restricted than slicing of ndarray objects, but this should work and it will "do the right thing" -- the various custom metadata attached to the subclass will be transformed/preserved as expected. Unfortunately, since this returns a view, numpy won't free the big array afterward, so I don't actually save any memory here.
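The effect can be observed with a weak reference. The following is a small sketch on a plain ndarray, assuming CPython's reference counting (names such as big and big_ref are illustrative only): the view's base attribute keeps the large array alive until the view itself is gone.

```python
import weakref
import numpy as np

big = np.zeros((40, 40, 40))
big_ref = weakref.ref(big)           # lets us check whether big still exists

view = big[5:10, 5:10, 5:10]         # basic slicing returns a view
print(view.base is big)              # True: the view references the big array

del big                              # drop our own name for the big array
print(big_ref() is not None)         # True: view.base still keeps it alive

small = view.copy()                  # independent array that owns its data
del view                             # the last reference to the big array goes away
print(big_ref() is None)             # True on CPython: the big buffer is released
```

Only once every view into the large array has been dropped can its memory be reclaimed; a single surviving view is enough to pin the whole buffer.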

To be completely clear, I'm looking to accomplish 2 things:

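  • Preserve the metadata on my class instance. Slicing will work, but I'm not sure about other forms of copying.
  • Set things up so that the original array is free to be garbage collected.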
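For illustration, here is a minimal sketch of how metadata can survive an explicit copy. The Field class and its units attribute are hypothetical stand-ins, not the actual subclass from the question; the sketch assumes the metadata is propagated through __array_finalize__, which NumPy calls for both slices and ndarray.copy().

```python
import numpy as np

class Field(np.ndarray):
    """Hypothetical ndarray subclass carrying a `units` attribute."""

    def __new__(cls, data, units=None):
        obj = np.asarray(data).view(cls)
        obj.units = units
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices and copies; obj is the source array.
        if obj is None:
            return
        self.units = getattr(obj, "units", None)

f = Field(np.zeros((20, 20, 20)), units="K")
sub = f[2:6, 2:6, 2:6].copy()        # slice, then copy into fresh memory

print(type(sub).__name__, sub.units)   # Field K
print(sub.flags['C_CONTIGUOUS'])       # True
print(np.may_share_memory(sub, f))     # False
```

A subclass that transforms its metadata during slicing in some other way would need to confirm that the same logic also runs on copy(); this sketch only covers the __array_finalize__ route.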

Recommended answer

You can do this in cython:

In [1]:
%load_ext Cython

In [2]:
%%cython
cimport numpy as np

np.import_array()

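def to_c_contiguous(np.ndarray a):
    """Repoint `a`, in place, at a C-contiguous copy of its own data."""
    cdef np.ndarray new
    cdef int dim, i
    new = a.copy()  # C-contiguous copy of a's data
    dim = np.PyArray_NDIM(new)
    # Overwrite a's strides with those of the contiguous copy.
    for i in range(dim):
        np.PyArray_STRIDES(a)[i] = np.PyArray_STRIDES(new)[i]
    a.data = new.data  # point a at the copy's buffer
    np.PyArray_UpdateFlags(a, np.NPY_C_CONTIGUOUS)
    # Make the copy the new base of a, replacing the original array.
    np.set_array_base(a, new)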

In [8]:
import sys
import numpy as np
a = np.random.rand(10, 10, 10)
b = c = a[::2, 1::3, 2::4]
d = a[::2, 1::3, 2::4]
print(sys.getrefcount(a))
to_c_contiguous(b)
print(sys.getrefcount(a))
print(np.all(b == d))

The output is:

4
3
True

to_c_contiguous(a) creates a C-contiguous copy of a and makes that copy the base of a.

After the call to to_c_contiguous(b), the refcount of a decreases by one, because b no longer uses a as its base. When the refcount of a reaches 0, a is freed.
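The reference-count bookkeeping can also be observed without Cython. The following sketch, which assumes CPython, shows why a pure-Python copy needs every name bound to the view to be rebound, whereas the in-place approach only needs one reference:

```python
import sys
import numpy as np

a = np.random.rand(10, 10, 10)
base_count = sys.getrefcount(a)

b = c = a[::2, 1::3, 2::4]       # one view object bound to two names
with_view = sys.getrefcount(a)   # the view's base holds one reference to a

b = b.copy()                     # rebinding b alone: c still holds the view
after_rebind = sys.getrefcount(a)

c = b                            # now no name refers to the view any more
after_all = sys.getrefcount(a)

print(with_view - base_count)      # 1
print(after_rebind - base_count)   # 1
print(after_all - base_count)      # 0
```

Because to_c_contiguous modifies the array object that b and c both name, the reference to a is dropped in a single call, without having to track down c.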
