When will numpy copy the array when using reshape()

Question

The numpy.reshape documentation says:

This will be a new view object if possible; otherwise, it will be a copy. Note there is no guarantee of the memory layout (C- or Fortran- contiguous) of the returned array.

My question is: when does numpy choose to return a new view, and when does it copy the whole array? Are there any general principles describing the behavior of reshape, or is it just unpredictable? Thanks.

Answer

The link that @mgillson found appears to address the question of 'how do I tell if it made a copy', but not 'how do I predict it' or understand why it made the copy. As for the test, I like to use A.__array_interface__.
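A minimal sketch of that kind of test, assuming NumPy's np.shares_memory (not used in the original answer). Comparing the 'data' pointers from __array_interface__ works for the reshape cases here, but a view can also start partway into the shared buffer, and then the pointers differ even though memory is shared. The helper name data_pointer is made up for illustration:

```python
import numpy as np


def data_pointer(a):
    """Start address of a's data, as reported by __array_interface__."""
    return a.__array_interface__['data'][0]


x = np.arange(12).reshape(3, 4)

# Reshaping a C-contiguous array gives a view with the same start address.
w = x.reshape(4, 3)
print(data_pointer(w) == data_pointer(x))   # True

# A slice is a view too, but it starts partway into the buffer,
# so comparing start addresses alone would miss the sharing.
s = x[1:]
print(data_pointer(s) == data_pointer(x))   # False
print(np.shares_memory(s, x))               # True
```

np.shares_memory can get expensive for complicated stride patterns; np.may_share_memory is a cheaper check that only compares memory bounds and may report false positives.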

Most likely this would be a problem if you tried to assign values to the reshaped array, expecting to also change the original. And I'd be hard pressed to find an SO case where that was the issue.

A copying reshape will be a bit slower than a noncopying one, but again I can't think of a case where that produced a slowdown of the whole code. A copy could also be an issue if you are working with arrays so big that the simplest operation produces a memory error.

After reshaping, the values in the data buffer need to be in a contiguous order, either 'C' or 'F'. For example:

In [403]: np.arange(12).reshape(3,4,order='C')
Out[403]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [404]: np.arange(12).reshape(3,4,order='F')
Out[404]: 
array([[ 0,  3,  6,  9],
       [ 1,  4,  7, 10],
       [ 2,  5,  8, 11]])
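A quick check of the two examples above, as a sketch: a contiguous 1-D arange is both C- and F-contiguous, so both reshapes should be views of the same buffer, differing only in their strides. The strides are written in terms of itemsize because the default integer size varies by platform (the sessions here show '<i4'):

```python
import numpy as np

a = np.arange(12)                  # 1-D and contiguous
c = a.reshape(3, 4, order='C')
f = a.reshape(3, 4, order='F')

# Both results read the same buffer; only the strides differ.
print(np.shares_memory(a, c), np.shares_memory(a, f))   # True True
print(c.strides == (4 * a.itemsize, a.itemsize))        # True: rows are 4 items apart
print(f.strides == (a.itemsize, 3 * a.itemsize))        # True: columns are 3 items apart
print(c.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])  # True True
```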

It will make a copy if the initial order is so 'messed up' that the values can't be presented in one of these contiguous orders. Reshape after transpose may do this (see my example below). So might games with stride_tricks.as_strided. Offhand, those are the only cases I can think of.

In [405]: x=np.arange(12).reshape(3,4,order='C')

In [406]: y=x.T

In [407]: x.__array_interface__
Out[407]: 
{'version': 3,
 'descr': [('', '<i4')],
 'strides': None,
 'typestr': '<i4',
 'shape': (3, 4),
 'data': (175066576, False)}

In [408]: y.__array_interface__
Out[408]: 
{'version': 3,
 'descr': [('', '<i4')],
 'strides': (4, 16),
 'typestr': '<i4',
 'shape': (4, 3),
 'data': (175066576, False)}

y, the transpose, has the same 'data' pointer. The transpose was performed without changing or copying the data; it just created a new object with new shape, strides, and flags.

In [409]: y.flags
Out[409]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  ...

In [410]: x.flags
Out[410]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  ...
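The transpose bookkeeping can be reproduced in a few lines. This sketch uses dtype=np.int32 to match the '<i4' shown in the session above, so the strides come out as 4 and 16 bytes:

```python
import numpy as np

x = np.arange(12, dtype=np.int32).reshape(3, 4)
y = x.T

# Same start address: the transpose did not touch the data.
print(y.__array_interface__['data'][0] == x.__array_interface__['data'][0])  # True

# Only the metadata changed: the strides are swapped...
print(x.strides, y.strides)                              # (16, 4) (4, 16)
# ...and so are the contiguity flags.
print(x.flags['C_CONTIGUOUS'], x.flags['F_CONTIGUOUS'])  # True False
print(y.flags['C_CONTIGUOUS'], y.flags['F_CONTIGUOUS'])  # False True
```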

y is in order 'F'. Now try reshaping it:

In [411]: y.shape
Out[411]: (4, 3)

In [412]: z=y.reshape(3,4)

In [413]: z.__array_interface__
Out[413]: 
{...
 'shape': (3, 4),
 'data': (176079064, False)}

In [414]: z
Out[414]: 
array([[ 0,  4,  8,  1],
       [ 5,  9,  2,  6],
       [10,  3,  7, 11]])

z is a copy; its data buffer pointer is different. Its buffer holds y's values in row-major reading order (0, 4, 8, 1, ...), not the 0, 1, 2, ... layout that x and y share.
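A short sketch confirming that z is an independent copy: it shares no memory with x, its buffer is y read row by row, and writing into it leaves x alone:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
y = x.T
z = y.reshape(3, 4)            # y is not C-contiguous, so this copies

print(np.shares_memory(z, x))  # False
# The copy stores y's values in row-major (C) reading order.
print(z.ravel())               # [ 0  4  8  1  5  9  2  6 10  3  7 11]

z[0, 0] = -1                   # writes only into the copy
print(x[0, 0])                 # 0
```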

But simply reshaping x does not produce a copy:

In [416]: w=x.reshape(4,3)

In [417]: w
Out[417]: 
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [418]: w.__array_interface__
Out[418]: 
{...
 'shape': (4, 3),
 'data': (175066576, False)}
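By contrast, w is a view, so a write through it lands in x. A minimal check:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
w = x.reshape(4, 3)            # x is C-contiguous, so this is a view

print(np.shares_memory(w, x))  # True
w[1, 0] = 99                   # item 3 of the shared buffer
print(x[0, 3])                 # 99
```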

Raveling y is the same as y.reshape(-1); it produces a copy:

In [425]: y.reshape(-1)
Out[425]: array([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])

In [426]: y.ravel().__array_interface__['data']
Out[426]: (175352024, False)

Assigning values to a raveled array like this may be the most likely case where a copy will produce a silent error. For example, x.ravel()[::2]=99 changes every other column of x, and therefore every other row of y. But y.ravel()[::2]=0 does nothing to y, because that ravel is a copy.
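The ravel pitfall can be reproduced directly. In this sketch, before is just a snapshot taken to show that the second assignment has no visible effect:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
y = x.T

x.ravel()[::2] = 99            # x.ravel() is a view, so this writes into x
print(x)
# [[99  1 99  3]
#  [99  5 99  7]
#  [99  9 99 11]]
print(y[0])                    # [99 99 99]  (row 0 of y is column 0 of x)

before = y.copy()              # snapshot, only for the comparison below
y.ravel()[::2] = 0             # y.ravel() is a copy, so the write is lost
print(np.array_equal(y, before))   # True
```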

So reshape after transpose is the most likely copy scenario. I'd be happy to explore other possibilities.

Edit: y.reshape(-1,order='F')[::2]=0 does change the values of y. With a compatible order, reshape does not produce a copy.
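A sketch of the edit's point: y is F-contiguous, so reading it in 'F' order needs no copy, and the assignment reaches y (and x, which shares the buffer):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
y = x.T                              # F-contiguous view of x

flat_f = y.reshape(-1, order='F')    # compatible order, so no copy
print(np.shares_memory(flat_f, y))   # True

flat_f[::2] = 0                      # reaches y, and therefore x
print(x)
# [[ 0  1  0  3]
#  [ 0  5  0  7]
#  [ 0  9  0 11]]
```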

One answer in @mgillson's link, https://stackoverflow.com/a/14271298/901925, points out that the A.shape=... syntax prevents copying. If it can't change the shape without copying, it will raise an error:

In [441]: y.shape=(3,4)
...
AttributeError: incompatible shape for a non-contiguous array

This is also mentioned in the reshape documentation:

If you want an error to be raised if the data is copied, you should assign the new shape to the shape attribute of the array.
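A sketch of the shape-attribute approach. It assigns to the shape of a separate .view() so that x and y themselves keep their shapes; the exact AttributeError message differs between NumPy versions, so it is only printed, not relied on. The flag name raised is just for illustration:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
y = x.T

# No copy needed: assigning to .shape just relabels the view.
v = x.view()
v.shape = (4, 3)
print(np.shares_memory(v, x))   # True

# Copy needed: the assignment raises instead of silently copying.
c = y.view()
try:
    c.shape = (3, 4)
    raised = False
except AttributeError as exc:
    raised = True
    print('no-copy reshape impossible:', exc)
print(raised, c.shape)          # True (4, 3)
```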


SO questions about reshape following as_strided:

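Reshaping a view of an n-dimensional array without using reshape

Numpy View Reshape Without Copy (2d Moving/Sliding Window, Strides, Masked Memory Structures)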
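The as_strided case mentioned earlier can be sketched as follows, assuming the overlapping sliding-window layout those questions deal with (the window length of 3 is arbitrary). Splitting an axis only regroups existing strides, so it can stay a view; merging the overlapping axes into one cannot:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(6)
step = a.itemsize

# Overlapping windows of length 3; each row starts one item later.
windows = as_strided(a, shape=(4, 3), strides=(step, step))
print(windows.tolist())   # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]

# Splitting the first axis only regroups the row starts: still a view.
split = windows.reshape(2, 2, 3)
print(np.shares_memory(split, a))   # True

# Flattening would need all 12 values at one fixed stride, but the
# buffer holds only 6 of them, so numpy has to copy.
flat = windows.reshape(-1)
print(np.shares_memory(flat, a))    # False
print(flat.tolist())                # [0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4, 5]
```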

==========================

Here's my first cut at translating shape.c/_attempt_nocopy_reshape into Python. It can be run with something like:

newstrides = attempt_reshape(np.zeros((3, 4)), (4, 3), False)   # [24, 8]


import numpy as np

def attempt_reshape(arr, newdims, is_f_order):
    """Python port of numpy's C helper _attempt_nocopy_reshape (shape.c).

    Returns the list of strides a no-copy reshape of `arr` to `newdims`
    would have, or None if the reshape needs a copy. `newdims` must be
    fully specified (no -1) with the same total size as `arr`; numpy
    resolves -1 before it calls the C helper.
    """
    newnd = len(newdims)
    newstrides = [0] * newnd

    # Drop length-1 axes: they do not affect the layout, and their
    # strides are arbitrary.
    squeezed = np.squeeze(arr)
    olddims = squeezed.shape
    oldnd = squeezed.ndim
    oldstrides = squeezed.strides

    # oi..oj and ni..nj give the old/new axis ranges currently worked with
    oi, oj = 0, 1
    ni, nj = 0, 1
    while ni < newnd and oi < oldnd:
        new_prod = newdims[ni]
        old_prod = olddims[oi]

        # Extend the range with the smaller product until both ranges
        # cover the same number of elements.
        while new_prod != old_prod:
            if new_prod < old_prod:
                # Misses trailing 1s; these are handled later
                new_prod *= newdims[nj]
                nj += 1
            else:
                old_prod *= olddims[oj]
                oj += 1

        # Check whether the original axes oi..oj-1 can be combined
        for ok in range(oi, oj - 1):
            if is_f_order:
                if oldstrides[ok + 1] != olddims[ok] * oldstrides[ok]:
                    return None  # not contiguous enough
            else:  # C order
                if oldstrides[ok] != olddims[ok + 1] * oldstrides[ok + 1]:
                    return None  # not contiguous enough

        # Calculate new strides for all axes currently worked with
        if is_f_order:
            newstrides[ni] = oldstrides[oi]
            for nk in range(ni + 1, nj):
                newstrides[nk] = newstrides[nk - 1] * newdims[nk - 1]
        else:  # C order
            newstrides[nj - 1] = oldstrides[oj - 1]
            for nk in range(nj - 1, ni, -1):
                newstrides[nk - 1] = newstrides[nk] * newdims[nk]

        # C's `ni = nj++; oi = oj++;`: assign first, then increment.
        # Incrementing first would skip an axis.
        ni = nj
        nj += 1
        oi = oj
        oj += 1

    # Set strides corresponding to trailing 1s of the new shape
    if ni >= 1:
        last_stride = newstrides[ni - 1]
    else:
        last_stride = arr.itemsize
    if is_f_order and ni >= 1:   # guard so ni == 0 never reads newdims[-1]
        last_stride *= newdims[ni - 1]
    for nk in range(ni, newnd):
        newstrides[nk] = last_stride
    return newstrides
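To sanity-check a port like this, it helps to compare against what NumPy itself does. The sketch below uses a made-up helper, reshape_report, that calls the real reshape and reports whether the result is a view and which strides it has. Those strides are what the port should reproduce when a no-copy reshape exists, and the copy case is where it should signal failure:

```python
import numpy as np


def reshape_report(a, newshape, order='C'):
    """Reshape with numpy; return (is_view, strides) of the result."""
    r = a.reshape(newshape, order=order)
    return bool(np.shares_memory(r, a)), r.strides


base = np.zeros((3, 4))               # float64, strides (32, 8)
cases = [
    (base, (4, 3), 'C'),              # plain C-order reshape
    (np.zeros((2, 3, 4)), (6, 4), 'C'),
    (base.T, (3, 4), 'C'),            # transpose, then C order: copy
    (base.T, (12,), 'F'),             # transpose, then F order: view
]
for a, shape, order in cases:
    print(a.shape, '->', shape, order, reshape_report(a, shape, order))
# (3, 4) -> (4, 3) C (True, (24, 8))
# (2, 3, 4) -> (6, 4) C (True, (32, 8))
# (4, 3) -> (3, 4) C (False, (32, 8))
# (4, 3) -> (12,) F (True, (8,))
```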
