Efficient Numpy 2D array construction from 1D array

Question

I have an array like this:

A = array([1,2,3,4,5,6,7,8,9,10])

And I am trying to get an array like this:

B = array([[1,2,3],
          [2,3,4],
          [3,4,5],
          [4,5,6]])

Where each row (of a fixed arbitrary width) is shifted by one. The array of A is 10k records long and I'm trying to find an efficient way of doing this in Numpy. Currently I am using vstack and a for loop which is slow. Is there a faster way?

width = 3  # fixed arbitrary width
length = 10000  # length of A which I wish to use
B = A[0:width]  # first row
for i in range(1, length - width + 1):
    # vstack copies all of B on every iteration, which is why this is slow
    B = np.vstack((B, A[i:i + width]))

Recommended answer

Actually, there's an even more efficient way to do this... The downside to using vstack etc, is that you're making a copy of the array.

Incidentally, this is effectively identical to @Paul's answer, but I'm posting this just to explain things in a bit more detail...

There's a way to do this with just views so that no memory is duplicated.

I'm directly borrowing this from Erik Rigtorp's post to numpy-discussion, who in turn, borrowed it from Keith Goodman's Bottleneck (Which is quite useful!).

The basic trick is to directly manipulate the strides of the array (For one-dimensional arrays):

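import numpy as np

def rolling(a, window):
    # One row per window position; each row holds `window` consecutive items.
    shape = (a.size - window + 1, window)
    # Both axes advance by one item, so row i starts at a[i].
    # a.strides[0] equals a.itemsize for a contiguous 1-D array.
    strides = (a.strides[0], a.strides[0])
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.arange(10)
print(rolling(a, 3))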

Where a is your input array and window is the length of the window that you want (3, in your case).

This yields:

[[0 1 2]
 [1 2 3]
 [2 3 4]
 [3 4 5]
 [4 5 6]
 [5 6 7]
 [6 7 8]
 [7 8 9]]
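
As a point of comparison, newer NumPy releases (1.20 and later) ship a ready-made helper, `numpy.lib.stride_tricks.sliding_window_view`, that builds the same kind of windowed view. The sketch below applies it to the array from the question; it is an alternative to the hand-written `rolling` function, not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# The array from the question.
a = np.arange(1, 11)

# Read-only view with one row per window position; no data is copied.
windows = sliding_window_view(a, window_shape=3)

# The first four rows match the B array asked for in the question.
print(windows[:4])
```

The returned view is read-only by default, which guards against accidentally writing through overlapping windows.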

However, there is absolutely no duplication of memory between the original a and the returned array. This means that it's fast and scales much better than other options.
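
One way to confirm that no memory is duplicated is to check that the windowed array shares its buffer with the input. The following is a small sketch using `np.shares_memory`; the variable names are only illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10)
step = a.strides[0]
view = as_strided(a, shape=(a.size - 3 + 1, 3), strides=(step, step))

# The view and the original array use the same underlying buffer.
print(np.shares_memory(a, view))  # True

# A change to the original is visible in every window that covers it.
a[1] = 100
print(view[0, 1], view[1, 0])  # 100 100
```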

For example (using a = np.arange(100000) and window=3):

%timeit np.vstack([a[i:i-window] for i in range(window)]).T
1000 loops, best of 3: 256 us per loop

%timeit rolling(a, window)
100000 loops, best of 3: 12 us per loop
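
The `%timeit` lines above need IPython. A rough equivalent with the standard-library `timeit` module might look like the sketch below. The helper names `with_vstack` and `with_strides` are made up for this example, the slices are chosen so that both approaches return every window, and the numbers will differ from machine to machine.

```python
import timeit
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(100000)
window = 3

def with_vstack():
    # Stack `window` shifted slices and transpose; this copies the data.
    n = a.size - window + 1
    return np.vstack([a[i:i + n] for i in range(window)]).T

def with_strides():
    # Build a read-only view; no data is copied.
    step = a.strides[0]
    return as_strided(a, shape=(a.size - window + 1, window),
                      strides=(step, step), writeable=False)

t_copy = timeit.timeit(with_vstack, number=100) / 100
t_view = timeit.timeit(with_strides, number=100) / 100
print(f"vstack:     {t_copy * 1e6:.1f} us per call")
print(f"as_strided: {t_view * 1e6:.1f} us per call")
```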

If we generalize this to a "rolling window" along the last axis for an N-dimensional array, we get Erik Rigtorp's "rolling window" function:

import numpy as np

def rolling_window(a, window):
   """
   Make an ndarray with a rolling window of the last dimension

   Parameters
   ----------
   a : array_like
       Array to add rolling window to
   window : int
       Size of rolling window

   Returns
   -------
   Array that is a view of the original array with an added dimension
   of size `window`.

   Examples
   --------
   >>> x=np.arange(10).reshape((2,5))
   >>> rolling_window(x, 3)
   array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
          [[5, 6, 7], [6, 7, 8], [7, 8, 9]]])

   Calculate rolling mean of last dimension:
   >>> np.mean(rolling_window(x, 3), -1)
   array([[ 1.,  2.,  3.],
          [ 6.,  7.,  8.]])

   """
   if window < 1:
       raise ValueError("`window` must be at least 1.")
   if window > a.shape[-1]:
       raise ValueError("`window` is too long.")
   shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
   strides = a.strides + (a.strides[-1],)
   return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
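
For readers on NumPy 1.20 or later, the docstring examples above can also be reproduced with the built-in `sliding_window_view`, assuming a window along the last axis. This is a sketch for comparison, not Erik Rigtorp's code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(10).reshape((2, 5))

# Window of length 3 along the last axis; a new trailing axis holds each window.
w = sliding_window_view(x, 3, axis=-1)
print(w.shape)  # (2, 3, 3)

# Rolling mean over the last dimension, as in the docstring example.
print(w.mean(axis=-1))
```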

So, let's look into what's going on here... Manipulating an array's strides may seem a bit magical, but once you understand what's going on, it's not at all. The strides of a numpy array describe the size in bytes of the steps that must be taken to increment one value along a given axis. So, in the case of a 1-dimensional array of 64-bit floats, the length of each item is 8 bytes, and x.strides is (8,).

x = np.arange(9, dtype=np.float64)  # 64-bit floats, 8 bytes per item
print(x.strides)  # (8,)

Now, if we reshape this into a 2D, 3x3 array, the strides will be (3 * 8, 8), as we would have to jump 24 bytes to increment one step along the first axis, and 8 bytes to increment one step along the second axis.

y = x.reshape(3,3)
print(y.strides)

Similarly a transpose is the same as just reversing the strides of an array:

print(y)
y.strides = y.strides[::-1]
print(y)
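
The same relationship can be checked without assigning to `strides` in place. The sketch below compares the strides of `y.T` with the reversed strides of `y`, and rebuilds the transpose with `as_strided`; the explicit `float64` dtype is an assumption made so that the byte counts are predictable.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

y = np.arange(9, dtype=np.float64).reshape(3, 3)

print(y.strides)    # (24, 8)
print(y.T.strides)  # (8, 24)

# Reversing both the shape and the strides reproduces the transpose as a view.
t = as_strided(y, shape=y.shape[::-1], strides=y.strides[::-1], writeable=False)
print(np.array_equal(t, y.T))  # True
```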

Clearly, the strides of an array and the shape of an array are intimately linked. If we change one, we have to change the other accordingly, otherwise we won't have a valid description of the memory buffer that actually holds the values of the array.

Therefore, if you want to change both the shape and size of an array simultaneously, you can't do it just by setting x.strides and x.shape, even if the new strides and shape are compatible.

That's where numpy.lib.stride_tricks.as_strided comes in. It's actually a very simple function that just sets the strides and shape of an array simultaneously.

It checks that the two are compatible, but not that the old strides and new shape are compatible, as would happen if you set the two independently. (It actually does this through numpy's __array_interface__, which allows arbitrary classes to describe a memory buffer as a numpy array.)
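
One practical caveat: as far as the NumPy documentation describes it, `as_strided` does not verify that the requested shape and strides stay inside the original buffer, and overlapping windows mean one write can show up in several rows. A cautious pattern, sketched below, is to pass `writeable=False` so that accidental writes fail loudly.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10)
step = a.strides[0]

# Read-only windowed view of length-3 windows.
windows = as_strided(a, shape=(a.size - 3 + 1, 3),
                     strides=(step, step), writeable=False)

try:
    windows[0, 0] = 99
except ValueError as exc:
    print("write rejected:", exc)

# The original array is unchanged.
print(a[0])  # 0
```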

So, all we've done is make the array step one item forward (8 bytes in the case of a 64-bit array) along one axis, and also step only one item (8 bytes) forward along the other axis.

In other words, with a "window" size of 3, the array has a shape of (whatever, 3), but instead of stepping a full 3 * x.itemsize to get from one row to the next, it only steps one item forward, effectively making the rows of the new array a "moving window" view into the original array.

(This also means that shape[0] * shape[1] of the new array will not be the same as the size of the original array x, because the windows overlap.)
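
A short sketch of that size mismatch, using the 10-item array from earlier: the view reports more items than the buffer actually holds, and only an explicit copy allocates memory for all of them.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10)
step = a.strides[0]
w = as_strided(a, shape=(8, 3), strides=(step, step), writeable=False)

print(a.size)  # 10 items actually stored
print(w.size)  # 24 items addressed through overlapping windows

# Copying materializes every window, so the copy needs room for all 24 items.
c = w.copy()
print(c.nbytes == 24 * a.itemsize)  # True
```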

At any rate, hopefully that makes things slightly clearer.
