Will results of numpy.as_strided depend on input dtype?


Question

Will the results of numpy.lib.stride_tricks.as_strided depend on the dtype of the NumPy array?

This question stems from the definition of .strides, which is:

Tuple of bytes to step in each dimension when traversing an array.

Take the following function that I've used in other questions here. It takes a 1d or 2d array and creates overlapping windows of length window. The result will have one more dimension than the input.

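import numpy as np

def rwindows(a, window):
    # Treat a 1-d input as a single-column 2-d array.
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    # One window per valid start row; each window spans `window` rows.
    shape = a.shape[0] - window + 1, window, a.shape[-1]
    # Reuse the row stride for the new leading axis so the windows overlap.
    strides = (a.strides[0],) + a.strides
    windows = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    return np.squeeze(windows)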

# examples
# rwindows(np.arange(5), window=2)
# rwindows(np.arange(20).reshape((5,4)), window=2)

Because of the definition of strides and because, for instance, otherwise equivalent arrays of dtype float32 and float64 will have different strides, will this ever blow up my rwindows function above?

I've tried to test this, but only in a non-exhaustive way, and I am looking for an answer that (1) explains whether the disclaimer/warning from the function doc has anything to do with what I'm asking here and (2) explains why or why not otherwise-equivalent arrays with different dtypes & strides would yield different results in the above.

Recommended answer

No. The warning for as_strided concerns two issues that are not really related to the size of the data; both arise mainly from writing to the resulting view.

  1. First, there is no protection to ensure that view = as_strided(a . . . ) points only to memory in a. This is why so much deliberate preparation work is done before calling as_strided. If your algorithm is off, your view can easily point to memory that is not in a, which may hold garbage, other variables, or memory belonging to your operating system. If you then write to that view, your data can be lost, misplaced, or corrupted . . . or you can crash your computer.

For your specific example, how safe it is depends a lot on what inputs you're using. You've built strides from a.strides, so they are computed dynamically and already account for the item size of whatever dtype a has. You may want to assert that the dtype of a isn't something unusual like object.
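As a quick check of the point about dynamic strides, the sketch below runs the question's rwindows on float32 and float64 copies of the same data. The names base, a32, a64, w32 and w64 are illustrative only, and this is a spot check on one input, not a proof for every dtype.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rwindows(a, window):
    # Same logic as the function in the question.
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    shape = a.shape[0] - window + 1, window, a.shape[-1]
    strides = (a.strides[0],) + a.strides
    return np.squeeze(as_strided(a, shape=shape, strides=strides))

base = np.arange(20).reshape(5, 4)
a32 = base.astype(np.float32)
a64 = base.astype(np.float64)

# Byte strides differ because the item size differs (4 vs 8 bytes).
print(a32.strides, a64.strides)

w32 = rwindows(a32, 2)
w64 = rwindows(a64, 2)

# The windows hold the same values; only the byte offsets differ.
print(np.array_equal(w32, w64))
```

Because the strides are read from the array instead of being hard-coded, the byte offsets scale with the item size and the element-level layout of the windows stays the same.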

If you're sure that you will always have a 2-d a that is larger than window, you will probably be fine with your algorithm, but you can also assert that to make sure. If not, you may want to make sure that the as_strided output works for n-d a arrays. For instance:

shape = a.shape[0] - window + 1, window, a.shape[-1]

should be

shape = (a.shape[0] - window + 1, window) + a.shape[1:]

in order to accept n-d input. It would probably never be a problem as far as referencing bad memory, but the current shape would reference the wrong data in a if you had more dimensions.
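A minimal sketch of that n-d variant follows. The name rwindows_nd and the explicit window check are assumptions added for illustration; they are not part of the original function. The result is compared against windows built by plain slicing.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rwindows_nd(a, window):
    # Hypothetical n-d variant of rwindows; windows slide along axis 0.
    a = np.asarray(a)
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    # Guard against windows that would reach past the end of `a`.
    if not 0 < window <= a.shape[0]:
        raise ValueError("window must be between 1 and a.shape[0]")
    # Keep every trailing dimension instead of only the last one.
    shape = (a.shape[0] - window + 1, window) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    return np.squeeze(as_strided(a, shape=shape, strides=strides))

a = np.arange(24).reshape(4, 3, 2)
w = rwindows_nd(a, 2)

# Reference result built with ordinary slicing (this one copies the data).
expected = np.stack([a[i:i + 2] for i in range(a.shape[0] - 2 + 1)])
print(w.shape, np.array_equal(w, expected))
```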

  2. Second, the view created references the same data blocks multiple times. If you then do a parallel write to that view (through view = foo or bar( . . ., out = view)), the results can be unpredictable and probably not what you expect.
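The aliasing behind the second issue can be seen with a single scalar write. This is a small sketch with illustrative names (a, view): one assignment through the view changes the base array and every other window that covers the same element.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(5)
step = a.strides[0]

# Four overlapping windows of length 2 over the five elements of `a`.
view = as_strided(a, shape=(4, 2), strides=(step, step))

# view[0, 1] and view[1, 0] both alias a[1].
view[0, 1] = 99

print(a)           # the base array sees the write
print(view[1, 0])  # so does the neighbouring window
```

With a whole-array write such as an out= argument, each underlying element is visited more than once, which is why the final contents are hard to predict.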

That said, if you are afraid of problems and don't need to write to the as_strided view (as you don't for most convolution applications where it is commonly used), you can always pass writeable=False, which guards against both problems even if your strides and/or shape are incorrect.
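A short sketch of the read-only option: NumPy spells the keyword writeable, and as_strided accepts it in NumPy 1.12 and later. The names a, ro and rejected are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(5)
step = a.strides[0]

# Read-only overlapping windows; any attempt to assign raises ValueError.
ro = as_strided(a, shape=(4, 2), strides=(step, step), writeable=False)

rejected = False
try:
    ro[0, 0] = 1
except ValueError as exc:
    rejected = True
    print("write rejected:", exc)

print(ro.flags.writeable)
```

Reads through the view still follow whatever shape and strides were given, so this protects against accidental writes but does not validate the layout itself.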

As pointed out by @hpaulj, in addition to those two problems, if you do something to a view that makes a copy (like .flatten() or fancy indexing a large chunk of it), it can cause a MemoryError.
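To get a feel for the size of such a copy, the sketch below compares the bytes held by the base array with the bytes a materialized copy of the view would need. The array length and window are arbitrary values chosen for illustration, and nothing is actually copied here.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(100_000, dtype=np.float64)
window = 1000
step = a.strides[0]

w = as_strided(
    a,
    shape=(a.size - window + 1, window),
    strides=(step, step),
    writeable=False,
)

# The view reuses the memory of `a`, but nbytes reports size * itemsize,
# which is what a copy (for example w.flatten()) would have to allocate.
print(a.nbytes)
print(w.nbytes)
print(w.nbytes // a.nbytes)
```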
