Will results of numpy.as_strided depend on input dtype?
Question
Will the results of numpy.lib.stride_tricks.as_strided depend on the dtype of the NumPy array?
This question stems from the definition of .strides, which is:
Tuple of bytes to step in each dimension when traversing an array.
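To make the definition concrete, here is a small sketch (the array names a32 and a64 are only illustrative) showing that strides are byte counts and therefore scale with the item size, while the element-wise steps stay the same:

```python
import numpy as np

# Two arrays with identical shape and values but different dtypes.
a32 = np.arange(12, dtype=np.float32).reshape(3, 4)
a64 = np.arange(12, dtype=np.float64).reshape(3, 4)

# Strides are measured in bytes, so they scale with the item size.
print(a32.itemsize, a32.strides)  # 4 (16, 4)
print(a64.itemsize, a64.strides)  # 8 (32, 8)

# Dividing by the item size gives the same element-wise steps for both.
steps32 = tuple(s // a32.itemsize for s in a32.strides)
steps64 = tuple(s // a64.itemsize for s in a64.strides)
print(steps32, steps64)  # (4, 1) (4, 1)
```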
Take the following function, which I've used in other questions here. It takes a 1-d or 2-d array and creates overlapping windows of length window. The result will have one more dimension than the input.
    import numpy as np

    def rwindows(a, window):
        # Treat a 1-d input as a single-column 2-d array.
        if a.ndim == 1:
            a = a.reshape(-1, 1)
        shape = a.shape[0] - window + 1, window, a.shape[-1]
        # Reuse the row stride so consecutive windows start one row apart.
        strides = (a.strides[0],) + a.strides
        windows = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
        return np.squeeze(windows)

    # examples
    # rwindows(np.arange(5), window=2)
    # rwindows(np.arange(20).reshape((5,4)), window=2)
Given the definition of strides, and given that otherwise-equivalent arrays of dtype float32 and float64, for instance, will have different strides, will this ever blow up my rwindows function above?
I've tried to test this, but only in a non-exhaustive way. I'm looking for an answer that (1) explains whether the disclaimer/warning in the function's documentation has anything to do with what I'm asking here, and (2) explains why otherwise-equivalent arrays with different dtypes and strides would or would not yield different results in the function above.
Recommended answer
No. The warning for as_strided concerns two issues that are not really related to the size of the data; both result mainly from writing to the resulting view.
- First, there is no protection to ensure that view = as_strided(a . . . ) only points to memory in a. This is why so much deliberate preparation work is done before calling as_strided. If your algorithm is off, you can easily have your view point to memory that is not in a, and which may indeed belong to garbage, other variables, or your operating system. If you then write to that view, your data can be lost, misplaced, or corrupted . . . or you can crash your computer.
For your specific example, how safe it is depends a lot on what inputs you're using. You've set strides from a.strides, so that is dynamic: the byte steps follow whatever item size a has. You may want to assert that the dtype of a isn't something weird like object.
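As a rough check of this point, the sketch below repeats rwindows from the question and runs it on the same values stored under several dtypes. The dtype list and the slicing-based reference result (expected) are my own choices for illustration, not part of the original answer:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rwindows(a, window):
    # Same logic as in the question, repeated so the sketch is self-contained.
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    shape = a.shape[0] - window + 1, window, a.shape[-1]
    strides = (a.strides[0],) + a.strides  # byte steps taken from a itself
    return np.squeeze(as_strided(a, shape=shape, strides=strides))

base = np.arange(20).reshape(5, 4)

# Reference result built with plain slicing (no stride tricks involved).
expected = np.stack([base[i:i + 2] for i in range(4)])

results = {}
for dtype in (np.int16, np.float32, np.float64):
    a = base.astype(dtype)
    results[dtype] = rwindows(a, window=2)
    # Same windows regardless of item size, because strides came from a.strides.
    print(dtype.__name__, a.strides, np.array_equal(results[dtype], expected))
```

Hard-coding the strides (for example, assuming 8 bytes per element) is what would make the result depend on the dtype.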
If you're sure that you will always have a 2-d a that is larger than window, you will probably be fine with your algorithm, but you can also assert that to make sure. If not, you may want to make sure that the as_strided output works for n-d a arrays. For instance:
shape = a.shape[0] - window + 1, window, a.shape[-1]
should be
shape = (a.shape[0] - window + 1, window) + a.shape[1:]
in order to accept n-d input. It would probably never be a problem as far as referencing bad memory goes, but the current shape would reference the wrong data in a if you had more dimensions.
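A minimal sketch of that generalized shape follows. The name rwindows_nd is hypothetical, and this variant deliberately drops the reshape and squeeze steps from the question's function; it also assumes the input has at least one dimension and adds an explicit window check:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rwindows_nd(a, window):
    # Hypothetical n-d variant: windows slide along axis 0 only.
    a = np.asarray(a)
    if not 1 <= window <= a.shape[0]:
        raise ValueError("window must be between 1 and a.shape[0]")
    # Keep every trailing dimension instead of only the last one.
    shape = (a.shape[0] - window + 1, window) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    # Read-only view, since overlapping windows share memory.
    return as_strided(a, shape=shape, strides=strides, writeable=False)

a = np.arange(24).reshape(4, 3, 2)
w = rwindows_nd(a, window=2)
print(w.shape)  # (3, 2, 3, 2)
```

Each w[i] should then equal the slice a[i:i + 2], which is an easy property to assert in a test.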
- Second, the view that is created references the same data blocks multiple times. If you then do a parallel write to that view (through view = foo or bar( . . ., out = view)), the results can be unpredictable and are probably not what you expect.
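The aliasing behind the second issue can be seen with a single scalar write. This is only an illustrative sketch with made-up values; it uses a small 1-d array whose overlapping windows share elements:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(5)
step = a.strides[0]  # byte step taken from a, so any dtype works

# Overlapping windows of length 2: view[i, j] refers to a[i + j].
view = as_strided(a, shape=(4, 2), strides=(step, step))

# One write through the view...
view[0, 1] = 100

# ...changes the base array and every other window that shares the element.
print(a.tolist())          # [0, 100, 2, 3, 4]
print(int(view[1, 0]))     # 100
print(np.shares_memory(a, view))  # True
```

Whole-array writes such as in-place arithmetic on the view touch the shared elements more than once, which is where the unpredictable results come from.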
That said, if you are afraid of problems and don't need to write to the as_strided view (as you don't for most convolution applications, where it is commonly used), you can always pass writeable=False, which will prevent both problems even if your strides and/or shape are incorrect.
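As a small sketch of that option (the array and variable names are illustrative), passing writeable=False to as_strided returns a read-only view, and an attempted write raises an error instead of touching memory:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(5)
step = a.strides[0]

# Read-only overlapping windows over a.
ro = as_strided(a, shape=(4, 2), strides=(step, step), writeable=False)
print(ro.flags.writeable)  # False

rejected = False
try:
    ro[0, 1] = 100  # attempted write through the read-only view
except ValueError as exc:
    rejected = True
    print("write rejected:", exc)

print(a.tolist())  # [0, 1, 2, 3, 4]  (the base array is untouched)
```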
As pointed out by @hpaulj, in addition to those two problems, if you do something to a view that makes a copy (like .flatten() or fancy-indexing a large chunk of it), it can cause a MemoryError.
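The size difference behind that MemoryError can be sketched with small, arbitrarily chosen numbers: the view costs no extra memory, but any copy has to materialize every overlapping window.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(1000, dtype=np.float64)
window = 100
step = a.strides[0]

view = as_strided(a, shape=(a.size - window + 1, window),
                  strides=(step, step), writeable=False)

print(a.nbytes)     # 8000: the memory actually backing the view
print(view.nbytes)  # 720800: nominal size, nothing extra allocated yet

# flatten() always copies, so this allocates the full nominal size.
copy = view.flatten()
print(copy.nbytes)                   # 720800
print(np.shares_memory(copy, a))     # False
```

With a large input and a large window, the same ratio can easily exceed available memory.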