Repeated numpy subarrays
Problem description
This is a simplification of my question. I have a numpy array:
import numpy as np
x = np.array([0,1,2,3])
I have a function:
def f(y): return y**2
I can compute f(x).
Now suppose I really want to compute f(x) for a repeated x:
x = np.array([0,1,2,3,0,1,2,3,0,1,2,3])
Is there a way to do this without creating a repeated version of x and in a way that is transparent to f?
In my particular case, f is an involved function and one of its arguments is x. I would like to be able to calculate f when x is repeated without actually repeating it, as the repeated array won't fit into memory.
Rewriting f to handle a repeated x would be a lot of work, so I was hoping for a clever way to do this, possibly by subclassing a numpy array.
Any hints are appreciated.
Recommended answer
You can (almost) do this by using a few tricks with strides.
However, there are some major caveats...
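import numpy as np

x = np.arange(4)
numrepeats = 3
# A stride of 0 along the new first axis makes every row of y point at x's memory
y = np.lib.stride_tricks.as_strided(x, (numrepeats,)+x.shape, (0,)+x.strides)
print(y)
x[0] = 9    # modify the base array...
print(y)    # ...and the change shows up in every row of y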
So, y is now a view into x where each row is x. No new memory is used, and we can make y as large as we like.
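A quick way to confirm that y is a view rather than a copy is to inspect its strides and ask NumPy whether it shares memory with x. The sketch below also builds the same zero-stride view with np.broadcast_to, a standard NumPy function that the original answer does not use; it is shown here only as an assumption-free comparison, since it returns a read-only view and so avoids accidental writes through the repeated rows.

```python
import numpy as np

x = np.arange(4)
numrepeats = 3

# Zero-stride view built with as_strided, as in the answer
y = np.lib.stride_tricks.as_strided(x, (numrepeats,) + x.shape, (0,) + x.strides)

# Alternative: broadcast_to also yields a zero-stride view, but read-only
z = np.broadcast_to(x, (numrepeats,) + x.shape)

print(y.strides)               # first axis has stride 0, second is x's own stride
print(np.shares_memory(x, y))  # True: y reuses x's buffer
print(np.shares_memory(x, z))  # True: so does z
print(z.flags.writeable)       # False: broadcast_to views cannot be written to
```

Either view behaves like a (3, 4) array for reading, while only the four elements of x exist in memory.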
For example, I could do:
import numpy as np
x = np.arange(4)
numrepeats = 10**15  # shape entries must be integers, so use 10**15 rather than 1e15
y = np.lib.stride_tricks.as_strided(x, (numrepeats,)+x.shape, (0,)+x.strides)
...and not use any more memory than the 32 bytes required for x. (Otherwise, y would need about 32 petabytes of RAM: 1e15 rows × 32 bytes per row.)
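The difference between the nominal and the real size can be checked directly: nbytes reports what a fully materialized array would occupy, even though the strided view allocates nothing new. This sketch assumes an explicit 8-byte integer dtype so that the byte counts do not depend on the platform's default integer size.

```python
import numpy as np

# Explicit 8-byte dtype so the numbers below are platform-independent
x = np.arange(4, dtype=np.int64)
numrepeats = 10**15

y = np.lib.stride_tricks.as_strided(x, (numrepeats,) + x.shape, (0,) + x.strides)

print(x.nbytes)  # 32: the only bytes actually allocated
print(y.nbytes)  # nominal size, 10**15 rows * 32 bytes, none of it allocated
print(y[-1])     # even the last "row" is just x
```

Avoid printing or reducing over the whole of y at this size; any operation that touches every row still has to iterate over all 4e15 elements.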
However, if we reshape y so that it only has one dimension, we'll get a copy that uses the full amount of memory. A one-dimensional view has a single stride, and no single stride can step from x[3] back to x[0]. So there's no way to describe a "horizontally" tiled view of x using strides and shape, and any shape with fewer than two dimensions will return a copy.
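The copy on flattening is easy to observe on a small example: after reshape(-1), the result no longer shares memory with x, so later changes to x are not reflected in it.

```python
import numpy as np

x = np.arange(4)
y = np.lib.stride_tricks.as_strided(x, (3,) + x.shape, (0,) + x.strides)

# Flattening cannot be expressed as a strided view of x, so NumPy copies
flat = y.reshape(-1)

print(flat)                       # [0 1 2 3 0 1 2 3 0 1 2 3]
print(np.shares_memory(x, flat))  # False: flat has its own 12-element buffer
```

With 3 repeats the copy is harmless; with the 10**15-row view it would try to allocate the full size.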
Additionally, if we operate on y in a way that returns a new array (e.g. the y**2 in your example), we'll get a full-size copy.
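When f is purely elementwise (as y**2 is), applying it to a repeated x gives the same result as repeating f(x). Under that assumption, one way around the full-size result is to evaluate f on the small array once and broadcast the answer with np.broadcast_to. This is a swapped-in technique, not part of the original answer, and it does not apply if f mixes values across elements (sums, sorts, convolutions, and so on).

```python
import numpy as np

def f(v):
    return v**2  # elementwise, like the f in the question

x = np.arange(4)
numrepeats = 3
y = np.lib.stride_tricks.as_strided(x, (numrepeats,) + x.shape, (0,) + x.strides)

full = f(y)  # allocates a brand-new (numrepeats, 4) array
print(np.shares_memory(full, x))  # False

# Assumption: f is elementwise, so f(repeated x) == repeated f(x).
fx = f(x)  # only len(x) elements are allocated
lazy = np.broadcast_to(fx, (numrepeats,) + fx.shape)  # zero-stride view of fx

print(np.array_equal(lazy, full))  # True
print(np.shares_memory(lazy, fx))  # True: the repetition is still just a view
```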
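For that reason, it makes more sense to operate on things in-place, and to do so on x itself (e.g. x **= 2); every row of y then reflects the change. Writing through y instead (y **= 2) is best avoided: every row of y aliases the same memory, and NumPy's as_strided documentation warns that vectorized writes to such overlapping arrays are unpredictable.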
Even for a generic function, you can pass in x and place the result back in x.
For example:
def f(x):
    return x**3

x[...] = f(x)  # f(x) allocates only a len(x) temporary; the result is written back into x
print(y)
y will be updated as well, since it's just a view into x.
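If the steps inside f are NumPy ufuncs, their out= argument can write the result straight into x, skipping even the small temporary that x[...] = f(x) creates. This is a minimal sketch assuming f reduces to a single ufunc call (here np.power); a more involved f would need out= on each of its steps.

```python
import numpy as np

x = np.arange(4)
y = np.lib.stride_tricks.as_strided(x, (3,) + x.shape, (0,) + x.strides, writeable=False)

# out=x writes the cube directly into x's buffer; y sees it immediately
np.power(x, 3, out=x)
print(y)  # each row now holds 0, 1, 8, 27
```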