Efficient way of doing matrix addition in numpy
Question
I have many matrices to add. Let's say that the matrices are [M1, M2..., M_n]. Then, a simple way is
X = np.zeros_like(matrices[0])
for M in matrices:
    X += M
In the operation X += M, does Python allocate new memory for X every time += is executed? If so, that seems inefficient. Is there any way of doing an in-place operation without creating new memory for X?
Answer
Unless you get a MemoryError, trying to second-guess memory usage in numpy is not worth the effort. Leave that to the developers who know the compiled code.
But we can perform some time tests - that's what really matters, doesn't it?
I'll test adding a good-sized array 100 times.
In [479]: M=np.ones((1000,1000))
Using your +=:
In [480]: %%timeit
...: X=np.zeros_like(M)
...: for _ in range(100): X+=M
...:
1 loop, best of 3: 627 ms per loop
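The += loop can equivalently be written with the ufunc's out= parameter, which also accumulates in place. A small sketch (array sizes shrunk so it runs quickly):

```python
import numpy as np

M = np.ones((4, 4))
X = np.zeros_like(M)
for _ in range(100):
    # np.add(X, M, out=X) writes the sum directly into X,
    # just like X += M
    np.add(X, M, out=X)

assert np.allclose(X, 100 * M)
```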
Or make an array of size (100, 1000, 1000) and apply np.sum across the first axis.
In [481]: timeit np.sum(np.array([M for _ in range(100)]),axis=0)
1 loop, best of 3: 1.54 s per loop
And using the np.add ufunc. With reduce we can apply it sequentially to all values in a list.
In [482]: timeit np.add.reduce([M for _ in range(100)])
1 loop, best of 3: 1.53 s per loop
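On small inputs it's easy to check that all three approaches produce the same result (a minimal sketch with toy 3x3 matrices):

```python
import numpy as np

matrices = [np.full((3, 3), i, dtype=float) for i in range(5)]

# 1. the += loop
X = np.zeros_like(matrices[0])
for M in matrices:
    X += M

# 2. stack into a (5, 3, 3) array and sum along the first axis
s1 = np.sum(np.array(matrices), axis=0)

# 3. np.add.reduce applied over the list
s2 = np.add.reduce(matrices)

# 0 + 1 + 2 + 3 + 4 = 10 in every element
assert np.allclose(X, s1)
assert np.allclose(s1, s2)
```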
The np.sum case gives me a MemoryError if I use range(1000). I don't have enough memory to hold a (1000,1000,1000) array. Same for the add.reduce, which builds an array from the list.
What += does under the covers is normally hidden, and usually of no concern to us. But for a peek under the covers, look at ufunc.at: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.at.html#numpy.ufunc.at
Performs unbuffered in place operation on operand ‘a’ for elements specified by ‘indices’. For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once.
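The duplicate-index difference the docs describe is easy to demonstrate on a tiny array:

```python
import numpy as np

idx = [0, 0, 1]        # index 0 appears twice

a = np.zeros(3)
np.add.at(a, idx, 1)   # unbuffered: both increments of a[0] land
print(a)               # [2. 1. 0.]

b = np.zeros(3)
b[idx] += 1            # buffered fancy-index +=: duplicates collapse
print(b)               # [1. 1. 0.]
```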
So X+=M does write the sum to a buffer, and then copies that buffer to X. There is a temporary buffer, but final memory usage does not change. And that buffer creation and copying is done in fast C code.
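One way to see that X itself is never reallocated is to compare its data pointer before and after +=. A small sketch (using the `__array_interface__` attribute, which exposes the buffer address):

```python
import numpy as np

M = np.ones((100, 100))
X = np.zeros_like(M)

# any internal temporary aside, X's own data buffer stays put
addr_before = X.__array_interface__['data'][0]
X += M
assert X.__array_interface__['data'][0] == addr_before
assert np.allclose(X, 1.0)
```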
np.add.at was added to deal with the case where that buffered action creates problems (duplicate indices). So it avoids the temporary buffer - but at a considerable speed cost. It's probably the added indexing capability that slows it down. (There may be a fairer add.at test; but it certainly doesn't help in this case.)
In [491]: %%timeit
...: X=np.zeros_like(M)
...: for _ in range(100): np.add.at(X,(slice(None),slice(None)),M)
1 loop, best of 3: 19.8 s per loop