Efficient way of doing matrix addition in numpy

Question

I have many matrices to add. Say the matrices are [M1, M2, ..., M_n]. Then a simple way is:

import numpy as np
X = np.zeros_like(matrices[0])  # accumulator with the shape and dtype of the inputs
for M in matrices:
    X += M

In the operation X += M, does Python allocate new memory for X every time += is executed? If so, that seems inefficient. Is there a way to do the operation in place, without allocating new memory for X?

Recommended answer

Unless you get a MemoryError, trying to second-guess memory usage in numpy is not worth the effort. Leave that to the developers who know the compiled code.

But we can run some timing tests. That's what really matters, isn't it?

I'll test adding a good-sized array 100 times.

In [479]: M=np.ones((1000,1000))

Your use of +=:

In [480]: %%timeit 
     ...: X=np.zeros_like(M)
     ...: for _ in range(100): X+=M
     ...: 
1 loop, best of 3: 627 ms per loop

Or make an array of shape (100, 1000, 1000) and apply np.sum along the first axis.

In [481]: timeit np.sum(np.array([M for _ in range(100)]),axis=0)
1 loop, best of 3: 1.54 s per loop

And using the np.add ufunc: with reduce we can apply it sequentially to all the arrays in a list.

In [482]: timeit np.add.reduce([M for _ in range(100)])
1 loop, best of 3: 1.53 s per loop
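
As a quick sanity check that the three approaches timed above compute the same thing, here is a minimal sketch on small stand-in arrays. The list name `matrices` and the 3x3 shapes are illustrative assumptions, not part of the original session.

```python
import numpy as np

# Small stand-in arrays (the timings in the text use a 1000x1000 array).
matrices = [np.full((3, 3), float(i)) for i in range(5)]

# 1. In-place accumulation: only one extra array (X) is held.
X = np.zeros_like(matrices[0])
for M in matrices:
    X += M

# 2. Stack into a (5, 3, 3) array, then sum along the first axis.
stacked_sum = np.sum(np.array(matrices), axis=0)

# 3. Reduce with the add ufunc; the list is converted to an array first.
reduced_sum = np.add.reduce(matrices)

print(np.array_equal(X, stacked_sum), np.array_equal(X, reduced_sum))
```

The difference between them is cost rather than result: the second and third variants need the whole stack in memory at once.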

The np.sum case gives me a MemoryError if I use range(1000). I don't have enough memory to hold a (1000, 1000, 1000) array. The same goes for add.reduce, which builds an array from the list.
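
The arithmetic behind that MemoryError can be sketched without allocating anything, assuming the default float64 dtype that np.ones produces:

```python
import numpy as np

n_arrays, rows, cols = 1000, 1000, 1000
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per element

# The full (1000, 1000, 1000) stack that np.sum / add.reduce would need.
stacked_bytes = n_arrays * rows * cols * itemsize
# The single accumulator X that the += loop needs.
accumulator_bytes = rows * cols * itemsize

print(f"stacked: {stacked_bytes / 1e9:.1f} GB")
print(f"accumulator: {accumulator_bytes / 1e6:.1f} MB")
```

So the stacked approaches need roughly 8 GB for the intermediate array, while the += loop only ever holds one extra 8 MB array.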

What += does under the covers is normally hidden, and usually of no concern to us. But for a peek under the covers, look at ufunc.at: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.at.html#numpy.ufunc.at

Its documentation says: "Performs unbuffered in place operation on operand ‘a’ for elements specified by ‘indices’. For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once."

So X+=M does write the sum to a buffer, and then copies that buffer to X. There is a temporary buffer, but the final memory usage does not change. And that buffer creation and copying is done in fast C code.
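
One thing that can be checked directly from Python is whether X still owns the same data buffer after +=. The sketch below compares the buffer address before and after; it says nothing about what numpy does internally during the addition, only that the array object and its memory are reused. The names `ptr_before`, `ptr_after_inplace`, `Y` and `rebound` are illustrative.

```python
import numpy as np

M = np.ones((4, 4))
X = np.zeros_like(M)

ptr_before = X.__array_interface__["data"][0]   # address of X's data buffer
X += M                                          # in-place add
ptr_after_inplace = X.__array_interface__["data"][0]

Y = X           # second name for the same array
X = X + M       # out-of-place: builds a new array and rebinds the name X
rebound = X is not Y

print(ptr_before == ptr_after_inplace, rebound)
```

By contrast, `X = X + M` produces a new array each time, which is the pattern the question was worried about.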

np.add.at was added to deal with the case where that buffered action creates problems (duplicate indices).
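
A minimal illustration of the duplicate-index problem, using a made-up index array in which 0 appears twice:

```python
import numpy as np

idx = np.array([0, 0, 1])   # index 0 appears twice

a = np.zeros(4)
a[idx] += 1                 # buffered: the repeated index is only counted once

b = np.zeros(4)
np.add.at(b, idx, 1)        # unbuffered: every occurrence is accumulated

print(a)
print(b)
```

With `a[idx] += 1` the element at index 0 ends up as 1, while np.add.at leaves it at 2.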

So it avoids that temporary buffer, but at a considerable speed cost. It's probably the added indexing capability that slows it down. (There may be a fairer test of add.at, but it certainly doesn't help in this case.)

In [491]: %%timeit 
     ...: X=np.zeros_like(M)
     ...: for _ in range(100): np.add.at(X,(slice(None),slice(None)),M)
1 loop, best of 3: 19.8 s per loop
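
Outside IPython, a comparison along the same lines can be sketched with the standard timeit module. The function names and the smaller 200x200 array are assumptions chosen to keep the run short; absolute timings will vary by machine and numpy version.

```python
import timeit
import numpy as np

M = np.ones((200, 200))   # smaller than the 1000x1000 array above

def accumulate_inplace(n=10):
    # Accumulate n copies of M with the in-place operator.
    X = np.zeros_like(M)
    for _ in range(n):
        X += M
    return X

def accumulate_add_at(n=10):
    # Same accumulation through np.add.at with full slices as the index.
    X = np.zeros_like(M)
    for _ in range(n):
        np.add.at(X, (slice(None), slice(None)), M)
    return X

t_inplace = timeit.timeit(accumulate_inplace, number=3)
t_add_at = timeit.timeit(accumulate_add_at, number=3)
print(f"+=     : {t_inplace:.4f} s")
print(f"add.at : {t_add_at:.4f} s")
```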
