有没有更快的方法来添加两个2-d numpy数组 [英] Is there a faster way to add two 2-d numpy array
问题描述
比方说,我有两个尺寸相同的大型二维numpy数组(例如2000x2000).我想对它们进行元素明智的总结.我想知道是否有比np.add()更快的方法
Let say I have two large 2-d numpy array of same dimensions (say 2000x2000). I want to sum them element wise. I was wondering if there is a faster way than np.add()
编辑:我正在添加一个类似的示例,说明我现在正在使用的内容.有什么方法可以加快速度吗?
I am adding a similar example of what I am using now. Is there a way to speed up this?
#a and b are the two matrices I already have.Dimension is 2000x2000
#shift is also a list that is previously known
for j in range(100000):
b=np.roll(b, shift[j] , axis=0)
a=np.add(a,b)
推荐答案
方法1(矢量化)
我们可以使用modulus
来模拟roll/circshift
的循环行为,并使用广播索引覆盖所有行,我们将拥有完全矢量化的方法,就像这样-
Approach #1 (Vectorized)
We can use modulus
to simulate the circulating behavior of roll/circshift
and with broadcasted indices to cover all rows, we would have a fully vectorized approach, like so -
n = b.shape[0]
idx = n-1 - np.mod(shift.cumsum()[:,None]-1 - np.arange(n), n)
a += b[idx].sum(0)
方法2(循环1)
b_ext = np.row_stack((b, b[:-1] ))
start_idx = n-1 - np.mod(shift.cumsum()-1,n)
for j in range(start_idx.size):
a += b_ext[start_idx[j]:start_idx[j]+n]
冒号表示法与使用索引进行切片
一旦进入循环,这里的想法就是做最少的工作.在进入循环之前,我们正在预先计算每个迭代的起始行索引.因此,我们在循环内需要做的所有事情就是使用冒号表示法进行切片,这是对数组的视图并累加.这应该比rolling
更好,因为rolling
需要计算所有这些行索引,从而导致复制成本很高.
The idea here to do minimal work once we are inside the loop. We are pre-computing the start row index of each iteration before going into the loop. So, all we need to do once inside the loop is slicing using colon notation, which is a view into the array and adding up. This should be much better than rolling
that needs to compute all of those row indices that results in a copy that is expensive.
在对冒号和索引进行切片时,视图和复制的概念会更多一些-
Here's a bit more into the view and copy concepts when slicing with colon and indices -
In [11]: a = np.random.randint(0,9,(10))
In [12]: a
Out[12]: array([8, 0, 1, 7, 5, 0, 6, 1, 7, 0])
In [13]: a[3:8]
Out[13]: array([7, 5, 0, 6, 1])
In [14]: a[[3,4,5,6,7]]
Out[14]: array([7, 5, 0, 6, 1])
In [15]: np.may_share_memory(a, a[3:8])
Out[15]: True
In [16]: np.may_share_memory(a, a[[3,4,5,6,7]])
Out[16]: False
运行时测试
功能定义-
Runtime test
Function defintions -
def original_loopy_app(a,b):
for j in range(shift.size):
b=np.roll(b, shift[j] , axis=0)
a += b
def vectorized_app(a,b):
n = b.shape[0]
idx = n-1 - np.mod(shift.cumsum()[:,None]-1 - np.arange(n), n)
a += b[idx].sum(0)
def modified_loopy_app(a,b):
n = b.shape[0]
b_ext = np.row_stack((b, b[:-1] ))
start_idx = n-1 - np.mod(shift.cumsum()-1,n)
for j in range(start_idx.size):
a += b_ext[start_idx[j]:start_idx[j]+n]
案例1:
In [5]: # Setup input arrays
...: N = 200
...: M = 1000
...: a = np.random.randint(11,99,(N,N))
...: b = np.random.randint(11,99,(N,N))
...: shift = np.random.randint(0,N,M)
...:
In [6]: original_loopy_app(a1,b1)
...: vectorized_app(a2,b2)
...: modified_loopy_app(a3,b3)
...:
In [7]: np.allclose(a1, a2) # Verify results
Out[7]: True
In [8]: np.allclose(a1, a3) # Verify results
Out[8]: True
In [9]: %timeit original_loopy_app(a1,b1)
...: %timeit vectorized_app(a2,b2)
...: %timeit modified_loopy_app(a3,b3)
...:
10 loops, best of 3: 107 ms per loop
10 loops, best of 3: 137 ms per loop
10 loops, best of 3: 48.2 ms per loop
案例2:
In [13]: # Setup input arrays (datasets are exactly 1/10th of original sizes)
...: N = 200
...: M = 10000
...: a = np.random.randint(11,99,(N,N))
...: b = np.random.randint(11,99,(N,N))
...: shift = np.random.randint(0,N,M)
...:
In [14]: %timeit original_loopy_app(a1,b1)
...: %timeit modified_loopy_app(a3,b3)
...:
1 loops, best of 3: 1.11 s per loop
1 loops, best of 3: 481 ms per loop
因此,我们正在研究采用改进的循环方法来提高 2x+
的速度!
So, we are looking at 2x+
speedup there with the modified loopy approach!
这篇关于有没有更快的方法来添加两个2-d numpy数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!