Speed up for loop with numpy
Question
How can the following for loop be sped up with numpy? I guess some fancy-indexing trick can be used here, but I have no idea which one (can einsum be used here?).
import numpy

a = 0
for i in range(len(b)):
    a += numpy.mean(C[d, e, f + b[i]]) * g[i]
Edit: C is a numpy 3D array of shape comparable to (20, 1600, 500). d, e, f are indices of points that are "interesting" (d, e and f have the same length, around 900). b and g have the same length (around 50). The mean is taken over all the points in C with the indices d, e, f+b[i].
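Written out as a formula, the loop computes the quantity below. The symbols N (the common length of d, e, f, about 900), M (the common length of b and g, about 50) and the running index k are introduced here only for illustration and do not appear in the original post:

```latex
a \;=\; \sum_{i=0}^{M-1} g_i \cdot \frac{1}{N} \sum_{k=0}^{N-1} C\bigl[d_k,\; e_k,\; f_k + b_i\bigr]
```

Seen this way, the inner mean is a reduction over k of an M-by-N table of gathered values, and the outer sum is a dot product with g.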
Recommended answer
Both sessions were initialized with:
In [1]: C = np.random.rand(20,1600,500)
In [2]: d = np.random.randint(0, 20, size=900)
In [3]: e = np.random.randint(1600, size=900)
In [4]: f = np.random.randint(400, size=900)
In [5]: b = np.random.randint(100, size=50)
In [6]: g = np.random.rand(50)
NumPy 1.9.0
In [7]: %timeit C[d,e,f + b[:,np.newaxis]].mean(axis=1).dot(g)
1000 loops, best of 3: 942 µs per loop
In [8]: %timeit C[d[:,np.newaxis],e[:, np.newaxis],f[:, np.newaxis] + b].mean(axis=0).dot(g)
1000 loops, best of 3: 762 µs per loop
In [9]: %%timeit
   ...: a = 0
   ...: for i in range(len(b)):
   ...:     a += np.mean(C[d, e, f + b[i]]) * g[i]
   ...:
100 loops, best of 3: 2.25 ms per loop
In [10]: np.__version__
Out[10]: '1.9.0'
In [11]: %%timeit
    ...: (C.ravel()[np.ravel_multi_index((d[:,np.newaxis],
    ...:                                  e[:,np.newaxis],
    ...:                                  f[:,np.newaxis] + b), dims=C.shape)]
    ...:  .mean(axis=0).dot(g))
    ...:
1000 loops, best of 3: 940 µs per loop
NumPy 1.8.2
In [7]: %timeit C[d,e,f + b[:,np.newaxis]].mean(axis=1).dot(g)
100 loops, best of 3: 2.81 ms per loop
In [8]: %timeit C[d[:,np.newaxis],e[:, np.newaxis],f[:, np.newaxis] + b].mean(axis=0).dot(g)
100 loops, best of 3: 2.7 ms per loop
In [9]: %%timeit
   ...: a = 0
   ...: for i in range(len(b)):
   ...:     a += np.mean(C[d, e, f + b[i]]) * g[i]
   ...:
100 loops, best of 3: 4.12 ms per loop
In [10]: np.__version__
Out[10]: '1.8.2'
In [51]: %%timeit
    ...: (C.ravel()[np.ravel_multi_index((d[:,np.newaxis],
    ...:                                  e[:,np.newaxis],
    ...:                                  f[:,np.newaxis] + b), dims=C.shape)]
    ...:  .mean(axis=0).dot(g))
    ...:
1000 loops, best of 3: 1.4 ms per loop
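The last timing in each session uses np.ravel_multi_index to turn the three broadcast index arrays into flat indices into C.ravel(). As a minimal sketch of why that is the same lookup as the fancy-indexing version, the snippet below compares the two on a small stand-in array; the array sizes and the fixed seed are assumptions chosen only to keep the example quick.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed, only for reproducibility
C = rng.rand(4, 6, 10)           # small stand-in for the (20, 1600, 500) array
d = rng.randint(0, 4, size=7)
e = rng.randint(0, 6, size=7)
f = rng.randint(0, 5, size=7)
b = rng.randint(0, 5, size=3)    # f + b stays below 10, so no index is out of range

# Fancy indexing with broadcast index arrays: the result has shape (7, 3).
fancy = C[d[:, np.newaxis], e[:, np.newaxis], f[:, np.newaxis] + b]

# The same lookup through flat indices into the raveled array.
flat_idx = np.ravel_multi_index(
    (d[:, np.newaxis], e[:, np.newaxis], f[:, np.newaxis] + b), C.shape)
flat = C.ravel()[flat_idx]

print(fancy.shape, np.array_equal(fancy, flat))
```

Whether the flat-index route is faster depends on the NumPy version, as the two sessions above suggest, so it is worth timing on your own setup.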
Explanation
You can use a coordinate broadcasting trick to build the full 50x900 array from the start:
In [158]: C[d,e,f + b[:, np.newaxis]].shape
Out[158]: (50, 900)
From this point, mean and dot will get you to the destination:
In [159]: C[d,e,f + b[:, np.newaxis]].mean(axis=1).dot(g)
Out[159]: 13.582349962518611
In [160]: a = 0
     ...: for i in range(len(b)):
     ...:     a += np.mean(C[d, e, f + b[i]]) * g[i]
     ...: print(a)
     ...:
13.5823499625
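The comparison above can also be packaged as a small self-contained script. This is only a sketch: the function names loop_version and broadcast_version are hypothetical, and the array sizes are scaled down from the ones in the question so that it runs quickly.

```python
import numpy as np

def loop_version(C, d, e, f, b, g):
    # Original approach: one gather and one mean per offset b[i].
    a = 0.0
    for i in range(len(b)):
        a += np.mean(C[d, e, f + b[i]]) * g[i]
    return a

def broadcast_version(C, d, e, f, b, g):
    # f + b[:, np.newaxis] has shape (len(b), len(f)); d and e broadcast
    # against it, so a single gather yields the whole (len(b), len(f)) table.
    picked = C[d, e, f + b[:, np.newaxis]]
    # Mean over the points (axis 1), then weight each offset by g.
    return picked.mean(axis=1).dot(g)

rng = np.random.RandomState(0)    # fixed seed, only for reproducibility
C = rng.rand(5, 40, 60)           # scaled-down stand-in for (20, 1600, 500)
d = rng.randint(0, 5, size=90)
e = rng.randint(0, 40, size=90)
f = rng.randint(0, 40, size=90)
b = rng.randint(0, 20, size=10)   # f + b stays below 60
g = rng.rand(10)

print(np.isclose(loop_version(C, d, e, f, b, g),
                 broadcast_version(C, d, e, f, b, g)))
```

The two results agree up to floating-point rounding, which is why the check uses np.isclose rather than exact equality.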
And it's about 3.3x faster than the loop version:
In [161]: %timeit C[d,e,f + b[:, np.newaxis]].mean(axis=1).dot(g)
1000 loops, best of 3: 585 µs per loop
In [162]: %%timeit
     ...: a = 0
     ...: for i in range(len(b)):
     ...:     a += np.mean(C[d, e, f + b[i]]) * g[i]
     ...:
1000 loops, best of 3: 1.95 ms per loop
The array is of significant size, so you must factor in the CPU cache. I cannot say I know how np.sum traverses the array, but in 2D arrays there is always a slightly better way (when the next element you pick is adjacent in memory) and a slightly worse way (when the next element is found across the stride). Let's see if we can win something more by transposing the array during indexing:
In [196]: C[d[:,np.newaxis], e[:,np.newaxis], f[:,np.newaxis] + b].mean(axis=0).dot(g)
Out[196]: 13.582349962518608
In [197]: %timeit C[d[:,np.newaxis], e[:,np.newaxis], f[:,np.newaxis] + b].mean(axis=0).dot(g)
1000 loops, best of 3: 461 µs per loop
That's 4.2x faster than the loop.
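To try the two index layouts outside IPython, something like the following sketch can be used. The function names rows_are_offsets and rows_are_points are hypothetical, the second dimension of C is reduced to keep memory use small, and no particular timing result is implied: the numbers depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

rng = np.random.RandomState(0)     # fixed seed, only for reproducibility
C = rng.rand(20, 160, 500)         # second dimension reduced from 1600
d = rng.randint(0, 20, size=900)
e = rng.randint(0, 160, size=900)
f = rng.randint(0, 400, size=900)
b = rng.randint(0, 100, size=50)   # f + b stays below 500
g = rng.rand(50)

def rows_are_offsets():
    # Gathered table has shape (50, 900): one row per offset b[i].
    return C[d, e, f + b[:, np.newaxis]].mean(axis=1).dot(g)

def rows_are_points():
    # Gathered table has shape (900, 50): one row per (d, e, f) point.
    return C[d[:, np.newaxis], e[:, np.newaxis],
             f[:, np.newaxis] + b].mean(axis=0).dot(g)

print(np.isclose(rows_are_offsets(), rows_are_points()))
for fn in (rows_are_offsets, rows_are_points):
    # Best of 3 repeats, 50 calls each, reported as seconds per call.
    best = min(timeit.repeat(fn, number=50, repeat=3))
    print(fn.__name__, best / 50)
```

Both layouts compute the same value; only the shape of the intermediate table, and therefore the axis the mean runs over, differs.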