Fast(er) numpy fancy indexing and reduction?
Question
I'm trying to use and accelerate fancy indexing to "join" two arrays and sum over one of the result's axes.
Something like this:
$ ipython
In [1]: import numpy as np
In [2]: ne, ds = 12, 6
In [3]: i = np.random.randn(ne, ds).astype('float32')
In [4]: t = np.random.randint(0, ds, size=(int(1e5), ne)).astype('uint8')
In [5]: %timeit i[np.arange(ne), t].sum(-1)
10 loops, best of 3: 44 ms per loop
Is there a simple way to accelerate the statement in In [5]? Should I go with OpenMP and something like scipy.weave or Cython's prange?
Recommended answer
numpy.take is much faster than fancy indexing for some reason. The only trick is that, when called without an axis argument, it treats the array as flat, so the row and column indices have to be combined into flat indices first.
In [1]: a = np.random.randn(12,6).astype(np.float32)
In [2]: c = np.random.randint(0,6,size=(int(1e5),12)).astype(np.uint8)
In [3]: r = np.arange(12)
In [4]: %timeit a[r,c].sum(-1)
10 loops, best of 3: 46.7 ms per loop
In [5]: rr, cc = np.broadcast_arrays(r,c)
In [6]: flat_index = rr*a.shape[1] + cc
In [7]: %timeit a.take(flat_index).sum(-1)
100 loops, best of 3: 5.5 ms per loop
In [8]: (a.take(flat_index).sum(-1) == a[r,c].sum(-1)).all()
Out[8]: True
I think the only other way you're going to see much of a speed improvement beyond this would be to write a custom kernel for a GPU using something like PyCUDA.
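For anyone who wants to reuse the flat-index trick outside an IPython session, here is a minimal sketch that wraps both variants in functions and checks that they agree. The function names gather_sum_fancy and gather_sum_take are made up for illustration, and the sketch assumes a 2-D array a of shape (ne, ds) and an integer array c of shape (n, ne) holding column indices, as in the question. Timings are not shown because they depend on the machine and the NumPy version.

```python
import numpy as np


def gather_sum_fancy(a, c):
    """Baseline: pick a[j, c[k, j]] for every k, j and sum over j."""
    r = np.arange(a.shape[0])
    return a[r, c].sum(-1)


def gather_sum_take(a, c):
    """Same result via flat indices and ndarray.take (hypothetical helper)."""
    n_rows, n_cols = a.shape
    r = np.arange(n_rows)
    # Row-major flat index of element (row, col) is row * n_cols + col.
    # r has shape (n_rows,) and broadcasts against c of shape (n, n_rows).
    flat_index = r * n_cols + c
    return a.take(flat_index).sum(-1)


# Small demo with the shapes from the question (fewer rows in c to keep it quick).
rng = np.random.default_rng(0)
a = rng.standard_normal((12, 6)).astype(np.float32)
c = rng.integers(0, 6, size=(1000, 12)).astype(np.uint8)

fancy = gather_sum_fancy(a, c)
taken = gather_sum_take(a, c)
print(fancy.shape, np.allclose(fancy, taken))
```

Since r is a default-integer array, the flat index is computed in that wider integer type even though c is uint8, so the multiplication does not overflow for arrays of this size.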