除了矢量到数组由另一个数组索引 [英] Vectorize addition into array indexed by another array
问题描述
我试图让下面的循环的快速量化版本:
对我的xrange(N1):
A [Y [I]] - = B [我,:]
下面 A.shape =(N2,N3)
, y.shape =(N1)
与是
取值 [0,N2 [
, B.shape =(N1,N3)
。你可以认为是
是指数的条目为 A
行。在这里, N1
大, N2
是pretty小, N3
短小。
我以为只是做
A [Y] - = B
会的工作,但问题是,有重复的条目在是
这并不做正确的事(也就是说,如果 Y = [1,1]
然后 A [1]
只添加一次,不是两次)。另外这似乎是没有任何比unvectorized循环更快。
是否有这样做的更好的办法?
编辑:YXD链接这个答案来起初似乎符合这个要求。这似乎是你可以做的正是我想以
np.subtract.at(A,Y,B)
和它的工作,但是当我尝试运行它,它是的显著慢的比unvectorized版本。所以,问题仍然存在:?是否有这样做的更好的性能的方式。
EDIT2:一个例子,为了使事情具体的:
N1,N2,N3 = 10000,10,500
A = np.random.rand(N2,N3)
Y = np.random.randint(N2,大小= N1)
B = np.random.rand(N1,N3)
for循环,在使用运行%timeit
在IPython中给我的机器上:
10圈,最好的3:每循环19.4毫秒
的 subtract.at
版本生产用于 A
到底值相同,但要慢得多:
1循环,最好的3:每循环444毫秒
在code原for循环基础的方法会是这个样子 -
DEF FOR_LOOP(A):
N1 = B.shape [0]
对于我的xrange(N1):
A [Y [I]] - = B [我,:]
一个回报
案例#1
如果N2 N3 >>,我建议这种做法矢量 -
DEF bincount_vectorized(A): N3 = A.shape [1]
NROWS = y.max()+1
ID = Y [:,无] + NROWS * np.arange(N3)
A [:NROWS] - = np.bincount(id.ravel(),B.ravel())重塑(N3,NROWS).T。
一个回报
运行测试 -
在[203]:N1,N2,N3 = 10000,500,10
...:A = np.random.rand(N2,N3)
...:Y = np.random.randint(N2,大小= N1)
... B = np.random.rand(N1,N3)
...:
...:#制作副本
...:Acopy1 = A.copy()
...:Acopy2 = A.copy()
...:在[204]:%timeit FOR_LOOP(Acopy1)
10圈,最好的3:每循环19毫秒在[205]:%timeit bincount_vectorized(Acopy2)
1000循环,最好的3:每圈779微秒
案例#2
如果N2<< N3,修改for循环的方法与小环的复杂性可能会建议 -
DEF for_loop_v2(A):
N2 = A.shape [0]
因为我在范围(N2):
。A [I] - = np.einsum('ij-> J',B [Y == I])#或(b [Y == I])和(0)
一个回报
运行测试 -
在[206]:N1,N2,N3 = 10000,10,500
...:A = np.random.rand(N2,N3)
...:Y = np.random.randint(N2,大小= N1)
... B = np.random.rand(N1,N3)
...:
...:#制作副本
...:Acopy1 = A.copy()
...:Acopy2 = A.copy()
...:在[207]:%timeit FOR_LOOP(Acopy1)
10圈,最好的3:每循环24.2毫秒在[208]:%timeit for_loop_v2(Acopy2)
10圈,最好的3:每循环20.3毫秒
I am trying to get a fast vectorized version of the following loop:
for i in xrange(N1):
A[y[i]] -= B[i,:]
Here A.shape = (N2,N3)
, y.shape = (N1)
with y
taking values in [0,N2[
, B.shape = (N1,N3)
. You can think of entries of y
being indices into rows of A
. Here N1
is large, N2
is pretty small and N3
is smallish.
I thought simply doing
A[y] -= B
would work, but the issue is that there are repeated entries in y
and this does not do the right thing (i.e., if y=[1,1]
then A[1]
is only added to once, not twice). Also this is does not seem to be any faster than the unvectorized for loop.
Is there a better way of doing this?
EDIT: YXD linked this answer to in comments which at first seems to fit the bill. It would seem you can do exactly what I want with
np.subtract.at(A, y, B)
and it does work, however when I try to run it it is significantly slower than the unvectorized version. So, the question remains: is there a more performant way of doing this?
EDIT2: An example, to make things concrete:
n1,n2,n3 = 10000, 10, 500
A = np.random.rand(n2,n3)
y = np.random.randint(n2, size=n1)
B = np.random.rand(n1,n3)
The for loop, when run using %timeit
in ipython gives on my machine:
10 loops, best of 3: 19.4 ms per loop
The subtract.at
version produces the same value for A
in the end, but is much slower:
1 loops, best of 3: 444 ms per loop
The code for the original for-loop based approach would look something like this -
def for_loop(A):
N1 = B.shape[0]
for i in xrange(N1):
A[y[i]] -= B[i,:]
return A
Case #1
If n2 >> n3, I would suggest this vectorized approach -
def bincount_vectorized(A):
n3 = A.shape[1]
nrows = y.max()+1
id = y[:,None] + nrows*np.arange(n3)
A[:nrows] -= np.bincount(id.ravel(),B.ravel()).reshape(n3,nrows).T
return A
Runtime tests -
In [203]: n1,n2,n3 = 10000, 500, 10
...: A = np.random.rand(n2,n3)
...: y = np.random.randint(n2, size=n1)
...: B = np.random.rand(n1,n3)
...:
...: # Make copies
...: Acopy1 = A.copy()
...: Acopy2 = A.copy()
...:
In [204]: %timeit for_loop(Acopy1)
10 loops, best of 3: 19 ms per loop
In [205]: %timeit bincount_vectorized(Acopy2)
1000 loops, best of 3: 779 µs per loop
Case #2
If n2 << n3, a modified for-loop approach with lesser loop complexity could be suggested -
def for_loop_v2(A):
n2 = A.shape[0]
for i in range(n2):
A[i] -= np.einsum('ij->j',B[y==i]) # OR (B[y==i]).sum(0)
return A
Runtime tests -
In [206]: n1,n2,n3 = 10000, 10, 500
...: A = np.random.rand(n2,n3)
...: y = np.random.randint(n2, size=n1)
...: B = np.random.rand(n1,n3)
...:
...: # Make copies
...: Acopy1 = A.copy()
...: Acopy2 = A.copy()
...:
In [207]: %timeit for_loop(Acopy1)
10 loops, best of 3: 24.2 ms per loop
In [208]: %timeit for_loop_v2(Acopy2)
10 loops, best of 3: 20.3 ms per loop
这篇关于除了矢量到数组由另一个数组索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!