numpy:有效地添加矩阵行 [英] numpy: efficiently add rows of a matrix
问题描述
我有一个矩阵。
mat = array([
[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]
])
我想得到某些指数的行总和:例如。
I'd like to get the sum of the rows at certain indices: eg.
ixs = np.array([0,2,0,0,0,1,1])
我知道我可以将答案计算为:
I know I can compute the answer as:
mat[ixs].sum(axis=0)
> array([16, 23, 30, 37])
问题是ixs可能很长,并且我不想使用所有内存来创建中间产品mat [ixs],只是为了再次减少它。
The problem is ixs may be very long, and I don't want to use all the memory to create the intermediate product mat[ixs], only to reduce it again with the sum.
我也知道我可以只需计算索引并使用乘法代替。
I also know I could simply count up the indices and use multiplication instead.
np.bincount(ixs, minlength=mat.shape[0).dot(mat)
> array([16, 23, 30, 37])
但如果我的ix是疏。
But that will be expensive if my ixs are sparse.
我知道scipy的稀疏矩阵,我想我可以使用它们,但我更喜欢纯粹的numpy解决方案,因为稀疏矩阵以各种方式受限(例如只有2-d)
I know about scipy's sparse matrices, and I suppose I could use them, but I'd prefer a pure numpy solution as sparse matrices are limited in various ways (such as only being 2-d)
那么,在这种情况下,是否有一种纯粹的numpy方法来合并索引和减少额?
So, is there a pure numpy way to merge the indexing and sum-reduction in this case?
感谢Divakar和hpaulj的回复。通过稀疏我的意思是范围(w.shape [0])
中的大多数值都不在ixs中。使用这个新的定义(以及更真实的数据大小,我重新运行了Divakar测试,使用了一些新的功能:
Thanks you Divakar and hpaulj for your very thorough responses. By "sparse" I meant that most of the values in range(w.shape[0])
are not in ixs. Using that new definition (and with more realisitic data size, I re-ran Divakar tests, with some new funcitona dded :
rng = np.random.RandomState(1234)
mat = rng.randn(1000, 500)
ixs = rng.choice(rng.randint(mat.shape[0], size=mat.shape[0]/10), size=1000)
# Divakar's solutions
In[42]: %timeit org_indexing_app(mat, ixs)
1000 loops, best of 3: 1.82 ms per loop
In[43]: %timeit org_bincount_app(mat, ixs)
The slowest run took 4.07 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 177 µs per loop
In[44]: %timeit indexing_modified_app(mat, ixs)
1000 loops, best of 3: 1.81 ms per loop
In[45]: %timeit bincount_modified_app(mat, ixs)
1000 loops, best of 3: 258 µs per loop
In[46]: %timeit simply_indexing_app(mat, ixs)
1000 loops, best of 3: 1.86 ms per loop
In[47]: %timeit take_app(mat, ixs)
1000 loops, best of 3: 1.82 ms per loop
In[48]: %timeit unq_mask_einsum_app(mat, ixs)
10 loops, best of 3: 58.2 ms per loop
# hpaulj's solutions
In[53]: %timeit hpauljs_sparse_solution(mat, ixs)
The slowest run took 9.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 524 µs per loop
%timeit hpauljs_second_sparse_solution(mat, ixs)
100 loops, best of 3: 9.91 ms per loop
# Sparse version of original bincount solution (see below):
In[60]: %timeit sparse_bincount(mat, ixs)
10000 loops, best of 3: 71.7 µs per loop
在这种情况下,获胜者是bincount解决方案的稀疏版本。
The winner in this case is the sparse version of the bincount solution.
def sparse_bincount(mat, ixs):
x = np.bincount(ixs)
nonzeros, = np.nonzero(x)
x[nonzeros].dot(mat[nonzeros])
推荐答案
由于我们假设 ixs
可能是 sparsey ,我们可以修改策略以从零获取行的总和-th
基于给定的行索引分别划分行和其余行。因此,我们可以使用 bincount
方法对非零
索引行求和,并将其与(第零行x没有零
在 ixs
)。
Since we are assuming that ixs
could be sparsey, we could modify the strategy to get the summations of rows from the zero-th
row and rest of the rows separately based on the given row indices. So, we could use the bincount
method for the non-zero-th
indexed rows summation and add it with the (zero-th row x no. of zeros
in ixs
).
因此,第二种方法可以修改,如下所示 -
Thus, the second approach could be modified, like so -
nzmask = ixs!=0
nzsum = np.bincount(ixs[nzmask]-1, minlength=mat.shape[0]-1).dot(mat[1:])
row0_sum = mat[0]*(len(ixs) - np.count_nonzero(nzmask))
out = nzsum + row0_sum
我们可以扩展这个策略第一种方法,如此 -
We could extend this strategy to the first approach as well, like so -
out = mat[0]*(len(ixs) - len(nzidx)) + mat[ixs[nzidx]].sum(axis=0)
如果我们正在工作有很多重复的非零指数,我们可以选择使用 np.take
,重点关注性能。因此, mat [ixs [nzidx]]
可以替换为 np.take(mat,ixs [nzidx],axis = 0)
和类似 mat [ixs]
由 np.take(mat,ixs,axis = 0)
。有了这样的基于索引的重复索引 np.take
与简单索引相比,带来了一些明显的加速。
If we are working with lots of non-zero indices that are repeated, we could alternatively make use of np.take
with focus on performance. Thus, mat[ixs[nzidx]]
could be replaced by np.take(mat,ixs[nzidx],axis=0)
and similarly mat[ixs]
by np.take(mat,ixs,axis=0)
. With such repeated indices based indexing np.take
brings out some noticeable speedup as compared to simply indexing.
最后,我们可以使用 np.einsum
执行这些基于行ID的选择和求和,如下所示 -
Finally, we could use np.einsum
to perform these row ID based selection and summing, like so -
nzmask = ixs!=0
unq,tags = np.unique(ixs[nzmask],return_inverse=1)
nzsum = np.einsum('ji,jk->k',np.arange(len(unq))[:,None] == tags,mat[unq])
out = mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum
基准测试
让我们列出本文迄今为止发布的所有五种方法还包括在问题中发布的两种方法作为函数进行一些运行时测试 -
Benchmarking
Let's list out all the five approaches posted thus far in this post and also include the two approaches posted in the question for some runtime testing as functions -
def org_indexing_app(mat,ixs):
return mat[ixs].sum(axis=0)
def org_bincount_app(mat,ixs):
return np.bincount(ixs, minlength=mat.shape[0]).dot(mat)
def indexing_modified_app(mat,ixs):
return np.take(mat,ixs,axis=0).sum(axis=0)
def bincount_modified_app(mat,ixs):
nzmask = ixs!=0
nzsum = np.bincount(ixs[nzmask]-1, minlength=mat.shape[0]-1).dot(mat[1:])
row0_sum = mat[0]*(len(ixs) - np.count_nonzero(nzmask))
return nzsum + row0_sum
def simply_indexing_app(mat,ixs):
nzmask = ixs!=0
nzsum = mat[ixs[nzmask]].sum(axis=0)
return mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum
def take_app(mat,ixs):
nzmask = ixs!=0
nzsum = np.take(mat,ixs[nzmask],axis=0).sum(axis=0)
return mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum
def unq_mask_einsum_app(mat,ixs):
nzmask = ixs!=0
unq,tags = np.unique(ixs[nzmask],return_inverse=1)
nzsum = np.einsum('ji,jk->k',np.arange(len(unq))[:,None] == tags,mat[unq])
return mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum
时间
案例#1( ixs
是95%sparsey):
Case #1 (ixs
is 95% sparsey) :
In [301]: # Setup input
...: mat = np.random.rand(20,4)
...: ixs = np.random.randint(0,10,(100000))
...: ixs[np.random.rand(ixs.size)<0.95] = 0 # Make it approx 95% sparsey
...:
In [302]: # Timings
...: %timeit org_indexing_app(mat,ixs)
...: %timeit org_bincount_app(mat,ixs)
...: %timeit indexing_modified_app(mat,ixs)
...: %timeit bincount_modified_app(mat,ixs)
...: %timeit simply_indexing_app(mat,ixs)
...: %timeit take_app(mat,ixs)
...: %timeit unq_mask_einsum_app(mat,ixs)
...:
100 loops, best of 3: 4.89 ms per loop
1000 loops, best of 3: 428 µs per loop
100 loops, best of 3: 3.29 ms per loop
1000 loops, best of 3: 329 µs per loop
1000 loops, best of 3: 537 µs per loop
1000 loops, best of 3: 462 µs per loop
1000 loops, best of 3: 1.07 ms per loop
案例#2( ixs
是98%sparsey):
Case #2 (ixs
is 98% sparsey) :
In [303]: # Setup input
...: mat = np.random.rand(20,4)
...: ixs = np.random.randint(0,10,(100000))
...: ixs[np.random.rand(ixs.size)<0.98] = 0 # Make it approx 98% sparsey
...:
In [304]: # Timings
...: %timeit org_indexing_app(mat,ixs)
...: %timeit org_bincount_app(mat,ixs)
...: %timeit indexing_modified_app(mat,ixs)
...: %timeit bincount_modified_app(mat,ixs)
...: %timeit simply_indexing_app(mat,ixs)
...: %timeit take_app(mat,ixs)
...: %timeit unq_mask_einsum_app(mat,ixs)
...:
100 loops, best of 3: 4.86 ms per loop
1000 loops, best of 3: 438 µs per loop
100 loops, best of 3: 3.5 ms per loop
1000 loops, best of 3: 260 µs per loop
1000 loops, best of 3: 318 µs per loop
1000 loops, best of 3: 288 µs per loop
1000 loops, best of 3: 694 µs per loop
这篇关于numpy:有效地添加矩阵行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!