Numpy/Scipy broadcast calculating scalar product for certain elements
Problem description
I have a sparse matrix A, and a dataframe (df) whose rows specify which scalar products to calculate:
Row1  Row2  Value
2     147   scalar product of the vectors at rows Row1 and Row2 of matrix A
Can I do this in a broadcasting manner, without looping etc.?
In my case A is about 1M x 100k in size, and the dataframe has 10M rows.
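A minimal runnable sketch of the setup, shrunk to the 4x3 example used in the answer below (the column names Row1/Row2 follow the question's table, and the loop is the naive baseline the question wants to avoid):

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Small stand-in for the real problem (1M x 100k matrix, 10M-row df)
A = sparse.csr_matrix([[1, 2, 3],
                       [2, 1, 4],
                       [0, 2, 2],
                       [3, 0, 3]])
df = pd.DataFrame({'Row1': [1, 1, 0, 3], 'Row2': [3, 0, 0, 2]})

# Baseline: one scalar product per dataframe row, via an explicit loop
df['Value'] = [A[i].multiply(A[j]).sum() for i, j in zip(df.Row1, df.Row2)]
print(df.Value.tolist())   # [18, 16, 14, 6]
```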
Answer
Start with a small sparse matrix (csr is the best format for math):
In [167]: A=sparse.csr_matrix([[1, 2, 3],   # Vadim's example
     ...:                      [2, 1, 4],
     ...:                      [0, 2, 2],
     ...:                      [3, 0, 3]])
In [168]: AA=A.A # dense equivalent
In [169]: idx=np.array([[1,1,0,3],[3,0,0,2]]).T # indexes
I'll stick with the numpy version (Pandas is built on top of numpy).
We could take all the row dot products, and select the subset defined by idx:
In [170]: (AA.dot(AA.T))[idx[:,0], idx[:,1]]
Out[170]: array([18, 16, 14, 6], dtype=int32)
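As a self-contained sketch of this dense approach (note the cost: it materializes the full n x n Gram matrix before selecting, which is exactly what makes it infeasible at 1M rows):

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix([[1, 2, 3], [2, 1, 4], [0, 2, 2], [3, 0, 3]])
AA = A.toarray()                          # dense equivalent
idx = np.array([[1, 1, 0, 3], [3, 0, 0, 2]]).T

# All pairwise row dot products (4x4 here), then pick the wanted entries
vals = (AA @ AA.T)[idx[:, 0], idx[:, 1]]
print(vals)                               # [18 16 14  6]
```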
The sparse matrix product (A.dot(A.T)) also works:
In [171]: (A*A.T)[idx[:,0], idx[:,1]]
Out[171]: matrix([[18, 16, 14, 6]], dtype=int32)
Or we can select the rows first, and then take the sum of products. We don't want to use dot here, since we aren't taking all combinations.
In [172]: (AA[idx[:,0]]*AA[idx[:,1]]).sum(axis=1)
Out[172]: array([18, 16, 14, 6], dtype=int32)
The einsum version of this calculation:
In [180]: np.einsum('ij,ij->i',AA[idx[:,0]],AA[idx[:,1]])
Out[180]: array([18, 16, 14, 6], dtype=int32)
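A runnable version of the einsum step; 'ij,ij->i' multiplies the paired rows elementwise and sums over j, without ever forming the full Gram matrix:

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix([[1, 2, 3], [2, 1, 4], [0, 2, 2], [3, 0, 3]])
AA = A.toarray()
idx = np.array([[1, 1, 0, 3], [3, 0, 0, 2]]).T

# Elementwise product of the paired rows, summed over the column axis
vals = np.einsum('ij,ij->i', AA[idx[:, 0]], AA[idx[:, 1]])
print(vals)   # [18 16 14  6]
```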
sparse can do the same (* is matrix product, .multiply is element by element):
In [173]: (A[idx[:,0]].multiply(A[idx[:,1]])).sum(axis=1)
Out[173]:
matrix([[18],
[16],
[14],
[ 6]], dtype=int32)
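The same sparse computation as a self-contained sketch; sum(axis=1) on a sparse matrix returns an (n, 1) np.matrix, so it is flattened to a plain 1-D array here for convenience:

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix([[1, 2, 3], [2, 1, 4], [0, 2, 2], [3, 0, 3]])
idx = np.array([[1, 1, 0, 3], [3, 0, 0, 2]]).T

# Elementwise product of the selected sparse rows, summed per row,
# then flattened from the (n,1) matrix result to a 1-D array
col = A[idx[:, 0]].multiply(A[idx[:, 1]]).sum(axis=1)
vals = np.asarray(col).ravel()
print(vals)   # [18 16 14  6]
```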
With this small case, the dense versions are faster. Sparse row indexing is slow.
In [181]: timeit np.einsum('ij,ij->i', AA[idx[:,0]], AA[idx[:,1]])
100000 loops, best of 3: 18.1 µs per loop
In [182]: timeit (A[idx[:,0]].multiply(A[idx[:,1]])).sum(axis=1)
1000 loops, best of 3: 1.32 ms per loop
In [184]: timeit (AA.dot(AA.T))[idx[:,0], idx[:,1]]
100000 loops, best of 3: 9.62 µs per loop
In [185]: timeit (A*A.T)[idx[:,0], idx[:,1]]
1000 loops, best of 3: 689 µs per loop
I almost forgot - the iterative versions:
In [191]: timeit [AA[i].dot(AA[j]) for i,j in idx]
10000 loops, best of 3: 38.4 µs per loop
In [192]: timeit [A[i].multiply(A[j]).sum() for i,j in idx]
100 loops, best of 3: 2.58 ms per loop
Row indexing of a lil format matrix is faster:
In [207]: Al=A.tolil()
In [208]: timeit A[idx[:,0]]
1000 loops, best of 3: 476 µs per loop
In [209]: timeit Al[idx[:,0]]
1000 loops, best of 3: 234 µs per loop
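A quick check (a sketch, not a benchmark) that lil and csr fancy row indexing select the same rows, so the choice between them is purely a speed/conversion trade-off:

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix([[1, 2, 3], [2, 1, 4], [0, 2, 2], [3, 0, 3]])
idx0 = np.array([1, 1, 0, 3])

Al = A.tolil()
# Both formats support fancy row indexing and return the same rows;
# only the indexing speed differs
assert (A[idx0].toarray() == Al[idx0].toarray()).all()
```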
But by the time it's converted back to csr for multiplication, it might not save time.
===============
In other recent SO questions I've discussed faster ways of indexing sparse rows or columns. But in those, the final goal was to sum over a selected set of rows or columns. For that it was actually fastest to use a matrix product - with a matrix of 1s and 0s. Applying that idea here is a little trickier.
Looking at the csr.__getitem__ indexing function, I find that it actually does the A[idx,:] indexing with a matrix product. It creates an extractor matrix with a function like:
def extractor(indices, N):
    """Return a sparse matrix P so that P*self implements
    slicing of the form self[[1,2,3],:]
    """
    indptr = np.arange(len(indices)+1, dtype=int)
    data = np.ones(len(indices), dtype=int)
    shape = (len(indices), N)
    return sparse.csr_matrix((data, indices, indptr), shape=shape)
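Putting the extractor idea together as a runnable sketch: each extractor is a 0/1 selection matrix, so multiplying it into A pulls out the wanted rows, and the rest proceeds as before:

```python
import numpy as np
from scipy import sparse

def extractor(indices, N):
    """Return a sparse matrix P so that P*A selects rows `indices` of A."""
    indptr = np.arange(len(indices) + 1, dtype=int)
    data = np.ones(len(indices), dtype=int)
    return sparse.csr_matrix((data, indices, indptr),
                             shape=(len(indices), N))

A = sparse.csr_matrix([[1, 2, 3], [2, 1, 4], [0, 2, 2], [3, 0, 3]])
idx = np.array([[1, 1, 0, 3], [3, 0, 0, 2]]).T

A1 = extractor(idx[:, 0], 4) * A      # rows idx[:,0] of A
A2 = extractor(idx[:, 1], 4) * A      # rows idx[:,1] of A
vals = np.asarray(A1.multiply(A2).sum(axis=1)).ravel()
print(vals)   # [18 16 14  6]
```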
In [328]: %%timeit
.....: A1=extractor(idx[:,0],4)*A
.....: A2=extractor(idx[:,1],4)*A
.....: (A1.multiply(A2)).sum(axis=1)
.....:
1000 loops, best of 3: 1.14 ms per loop
This time is slightly better than that produced with A[idx[:,0],:] (In [182] above) - presumably because it streamlines the action a bit. It should scale in the same way.
This works because the extractor is a 0/1 selection matrix derived from [1,1,0,3]:
In [330]: extractor(idx[:,0],4).A
Out[330]:
array([[0, 1, 0, 0],
[0, 1, 0, 0],
[1, 0, 0, 0],
[0, 0, 0, 1]])
In [296]: A[idx[:,0],:].A
Out[296]:
array([[2, 1, 4],
[2, 1, 4],
[1, 2, 3],
[3, 0, 3]], dtype=int32)
In [331]: (extractor(idx[:,0],4)*A).A
Out[331]:
array([[2, 1, 4],
[2, 1, 4],
[1, 2, 3],
[3, 0, 3]], dtype=int32)
================
In sum, if the problem is too big to use the dense array directly, then the best bet for scaling to a large sparse case is
(A[idx[:,0]].multiply(A[idx[:,1]])).sum(axis=1)
If this still produces memory errors, then iterate, possibly over groups of the idx array (or dataframe).