Fast slicing and multiplication of scipy sparse CSR matrix
Question
[…] Series) slice 120k of its rows, and then multiply that submatrix by a sparse vector of size 1x50k (also with 100 non-zero values).
I do it like this:
slice = matrix[index.tolist(), :]
result = slice.dot(vector.T).T.toarray()[0] # returns 1x120k array
The slicing takes 0.7s (slow), and then the multiplication takes 0.05s.
Alternatively, I can multiply the whole matrix first and then slice the result:
result = matrix.dot(vector.T).T.toarray()[0]
result_sliced = result[index.tolist()] # returns 1x120k array
In this case the multiplication takes 0.65s, and then the slicing takes 0.015s.
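The two orderings described above can be checked against each other on a much smaller problem. This is only a sketch: the shapes, densities, and random seeds are made up for illustration and are not the sizes from the question, so it says nothing about the timings, only that both routes give the same numbers.

```python
import numpy as np
from scipy import sparse

# Small stand-ins for the question's data (sizes here are illustrative assumptions).
matrix = sparse.random(2000, 500, density=0.02, format="csr", random_state=0)
vector = sparse.random(1, 500, density=0.02, format="csr", random_state=1)
index = np.random.default_rng(2).integers(0, matrix.shape[0], size=300)

# Route 1: slice the rows first, then multiply the submatrix by the vector.
sliced_first = matrix[index.tolist(), :].dot(vector.T).T.toarray()[0]

# Route 2: multiply the whole matrix, then slice the dense result.
multiplied_first = matrix.dot(vector.T).T.toarray()[0][index.tolist()]

results_match = np.allclose(sliced_first, multiplied_first)
```

Both arrays have one entry per requested row, so whichever route is faster on a given machine and SciPy version can be used interchangeably.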
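Questions:
- Why is slicing a CSR matrix by rows so slow? Even multiplying the whole matrix takes less time than that.
- Is there a way to reach the final result faster?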
Answer
I explained in Sparse matrix slicing using list of int that this kind of row indexing is actually performed with matrix multiplication. In effect it constructs a sparse vector with 1s for the desired rows, and does the appropriate dot.
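To make that concrete, here is a minimal sketch of row selection written as an explicit matrix product. The `selector` matrix and the small sizes are my own illustration of the idea, not SciPy's internal code, which may differ between versions.

```python
import numpy as np
from scipy import sparse

# Illustrative data; the sizes are assumptions chosen to keep the example small.
matrix = sparse.random(1000, 50, density=0.1, format="csr", random_state=0)
index = np.random.default_rng(0).integers(0, matrix.shape[0], size=120)

# Build a (len(index) x n_rows) selector with a single 1 per row,
# placed in the column of the matrix row we want to extract.
n_sel = len(index)
selector = sparse.csr_matrix(
    (np.ones(n_sel), (np.arange(n_sel), index)),
    shape=(n_sel, matrix.shape[0]),
)

by_product = selector @ matrix   # row selection as a sparse product
by_index = matrix[index, :]      # ordinary fancy indexing

same = np.allclose(by_product.toarray(), by_index.toarray())
```

Seen this way, slicing first and multiplying afterwards amounts to two sparse products, which is consistent with the two orderings costing about the same in total.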
So I'm not surprised that the order of the operations doesn't matter much.
In general sparse matrices are not designed for efficient indexing. They don't, for example, return views. The csr matrix multiplication is one of its most efficient operations. Even row or column sums are performed with matrix multiplication.
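As a small illustration of the last point, a row sum can be written as a product with a column of ones. This is a sketch of the equivalence on made-up data, not a claim about how any particular SciPy version implements `sum`.

```python
import numpy as np
from scipy import sparse

# Illustrative matrix; size and density are assumptions.
matrix = sparse.random(200, 30, density=0.1, format="csr", random_state=1)

# Multiplying by a column of ones adds up each row.
ones = np.ones((matrix.shape[1], 1))
row_sums_product = matrix @ ones

# Built-in row sum for comparison.
row_sums_builtin = np.asarray(matrix.sum(axis=1))

agree = np.allclose(row_sums_product, row_sums_builtin)
```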