Fast slicing and multiplication of scipy sparse CSR matrix


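Question

I have a scipy sparse CSR matrix of size 2M x 50k with 200M non-zero values (100 per row). I need to slice 120k rows out of it using a (randomly distributed) index, which is a pandas Series, and then multiply that submatrix by a sparse vector of size 1x50k (also with 100 non-zero values).

I do this: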

# "sliced" rather than "slice", to avoid shadowing the Python builtin
sliced = matrix[index.tolist(), :]
result = sliced.dot(vector.T).T.toarray()[0]  # returns 1x120k array

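The slicing takes 0.7s (slow), and then the multiplication takes 0.05s.

Alternatively, I can multiply the whole matrix first and then slice the result: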

result = matrix.dot(vector.T).T.toarray()[0]
result_sliced = result[index.tolist()]  # returns 1x120k array

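In this case the multiplication takes 0.65s and the slicing takes 0.015s.

Questions:

  1. Why is slicing a CSR matrix by rows so slow? Even multiplying the whole matrix takes less time than that.

  2. Is there a way to reach the final result faster?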

Answer

I explained in "Sparse matrix slicing using list of int" that this kind of row indexing is actually performed with matrix multiplication. In effect, it constructs a sparse selector with 1s for the desired rows and takes the appropriate dot product.
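To make the idea concrete, here is a minimal sketch of row selection expressed as a sparse product. It is an illustration only, not scipy's internal code, and the actual indexing implementation may differ between SciPy versions. The small test matrix and the names `selector`, `by_product` and `by_index` are assumptions made up for this example.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Small stand-in for the large matrix in the question (hypothetical data).
dense = rng.random((1000, 50))
dense[dense < 0.9] = 0.0          # keep roughly 10% of the entries
matrix = sparse.csr_matrix(dense)

# Randomly distributed row indices, like the pandas Series in the question.
rows = rng.choice(matrix.shape[0], size=120, replace=False)

# Selector: one row per requested index, with a single 1 placed in the
# column that corresponds to the row of `matrix` we want to pick.
selector = sparse.csr_matrix(
    (np.ones(len(rows)), (np.arange(len(rows)), rows)),
    shape=(len(rows), matrix.shape[0]),
)

by_product = selector @ matrix    # row selection as a sparse product
by_index = matrix[rows, :]        # ordinary fancy indexing
```

Both `by_product` and `by_index` hold the same 120 rows, which is why selecting rows costs about as much as a sparse matrix product.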

So I'm not surprised that the order of the operations doesn't matter much.

In general, sparse matrices are not designed for efficient indexing. They don't, for example, return views. CSR matrix multiplication is one of its most efficient operations. Even row or column sums are performed with matrix multiplication.
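As a rough illustration of the last point, row sums can be written as a product with a column of ones. This is a sketch under the assumption of a small made-up matrix; the names `ones` and `row_sums` are hypothetical, and it does not claim to reproduce what `sum` does internally.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(1)

# Small hypothetical CSR matrix.
dense = rng.random((200, 30))
dense[dense < 0.8] = 0.0
matrix = sparse.csr_matrix(dense)

# Multiplying by a column of ones adds up each row.
ones = np.ones((matrix.shape[1], 1))
row_sums = matrix @ ones                      # dense array of shape (200, 1)

# Reference result from the built-in method, converted to a plain ndarray.
builtin_sums = np.asarray(matrix.sum(axis=1))
```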
