Slicing Sparse Matrices in SciPy: Which Types Work Best?
Question
The SciPy Sparse Matrix tutorial is very good, but it leaves the section on slicing un(der)developed (still in outline form; see the section "Handling Sparse Matrices").
I will try to update the tutorial once this question is answered.
I have a large sparse matrix, currently in dok_matrix format.
import numpy as np
from scipy import sparse
M = sparse.dok_matrix((10**6, 10**6))
For some methods I want to slice columns, and for others I want to slice rows. Ideally I would use advanced indexing (i.e. a boolean vector, bool_vect) to slice the sparse matrix M, as in:
bool_vect = np.arange(10**6) % 2 == 0 # True at every even index
out = M[bool_vect,:] # Want to select every even row
or
out = M[:,bool_vect] # Want to select every even column
First off, dok_matrices do not support this, but I think it works (slowly) if I first cast to a lil_matrix via sparse.lil_matrix(M).
As far as I can gather from the tutorial, to slice columns I want to use CSC, and to slice rows I want to use CSR. So does that mean I should cast the matrix M via:
M.tocsc()[:,bool_vect]
or
M.tocsr()[bool_vect,:]
I am kind of guessing here, and my code is slow because of it. Any help from someone who understands how this works would be appreciated. Thanks in advance.
If it turns out I should not be indexing my matrix with a boolean array but rather with a list of integers (indices), that is also something I would be happy to find out. Whichever is more efficient.
Finally, this is a big matrix, so bonus points if this can happen in place / with broadcasting.
Recommended Answer
OK, so I'm pretty sure the "right" way to do this is: if you are slicing columns, use tocsc() and slice using a list/array of integers; if you are slicing rows, do the same with tocsr(). Boolean vectors do not seem to do the trick with sparse matrices the way they do with ndarrays in numpy. Which means the answer is:
indices = np.where(bool_vect)[0]
out1 = M.tocsc()[:,indices]
out2 = M.tocsr()[indices,:]
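To make the pattern above concrete, here is a minimal sketch on a small matrix. The 6x6 size and the values stored in it are illustrative assumptions, not part of the original problem; the check against the dense array is only there to confirm that the sparse slices select what a plain ndarray would.

```python
import numpy as np
from scipy import sparse

# Small stand-in for the large matrix M from the question (sizes are illustrative).
M = sparse.dok_matrix((6, 6))
for i in range(6):
    M[i, i] = i + 1              # diagonal entries 1..6
M[0, 5] = 7.0                    # one off-diagonal entry

bool_vect = np.arange(6) % 2 == 0    # True at even indices 0, 2, 4
indices = np.where(bool_vect)[0]     # integer positions of the True entries

cols = M.tocsc()[:, indices]         # column slicing after converting to CSC
rows = M.tocsr()[indices, :]         # row slicing after converting to CSR

# Reference result from ordinary NumPy indexing on the dense equivalent.
dense = M.toarray()
cols_match = np.array_equal(cols.toarray(), dense[:, indices])
rows_match = np.array_equal(rows.toarray(), dense[indices, :])
```

The conversion to CSC or CSR is done once up front; if both row and column slices are needed repeatedly, it is probably worth keeping both converted copies around rather than reconverting each time.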
But the question remains: is this the best way? Is this in place?
In practice this does seem to be happening in place, and it is much faster than prior attempts (using lil_matrix).
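On the in-place question, one way to check is to test whether the sliced result shares memory with the source matrix. The sketch below assumes a small CSR matrix `A` built from a dense array (a hypothetical stand-in, not the original `M`). As far as I know, integer-array indexing on a SciPy sparse matrix builds a new matrix rather than a view, so the speedup likely comes from the CSR/CSC layout rather than from avoiding a copy.

```python
import numpy as np
from scipy import sparse

# Hypothetical small CSR matrix used only to inspect slicing behaviour.
A = sparse.csr_matrix(np.arange(36, dtype=float).reshape(6, 6))
indices = np.array([0, 2, 4])

out = A[indices, :]                  # row slice via an integer index array

# True would mean the slice is a view on A's value buffer; False means a copy.
shares = bool(np.shares_memory(out.data, A.data))
```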