Slicing Sparse Matrices in SciPy: Which Types Work Best?


Question

The SciPy Sparse Matrix tutorial is very good, but it leaves the section on slicing un(der)developed (still in outline form; see the section "Handling Sparse Matrices").

I will try to update the tutorial once this question is answered.

I have a large sparse matrix, currently in dok_matrix format.

import numpy as np
from scipy import sparse
M = sparse.dok_matrix((10**6, 10**6))

For some methods I want to slice columns, and for others I want to slice rows. Ideally I would use advanced indexing (i.e. a boolean vector, bool_vect) to slice the sparse matrix M, as in:

bool_vect = np.arange(10**6) % 2 == 0  # True at every even index
out = M[bool_vect, :]                   # want to select every even row

out = M[:, bool_vect]                   # want to select every even column

First off, dok_matrix does not support this, but I think it works (slowly) if I first cast to lil_matrix via sparse.lil_matrix(M).

As far as I can gather from the tutorial, to slice columns I want to use CSC, and to slice rows I want to use CSR. So does that mean I should cast the matrix M via:

M.tocsc()[:,bool_vect]

M.tocsr()[bool_vect,:]
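
The two conversions can be tried on a small stand-in matrix first. This is only a sketch: the 6x6 shape and the three stored values are assumptions made for illustration, not part of the original problem.

```python
import numpy as np
from scipy import sparse

# Small stand-in for the large matrix in the question (shape is an assumption).
M = sparse.dok_matrix((6, 6))
M[0, 1] = 1.0
M[2, 3] = 2.0
M[5, 0] = 3.0

M_csr = M.tocsr()  # compressed sparse row: the candidate for row slicing
M_csc = M.tocsc()  # compressed sparse column: the candidate for column slicing

print(M_csr.format, M_csc.format, M_csr.nnz)
```

Both calls return a new matrix in the requested format and leave the dok_matrix itself untouched, so the conversion cost is paid once per format.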

I am kind of guessing here, and my code is slow because of it. Any help from someone who understands how this works would be appreciated. Thanks in advance.

If it turns out that I should not be indexing my matrix with a boolean array, but rather with a list of integers (indices), that is also something I would be happy to find out. Whichever is more efficient.
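
Going from a boolean array to a list of integer indices is a one-liner in NumPy. A minimal sketch, assuming a short vector of length 10 for readability. Note that an integer array of 0s and 1s is treated by NumPy as positions rather than as a mask, so the sketch builds a genuine boolean array with a comparison.

```python
import numpy as np

n = 10  # toy length, an assumption; the question uses 10**6

bool_vect = np.arange(n) % 2 == 0    # boolean mask, True at even positions
indices = np.flatnonzero(bool_vect)  # integer positions of the True entries

print(bool_vect.dtype, indices)
```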

Finally, this is a big matrix, so bonus points if this can happen in place / with broadcasting.

Recommended answer

OK, so I'm pretty sure the "right" way to do this is: if you are slicing columns, use tocsc() and slice using a list/array of integers. Boolean vectors do not seem to do the trick with sparse matrices the way they do with NumPy ndarrays. Which means the answer is:

indices = np.where(bool_vect)[0]
out1 = M.tocsc()[:,indices]
out2 = M.tocsr()[indices,:]
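
The same approach can be checked end to end on a toy matrix by comparing the sparse slices against ordinary NumPy indexing on the dense equivalent. This is a sketch under assumptions: the 8x8 size and the fill pattern are made up for the example.

```python
import numpy as np
from scipy import sparse

n = 8  # toy size, an assumption; the question uses 10**6
M = sparse.dok_matrix((n, n))
for i in range(n):
    M[i, (3 * i) % n] = float(i + 1)  # one stored value per row

bool_vect = np.arange(n) % 2 == 0    # True at even positions
indices = np.flatnonzero(bool_vect)  # same result as np.where(bool_vect)[0]

out_cols = M.tocsc()[:, indices]  # column slice on CSC
out_rows = M.tocsr()[indices, :]  # row slice on CSR

# Compare with plain NumPy indexing on the dense version (only viable for toy sizes).
dense = M.toarray()
cols_match = np.array_equal(out_cols.toarray(), dense[:, indices])
rows_match = np.array_equal(out_rows.toarray(), dense[indices, :])
print(out_cols.shape, out_rows.shape, cols_match, rows_match)
```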

But the question remains: is this the best way? Is it in place?

In practice this does seem to be happening in place, and it is much faster than my prior attempts (using lil_matrix).
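
Whether the slice really happens in place can be checked directly. A minimal sketch, assuming a toy 4x4 matrix: it compares object identity and shapes before and after slicing. In this sketch the slice comes back as a separate object and the source matrix keeps its shape, which suggests the speed-up comes from the CSR/CSC layout rather than from in-place modification.

```python
import numpy as np
from scipy import sparse

# Toy matrix; the shape and values are assumptions for illustration.
M = sparse.dok_matrix((4, 4))
M[0, 0] = 1.0
M[3, 2] = 5.0

M_csr = M.tocsr()
indices = np.array([0, 2])
out = M_csr[indices, :]

same_object = out is M_csr  # would be True only if slicing returned the source itself
print(same_object, M_csr.shape, out.shape)
```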
