Slicing Sparse Matrices in SciPy: Which Types Work Best?


Question

The SciPy Sparse Matrix tutorial is very good, but it leaves the section on slicing un(der)developed (still in outline form; see the section "Handling Sparse Matrices").

I will try to update the tutorial once this question is answered.

I have a large sparse matrix, currently in dok_matrix format.

import numpy as np
from scipy import sparse
M = sparse.dok_matrix((10**6, 10**6))

For some methods I want to slice columns, and for others I want to slice rows. Ideally I would use advanced indexing (i.e. a boolean vector, bool_vect) to slice the sparse matrix M, as in:

bool_vect = np.arange(10**6) % 2 == 0  # True at every even index
out = M[bool_vect,:]            # Want to select every even row

out = M[:,bool_vect] # Want to select every even column

First off, dok_matrices do not support this, but I think it works (slowly) if I first cast to lil_matrix via sparse.lil_matrix(M).

As far as I can gather from the tutorial, to slice columns I want to use CSC, and to slice rows I want to use CSR. So does that mean I should cast the matrix M via:

M.tocsc()[:,bool_vect]

M.tocsr()[bool_vect,:]



I am kind of guessing here, and my code is slow because of it. Any help from someone who understands how this works would be appreciated. Thanks in advance.

If it turns out I should not be indexing my matrix with a boolean array but rather with a list of integers (indices), that is also something I would be happy to find out. Whichever is more efficient.
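To make the boolean-versus-integer distinction concrete, here is a minimal sketch on a plain NumPy array (the small array `A` and the names `int_flags`, `mask`, and `indices` are made up for illustration). It shows that a 0/1 integer array is treated as a list of positions, not as a mask, and that a true boolean mask can be turned into integer indices with `np.flatnonzero`.

```python
import numpy as np

n = 6
A = np.arange(n * 2).reshape(n, 2)   # small dense stand-in, 6 rows x 2 columns

int_flags = np.arange(n) % 2         # integers 0/1, NOT a boolean array
mask = int_flags == 0                # boolean mask: True at even positions
indices = np.flatnonzero(mask)       # integer positions where the mask is True

by_mask = A[mask, :]                 # boolean indexing: keeps rows 0, 2, 4
by_index = A[indices, :]             # integer indexing: same rows
by_flags = A[int_flags, :]           # rows 0 and 1 repeated, not a selection
```

The first two results are identical; the third has as many rows as the original because each 0 or 1 is read as a row number.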

Finally, this is a big matrix, so bonus points if this can happen in place / with broadcasting.

Recommended answer

OK, so I'm pretty sure the "right" way to do this is: if you are slicing columns, use tocsc() and slice using a list/array of integers. Boolean vectors do not seem to do the trick with sparse matrices the way they do with ndarrays in NumPy. Which means the answer is:

indices = np.where(bool_vect)[0]
out1 = M.tocsc()[:,indices]
out2 = M.tocsr()[indices,:]
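A small self-contained version of the same pattern, checked against the dense result, might look like the sketch below. The matrix size `n` and the nonzero entries are arbitrary choices for illustration, not part of the original problem.

```python
import numpy as np
from scipy import sparse

# Small stand-in for the 10**6 x 10**6 matrix (size chosen for illustration).
n = 8
M = sparse.dok_matrix((n, n), dtype=np.float64)
for i in range(n):
    M[i, (3 * i) % n] = i + 1.0      # a few arbitrary nonzero entries

bool_vect = np.arange(n) % 2 == 0    # True at every even index
indices = np.flatnonzero(bool_vect)  # integer positions of the True entries

cols = M.tocsc()[:, indices]         # column selection on the CSC form
rows = M.tocsr()[indices, :]         # row selection on the CSR form

dense = M.toarray()                  # dense copy, only viable at this toy size
```

If several slices are needed, it is probably worth calling tocsc() or tocsr() once and reusing the converted matrix rather than converting before every slice.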



But the question remains: is this the best way? Is this in place?

In practice this does seem to be happening in place, and it is much faster than prior attempts (using lil_matrix).
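One way to check the "in place" question directly is to test whether the sliced result shares memory with the source matrix. The sketch below uses a tiny diagonal matrix made up for illustration; as far as I know, integer-array indexing on a CSR matrix returns a new matrix with its own data rather than a view, so treat this as a check to run rather than a definitive statement about every SciPy version.

```python
import numpy as np
from scipy import sparse

n = 6
dense = np.diag(np.arange(1.0, n + 1))   # toy matrix: 1..6 on the diagonal
A = sparse.csr_matrix(dense)

indices = np.array([0, 2, 4])
out = A[indices, :]                      # integer-array row selection

# Does the result reuse the source's value buffer?
shares = np.shares_memory(out.data, A.data)

# Modify the slice and see whether the source matrix changes.
out.data *= 10
unchanged = np.array_equal(A.toarray(), dense)
```

If `shares` is False and `unchanged` is True, the slice is a separate copy, and the speedup over lil_matrix comes from the CSR/CSC layout rather than from avoiding the copy.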
