Multiply slice of scipy sparse matrix without changing sparsity

Problem description

In scipy, when I multiply a slice of a sparse matrix by an array containing only zeros, the result is a matrix with at least as many stored elements as before, even though it should have at most as many. The same holds when setting parts of the matrix to 0 or False:

>>> import numpy as np
>>> from scipy.sparse import csr_matrix as csr
>>> M = csr(np.random.random((8,8))>0.9)
>>> M
<8x8 sparse matrix of type '<type 'numpy.bool_'>'
        with 6 stored elements in Compressed Sparse Row format>
>>> M[:,0] = False
>>> M
<8x8 sparse matrix of type '<type 'numpy.bool_'>'
        with 12 stored elements in Compressed Sparse Row format>
>>> M[:,0].multiply(np.array([[False] for i in xrange(8)]))
>>> M
<8x8 sparse matrix of type '<type 'numpy.bool_'>'
        with 12 stored elements in Compressed Sparse Row format>

This is actually computationally expensive for large matrices, because it iterates over all cells in the slice, not just the nonzero ones.

From a mathematical / logical point of view, when multiplying a sparse matrix or vector, all empty cells are certain to remain empty because 0*x == 0. The same holds for setting to zero: cells that are already empty do not need to be explicitly set to zero.

What is the best way to deal with this?


I am using scipy version 0.17.0

Solution

In working with sparse matrices, changing the sparsity pattern is generally a very expensive operation, and so scipy does not do this silently.
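To make the distinction between a stored element and a nonzero element concrete, here is a minimal sketch. The tiny matrix `A` and its arrays are made up for illustration and are not taken from the question. It builds a CSR matrix that holds one explicitly stored zero and compares `nnz` (stored entries) with `count_nonzero()` (entries whose value is actually nonzero).

```python
import numpy as np
from scipy.sparse import csr_matrix

# Build a 2x2 CSR matrix directly from its three arrays.
# The entry at (0, 0) is stored explicitly even though its value is 0.
data = np.array([0.0, 1.0])
indices = np.array([0, 1])    # column index of each stored entry
indptr = np.array([0, 1, 2])  # row i owns data[indptr[i]:indptr[i + 1]]
A = csr_matrix((data, indices, indptr), shape=(2, 2))

stored_before = A.nnz               # counts stored entries, zeros included
nonzero_before = A.count_nonzero()  # counts entries whose value is not 0

A.eliminate_zeros()                 # in place: drops the stored zero
stored_after = A.nnz

print(stored_before, nonzero_before, stored_after)
```

The "stored elements" figure in the matrix repr corresponds to `nnz`, which is why it can stay the same or grow after an assignment of zeros.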

If you want to remove explicitly stored zeros from a sparse matrix, you should use the eliminate_zeros() method; for example:

>>> M = csr(np.random.random((1000,1000))>0.9, dtype=float)
>>> M
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 99740 stored elements in Compressed Sparse Row format>

>>> M[:, 0] *= 0
>>> M
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 99740 stored elements in Compressed Sparse Row format>

>>> M.eliminate_zeros()
>>> M
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 99657 stored elements in Compressed Sparse Row format>

Scipy could call the eliminate_zeros routine automatically after doing this kind of operation, but the developers chose to give the user more flexibility and control when doing something as expensive as changing the sparsity structure.
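If the goal is to scale or zero a column while touching only the entries that are already stored, one possible approach for a CSR matrix is to work on its `data` array directly. The sketch below is an illustration under assumptions, not something from the answer above: the helper name `scale_column_stored` is hypothetical, and it relies on the CSR layout in which `indices` holds the column index of each value in `data`. Note that the mask scans all stored entries of the matrix, not just those in the column.

```python
import numpy as np
from scipy.sparse import csr_matrix

def scale_column_stored(M, col, factor):
    """Multiply the stored entries of column `col` of a CSR matrix in place.

    Hypothetical helper for illustration. Empty cells are never visited,
    so the sparsity pattern cannot grow.
    """
    # M.indices is aligned one-to-one with M.data in CSR format.
    mask = M.indices == col
    M.data[mask] *= factor
    if factor == 0:
        # The scaled entries are now explicit zeros; prune them.
        M.eliminate_zeros()
    return M

rng = np.random.default_rng(0)
M = csr_matrix((rng.random((1000, 1000)) > 0.9).astype(float))
nnz_before = M.nnz

scale_column_stored(M, 0, 0)
print(nnz_before, M.nnz)
```

For a single column of a large matrix, converting to CSC first may be cheaper because each column's entries are contiguous there, but that conversion has its own cost.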
