Scipy sparse matrix assignment using only stored elements


Question

I have a large sparse matrix globalGrid (lil_matrix) and a smaller matrix localGrid (coo_matrix). The localGrid represents a subset of the globalGrid, and I want to update the globalGrid with the localGrid. For this I use the following code (Python, SciPy):

globalGrid[xLocalGrid:xLocalGrid + localGrid.shape[0], yLocalGrid:yLocalGrid + localGrid.shape[1]] = localGrid

where xLocalGrid and yLocalGrid are the offsets of the localGrid origin with respect to the globalGrid.

The problem is that the localGrid is sparse, but its zero elements are also assigned to the globalGrid. Is there a way to assign only the stored elements and not the 0-elements?

I have read about masked arrays in NumPy, but they do not seem to apply to SciPy sparse matrices.

Edit: In response to the comments below, here is an example to illustrate what I mean:

First set up the matrices:

import numpy as np
from scipy import sparse

M = sparse.lil_matrix(2 * np.ones([5, 5]))
m = sparse.eye(3)

M.todense()
matrix([[ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.]])

m.todense()
matrix([[ 1.,  0.,  0.],
    [ 0.,  1.,  0.],
    [ 0.,  0.,  1.]])

Then assign:

M[1:4, 1:4] = m

The result is now:

M.todense()
matrix([[ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  1.,  0.,  0.,  2.],
    [ 2.,  0.,  1.,  0.,  2.],
    [ 2.,  0.,  0.,  1.,  2.],
    [ 2.,  2.,  2.,  2.,  2.]])

The result I need is:

matrix([[ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  1.,  2.,  2.,  2.],
    [ 2.,  2.,  1.,  2.,  2.],
    [ 2.,  2.,  2.,  1.,  2.],
    [ 2.,  2.,  2.,  2.,  2.]])

Answer

Should this line

The problem is that the localGrid is sparse, but also the non-zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?

be changed to the following?

The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?

Your question isn't quite clear, but I'm guessing that because the globalGrid[a:b, c:d] indexing spans values that should be 0 in both arrays, you are worried that 0's are being copied.

Let's try this with real matrices.

In [13]: M=sparse.lil_matrix((10,10))
In [14]: m=sparse.eye(3)
In [15]: M[4:7,5:8]=m
In [16]: m
Out[16]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements (1 diagonals) in DIAgonal format>
In [17]: M
Out[17]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in LInked List format>
In [18]: M.data
Out[18]: array([[], [], [], [], [1.0], [1.0], [1.0], [], [], []], dtype=object)
In [19]: M.rows
Out[19]: array([[], [], [], [], [5], [6], [7], [], [], []], dtype=object)

M does not have any unnecessary 0's.

If there are unnecessary 0's in a sparse matrix, a round trip to csr format should take care of them:

M.tocsr().tolil()

The csr format also has an in-place .eliminate_zeros() method.
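As a small illustration of eliminate_zeros(), here is a sketch that builds a csr matrix with one explicitly stored zero and then removes it. The matrix contents and the variable names (A, stored_before, stored_after) are made up for this example.

```python
import numpy as np
from scipy import sparse

# Build a CSR matrix that explicitly stores a zero at position (0, 1).
data = np.array([1.0, 0.0, 3.0])
rows = np.array([0, 0, 1])
cols = np.array([0, 1, 1])
A = sparse.csr_matrix((data, (rows, cols)), shape=(2, 2))

stored_before = A.nnz   # counts stored entries, including the explicit zero
A.eliminate_zeros()     # in place: drops stored entries whose value is 0
stored_after = A.nnz
```

Here nnz reports the number of stored entries, so it drops by one once the explicit zero is removed.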

So your concern is with overwriting the nonzeros of the target array.

With dense arrays, the use of nonzero (or where) takes care of this:

In [87]: X=np.ones((10,10),int)*2
In [88]: y=np.eye(3)
In [89]: I,J=np.nonzero(y)
In [90]: X[I+3,J+2]=y[I,J]
In [91]: X
Out[91]: 
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])

Trying the sparse equivalent:

In [92]: M=sparse.lil_matrix(X)
In [93]: M
Out[93]: 
<10x10 sparse matrix of type '<class 'numpy.int32'>'
    with 100 stored elements in LInked List format>
In [94]: m=sparse.coo_matrix(y)
In [95]: m
Out[95]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in COOrdinate format>
In [96]: I,J=np.nonzero(m)
In [97]: I
Out[97]: array([0, 1, 2], dtype=int32)
In [98]: J
Out[98]: array([0, 1, 2], dtype=int32)
In [99]: M[I+3,J+2]=m[I,J]
...
TypeError: 'coo_matrix' object is not subscriptable

I could have used the sparse matrix's own nonzero method.

In [106]: I,J=m.nonzero()

For the coo format, this is the same as

In [109]: I,J=m.row, m.col 

In which case I can also use the data attribute:

In [100]: M[I+3,J+2]=m.data
In [101]: M.A
Out[101]: 
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], dtype=int32)
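Applied to the 5x5 example from the question, the same idea can be sketched as a small helper. Note that assign_stored is a hypothetical name chosen for this illustration, not a SciPy function, and the sketch assumes the target is a lil_matrix and the local matrix has no repeated coordinates.

```python
import numpy as np
from scipy import sparse

def assign_stored(target, local, row_offset, col_offset):
    """Copy only the nonzero stored entries of `local` into `target`.

    `target` is assumed to be a lil_matrix; `local` may be any sparse format.
    """
    local = local.tocoo()
    keep = local.data != 0            # skip explicitly stored zeros
    if keep.any():
        rows = local.row[keep] + row_offset
        cols = local.col[keep] + col_offset
        target[rows, cols] = local.data[keep]
    return target

M = sparse.lil_matrix(2 * np.ones((5, 5)))
m = sparse.eye(3)
assign_stored(M, m, 1, 1)
result = M.toarray()
```

Only the three diagonal positions of the 3x3 block are written, so the surrounding 2's inside the block are left untouched.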

The code for m.nonzero may be instructive:

    A = self.tocoo()
    nz_mask = A.data != 0
    return (A.row[nz_mask],A.col[nz_mask])

So you need to be careful to make sure that the index and data attributes of the sparse matrix match.
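A sketch of how the two can get out of step: if a coo matrix stores an explicit zero, nonzero() filters it out while data still contains it. The matrix below is invented for the illustration.

```python
import numpy as np
from scipy import sparse

# A coo matrix with an explicitly stored zero at (1, 1).
m = sparse.coo_matrix(([1.0, 0.0, 3.0], ([0, 1, 2], [0, 1, 2])), shape=(3, 3))

I, J = m.nonzero()        # indices of nonzero values only
n_index = len(I)
n_data = len(m.data)      # all stored values, including the explicit zero

# Applying the same mask to the data keeps indices and values aligned.
mask = m.data != 0
values = m.data[mask]
```

With mismatched lengths, an assignment such as M[I, J] = m.data would not line up; masking the data the same way as the indices avoids that.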

Also pay attention to which sparse formats allow indexing. lil is good for changing values. csr allows element-by-element indexing, but raises an efficiency warning if you try to change zero values to nonzero (or vice versa). coo has this nice pairing of indices and data, but doesn't allow indexing.

Another subtle point: when constructing a coo matrix you may repeat coordinates. When it is converted to csr format, those values are summed. But the assignment that I'm suggesting will only use the last value, not the sum. So make sure you understand how your local matrix was constructed, and know whether it is a 'clean' representation of the data.
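One way to get such a clean representation, sketched here with made-up values, is to merge repeated coordinates with the coo sum_duplicates() method before using row, col and data for the assignment.

```python
import numpy as np
from scipy import sparse

# Two stored entries share the coordinate (0, 0).
m = sparse.coo_matrix(([1.0, 4.0, 5.0], ([0, 0, 1], [0, 0, 1])), shape=(2, 2))

summed = m.tocsr()[0, 0]    # conversion to csr sums the duplicates

clean = m.copy()
clean.sum_duplicates()      # in place: merges repeated coordinates

# After merging, each coordinate appears once, so the assignment is unambiguous.
M = sparse.lil_matrix((2, 2))
M[clean.row, clean.col] = clean.data
value = M[0, 0]
```

Working on a copy keeps the original matrix unchanged, which may matter if other code relies on its raw entries.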
