Most efficient way of accessing non-zero values in a row/column of a scipy.sparse matrix

Question

What is the fastest or, failing that, the least wordy way of accessing all non-zero values in a given row (row) or column (col) of a scipy.sparse matrix A in CSR format?

Would doing it in another format (say, COO) be more efficient?

Right now, I use the following:

A[row, A[row, :].nonzero()[1]]

or

A[A[:, col].nonzero()[0], col]

Solution

For a problem like this it pays to understand the underlying data structures of the different formats:

In [672]: A=sparse.csr_matrix(np.arange(24).reshape(4,6))
In [673]: A.data
Out[673]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23], dtype=int32)
In [674]: A.indices
Out[674]: array([1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5], dtype=int32)
In [675]: A.indptr
Out[675]: array([ 0,  5, 11, 17, 23], dtype=int32)

The data values for a row are a slice within A.data, but identifying that slice requires some knowledge of A.indptr (see below).

For the coo format:

In [676]: Ac=A.tocoo()
In [677]: Ac.data
Out[677]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23], dtype=int32)
In [678]: Ac.row
Out[678]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3], dtype=int32)
In [679]: Ac.col
Out[679]: array([1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5], dtype=int32)

Note that A.nonzero() converts to coo and returns the row and col attributes (more or less; look at its code).

For the lil format, data is stored by row in lists:
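As a rough check of that description, the sketch below compares A.nonzero() with the row and col arrays of the coo form, after masking out any stored entries whose value is zero. The variable names are my own, and the masking step reflects my reading of the scipy code rather than a documented guarantee.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))
Ac = A.tocoo()

# Drop stored entries whose value happens to be zero (none in this example).
mask = Ac.data != 0
rows_from_coo = Ac.row[mask]
cols_from_coo = Ac.col[mask]

# nonzero() returns a (row indices, column indices) pair.
rows, cols = A.nonzero()

same_pairs = set(zip(rows.tolist(), cols.tolist())) == set(
    zip(rows_from_coo.tolist(), cols_from_coo.tolist())
)
print(same_pairs, len(rows))
```

Since nonzero() only returns coordinates, the values still have to be fetched by indexing, which is where the extra cost in the question's expression comes from.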

In [680]: Al=A.tolil()
In [681]: Al.data
Out[681]: 
array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]], dtype=object)
In [682]: Al.rows
Out[682]: 
array([[1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5]], dtype=object)

===============

Selecting a row of A works, though in my experience that tends to be a bit slow, in part because it has to create a new csr matrix. Also your expression seems wordier than needed.

Looking at my first row which has a 0 element (the others are too dense):

In [691]: A[0, A[0,:].nonzero()[1]].A
Out[691]: array([[1, 2, 3, 4, 5]], dtype=int32)

The whole row, expressed as a dense array, is:

In [692]: A[0,:].A
Out[692]: array([[0, 1, 2, 3, 4, 5]], dtype=int32)

but the data attribute of that row is the same as your selection:

In [693]: A[0,:].data
Out[693]: array([1, 2, 3, 4, 5], dtype=int32)

and with the lil format:

In [694]: Al.data[0]
Out[694]: [1, 2, 3, 4, 5]

A[0,:].tocoo() doesn't add anything.
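To illustrate why coo offers no shortcut, here is a minimal sketch of row selection on the coo arrays. Because coo has no indptr, the only generic option I know of is a boolean mask over every stored entry, which costs time proportional to the total number of non-zeros for each lookup. The names i, in_row, row_vals and row_cols are illustrative.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))
Ac = A.tocoo()

i = 1
in_row = Ac.row == i        # scans all stored entries, not just row i
row_vals = Ac.data[in_row]  # values stored in row i
row_cols = Ac.col[in_row]   # their column indices

print(row_vals, row_cols)
```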

Direct access to the attributes of csr and lil isn't much help when picking columns. For that, csc is better, or the lil of the transpose.
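A sketch of both column-oriented options, assuming the same example matrix. In csc the roles of rows and columns are swapped relative to csr, so indptr delimits columns and indices holds row numbers. The variable names are my own, and the one-off conversion cost is not counted here.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))

# Option 1: convert once to csc, then slice data/indices with indptr.
Acsc = A.tocsc()
j = 0
start, stop = Acsc.indptr[j], Acsc.indptr[j + 1]
col_vals = Acsc.data[start:stop]     # non-zero values in column j
col_rows = Acsc.indices[start:stop]  # row numbers of those values

# Option 2: lil of the transpose, where each "row" is a column of A.
At = A.T.tolil()
lil_vals = At.data[j]
lil_rows = At.rows[j]

print(col_vals, col_rows, lil_vals, lil_rows)
```

Either conversion only pays off if many columns are read from the same matrix.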

Direct access to the csr data, with the aid of indptr, would be:

In [697]: i=0; A.data[A.indptr[i]:A.indptr[i+1]]
Out[697]: array([1, 2, 3, 4, 5], dtype=int32)

Calculations using the csr format routinely iterate through indptr like this, getting the values of each row - but they do this in compiled code.
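The same indptr slicing can be wrapped in a small helper and used in a Python-level loop. This is only a sketch: row_nonzeros is a hypothetical name, not a scipy function, and it returns whatever is stored for the row, which would include explicitly stored zeros if the matrix had any.

```python
import numpy as np
from scipy import sparse


def row_nonzeros(A, i):
    """Return (column indices, values) stored for row i of a csr matrix."""
    start, stop = A.indptr[i], A.indptr[i + 1]
    # These are views into A.indices and A.data; no new matrix is built.
    return A.indices[start:stop], A.data[start:stop]


A = sparse.csr_matrix(np.arange(24).reshape(4, 6))

cols0, vals0 = row_nonzeros(A, 0)

# Iterating over all rows the way compiled csr code does, but in Python.
row_lengths = [len(row_nonzeros(A, i)[1]) for i in range(A.shape[0])]
print(cols0, vals0, row_lengths)
```

Unlike A[i, :], this avoids constructing a new csr matrix for each row.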

A recent related topic, seeking the product of nonzero elements by row: Multiplying column elements of sparse Matrix

There I found that reduceat, applied to the data using indptr, was quite fast.
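A minimal sketch of that idea, assuming every row has at least one stored entry. np.multiply.reduceat reduces the data between consecutive offsets, so passing indptr[:-1] gives one product per row. The cast to int64 and the empty-row guard are my own additions; reduceat does not give a sensible result for an empty row, so such rows would need separate handling.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))

# Guard: reduceat misbehaves when two consecutive offsets are equal.
has_empty_rows = bool(np.any(np.diff(A.indptr) == 0))

data = A.data.astype(np.int64)  # avoid overflow in the products
row_products = np.multiply.reduceat(data, A.indptr[:-1])

print(has_empty_rows, row_products)
```

Swapping in np.add.reduceat gives row sums of the stored values in the same way.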

Another tool when dealing with sparse matrices is multiplication:

In [708]: (sparse.csr_matrix(np.array([1,0,0,0])[None,:])*A)
Out[708]: 
<1x6 sparse matrix of type '<class 'numpy.int32'>'
    with 5 stored elements in Compressed Sparse Row format>

csr actually implements sum with this kind of multiplication. And if my memory is correct, it actually performs A[0,:] this way.

Sparse matrix slicing using list of int
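The multiplication approach above can be sketched as follows: a 1x4 selector matrix picks out a row, and multiplying by a vector of ones produces row sums. The names selector and ones are illustrative. The comparison with A[0, :] and A.sum only checks that the results agree, not that scipy takes this route internally.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))

# Row selection: a 1x4 sparse row vector with a single 1 in position 0.
selector = sparse.csr_matrix(np.array([[1, 0, 0, 0]]))
row0 = selector @ A  # 1x6 sparse result holding row 0 of A

# Row sums: multiply by a dense vector of ones.
ones = np.ones(A.shape[1], dtype=A.dtype)
row_sums = A @ ones

print(row0.toarray(), row_sums)
```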
