Get unique rows from a Scipy sparse matrix
Question
I'm working with sparse matrices in Python, and I wonder whether there is an efficient way to remove duplicate rows from a sparse matrix so that only the unique rows remain.
I did not find a function for this, and I'm not sure how to do it without converting the sparse matrix to dense and using numpy.unique.
Recommended answer
There is no quick built-in way to do it, so I had to write a function. It returns a sparse matrix with the unique rows (axis=0) or columns (axis=1) of an input sparse matrix.
Note that the unique rows or columns of the returned matrix are not lexicographically sorted, unlike the output of np.unique.
import numpy as np
import scipy.sparse as sp


def sp_unique(sp_matrix, axis=0):
    ''' Returns a sparse matrix with the unique rows (axis=0)
    or columns (axis=1) of an input sparse matrix sp_matrix'''
    if axis == 1:
        # Unique columns are the unique rows of the transpose
        sp_matrix = sp_matrix.T

    old_format = sp_matrix.getformat()
    dt = sp_matrix.dtype  # dtype of the stored values
    ncols = sp_matrix.shape[1]

    # LIL keeps, per row, a list of column indices (.rows) and values (.data)
    if old_format != 'lil':
        sp_matrix = sp_matrix.tolil()

    # Concatenating both lists row by row gives one comparable key per row
    _, ind = np.unique(sp_matrix.data + sp_matrix.rows, return_index=True)
    rows = sp_matrix.rows[ind]
    data = sp_matrix.data[ind]
    nrows_uniq = data.shape[0]

    sp_matrix = sp.lil_matrix((nrows_uniq, ncols), dtype=dt)  # or sp_matrix.resize(nrows_uniq, ncols)
    sp_matrix.data = data
    sp_matrix.rows = rows

    ret = sp_matrix.asformat(old_format)
    if axis == 1:
        ret = ret.T
    return ret


def lexsort_row(A):
    ''' numpy lexsort of the rows, not used in sp_unique'''
    return A[np.lexsort(A.T[::-1])]


if __name__ == '__main__':
    # Test
    # Create a large sparse matrix with elements in [0, 10]
    A = 10 * sp.random(10000, 3, density=0.5, format='csr')
    A = A.ceil().astype(int)  # element-wise ceil that keeps the matrix sparse

    # unique rows
    A_uniq = sp_unique(A, axis=0).toarray()
    A_uniq = lexsort_row(A_uniq)
    A_uniq_numpy = np.unique(A.toarray(), axis=0)
    assert (A_uniq == A_uniq_numpy).all()

    # unique columns
    A_uniq = sp_unique(A, axis=1).toarray()
    A_uniq = lexsort_row(A_uniq.T).T
    A_uniq_numpy = np.unique(A.toarray(), axis=1)
    assert (A_uniq == A_uniq_numpy).all()
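To see why adding the LIL attributes data and rows yields a usable per-row key, here is a minimal sketch on a tiny matrix. It is only an illustration, not the function above: the variable names (dense, keys, first_seen, unique_row_ids) are made up for this example, and a plain dict replaces np.unique so that the first occurrence of each row is kept in its original order.

```python
import numpy as np
import scipy.sparse as sp

# Tiny example: rows 0 and 2 are identical, row 3 is empty.
dense = np.array([[1, 0, 2],
                  [0, 3, 0],
                  [1, 0, 2],
                  [0, 0, 0]])
lil = sp.lil_matrix(dense)

# In LIL format, .rows[i] lists the column indices of row i's nonzeros
# and .data[i] lists the matching values; both are object arrays of lists.
# Adding them concatenates the two lists row by row into one key per row.
keys = lil.data + lil.rows

# Tuples make the keys hashable; the dict remembers the first row per key.
first_seen = {}
for i, key in enumerate(keys):
    first_seen.setdefault(tuple(key), i)
unique_row_ids = sorted(first_seen.values())

# Row selection keeps the matrix sparse; toarray() is only for display.
unique_rows = lil[unique_row_ids].toarray()
print(unique_row_ids)
print(unique_rows)
```

Because values and column indices are stored in lists of equal length, two rows share a key only when both their nonzero positions and their values match, which is what makes the concatenated list a safe stand-in for the full row.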