Find all-zero columns in pandas sparse matrix
Question
For example, I have a coo_matrix A:
import scipy.sparse as sp
# The rows must be wrapped in an outer list so they form one 2-D input
A = sp.coo_matrix([[3,0,3,0],
                   [0,0,2,0],
                   [2,5,1,0],
                   [0,0,0,0]])
How can I get the result [0,0,0,1], which indicates that the first 3 columns contain non-zero values and only the 4th column is all zeros?
PS: A cannot be converted to another type.
PS2: I tried using np.nonzero, but my implementation does not seem very elegant.
Recommended answer
Approach #1 We could do something like this -
import numpy as np
# Get the column indices of the non-zero entries of the input sparse matrix
C = sp.find(A)[1]
# np.in1d gives a mask of the columns that contain a non-zero value.
# Invert it and convert to int dtype for the desired output.
out = (~np.in1d(np.arange(A.shape[1]),C)).astype(int)
Alternatively, to make the code shorter, we can use subtraction -
out = 1-np.in1d(np.arange(A.shape[1]),C)
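Approach #1 can be bundled into a self-contained sketch as follows. The function name all_zero_columns is hypothetical, and np.isin is swapped in for np.in1d on the assumption that a recent NumPy is installed, where np.in1d is deprecated; the logic is otherwise the same as above.

```python
import numpy as np
import scipy.sparse as sp


def all_zero_columns(A):
    """Return a 0/1 int array; 1 marks a column with no non-zero entry.

    Hypothetical helper name, not part of the original answer.
    """
    # sp.find returns (row indices, column indices, values) of non-zero entries
    cols = sp.find(A)[1]
    # True for every column index that appears among the non-zero entries
    has_nonzero = np.isin(np.arange(A.shape[1]), cols)
    # Invert the mask and convert to int for the requested output format
    return (~has_nonzero).astype(int)


A = sp.coo_matrix([[3, 0, 3, 0],
                   [0, 0, 2, 0],
                   [2, 5, 1, 0],
                   [0, 0, 0, 0]])
result = all_zero_columns(A)
print(result)  # [0 0 0 1]
```

A stays a coo_matrix throughout, which respects the constraint in the question that it cannot be converted to another type.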
Step-by-step run -
1) Input array and the sparse matrix built from it:
In [137]: arr # Regular dense array
Out[137]:
array([[3, 0, 3, 0],
[0, 0, 2, 0],
[2, 5, 1, 0],
[0, 0, 0, 0]])
In [138]: A = sp.coo_matrix(arr) # Convert to a sparse matrix, used as the input from here on
2) Get the non-zero column indices:
In [139]: C = sp.find(A)[1]
In [140]: C
Out[140]: array([0, 2, 2, 0, 1, 2], dtype=int32)
3) Use np.in1d to get a mask of the non-zero columns:
In [141]: np.in1d(np.arange(A.shape[1]),C)
Out[141]: array([ True, True, True, False], dtype=bool)
4) Invert the mask:
In [142]: ~np.in1d(np.arange(A.shape[1]),C)
Out[142]: array([False, False, False, True], dtype=bool)
5) Finally, convert to int dtype:
In [143]: (~np.in1d(np.arange(A.shape[1]),C)).astype(int)
Out[143]: array([0, 0, 0, 1])
Alternatively, with subtraction:
In [145]: 1-np.in1d(np.arange(A.shape[1]),C)
Out[145]: array([0, 0, 0, 1])
Approach #2 Here's another way, and possibly a faster one, using matrix-multiplication -
out = 1-np.ones(A.shape[0],dtype=bool)*A.astype(bool)
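To make the matrix-multiplication idea concrete, here is a minimal sketch of a variant of it, not the exact one-liner above: it multiplies the transposed matrix by an integer vector of ones, which gives the number of non-zero entries in each column, and then flags the columns whose count is zero. The names A_int, ones and counts are illustrative.

```python
import numpy as np
import scipy.sparse as sp

A = sp.coo_matrix([[3, 0, 3, 0],
                   [0, 0, 2, 0],
                   [2, 5, 1, 0],
                   [0, 0, 0, 0]])

# Map every stored non-zero value to 1 (bool first, then back to int)
A_int = A.astype(bool).astype(np.int64)

# Sparse matrix-vector product: entry j is the number of non-zeros in column j
ones = np.ones(A.shape[0], dtype=np.int64)
counts = A_int.T.dot(ones)

# Columns with a count of zero are the all-zero columns
out = (counts == 0).astype(int)
print(out)  # [0 0 0 1]
```

Counting with integers keeps the intermediate result easy to inspect; only sparse operations are applied to A, so it is never densified.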
Runtime test
Let's test all the posted approaches on a big and really sparse matrix -
In [29]: A = sp.coo_matrix((np.random.rand(4000,4000)>0.998).astype(int))
In [30]: %timeit 1-np.in1d(np.arange(A.shape[1]),sp.find(A)[1])
100 loops, best of 3: 4.12 ms per loop # Approach1
In [31]: %timeit 1-np.ones(A.shape[0],dtype=bool)*A.astype(bool)
1000 loops, best of 3: 771 µs per loop # Approach2
In [32]: %timeit 1 - (A.col==np.arange(A.shape[1])[:,None]).any(axis=1)
1 loops, best of 3: 236 ms per loop # @hpaulj's soln
In [33]: %timeit (A!=0).sum(axis=0)==0
1000 loops, best of 3: 1.03 ms per loop # @jez's soln
In [34]: %timeit (np.sum(np.absolute(A.toarray()), 0) == 0) * 1
10 loops, best of 3: 86.4 ms per loop # @wwii's soln
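Timings like these depend on the machine and on the library versions, so it may be worth re-running them locally. The sketch below is one way to do that outside IPython with the standard timeit module. It assumes a smaller 1000 x 1000 matrix, uses np.isin in place of np.in1d, and uses an integer matrix-vector product as a stand-in for Approach #2; the function names are hypothetical.

```python
import timeit

import numpy as np
import scipy.sparse as sp

# Reproducible, very sparse test matrix (about 0.2% non-zero entries)
rng = np.random.default_rng(0)
A = sp.coo_matrix((rng.random((1000, 1000)) > 0.998).astype(int))


def approach1(A):
    # Column indices of non-zero entries, then membership test per column
    return 1 - np.isin(np.arange(A.shape[1]), sp.find(A)[1])


def approach2(A):
    # Per-column non-zero counts via a sparse matrix-vector product
    ones = np.ones(A.shape[0], dtype=np.int64)
    counts = A.astype(bool).astype(np.int64).T.dot(ones)
    return (counts == 0).astype(int)


out1 = approach1(A)
out2 = approach2(A)
same = np.array_equal(out1, out2)
print("approaches agree:", same)

# Average wall-clock time per call over 20 runs
for name, fn in [("approach1", approach1), ("approach2", approach2)]:
    seconds = timeit.timeit(lambda: fn(A), number=20) / 20
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

Checking that both functions return the same array before timing them guards against comparing the speed of two computations that do not actually agree.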