Find all-zero columns in pandas sparse matrix


Problem description

For example, I have a coo_matrix A:

import scipy.sparse as sp
# The rows must be wrapped in one outer list to form a 2-D input
A = sp.coo_matrix([[3,0,3,0],
                   [0,0,2,0],
                   [2,5,1,0],
                   [0,0,0,0]])

How can I get the result [0,0,0,1], which indicates that the first 3 columns contain non-zero values and only the 4th column is all zeros?

PS: A cannot be converted to another type.
PS2: I tried using np.nonzero, but my implementation does not seem very elegant.

Recommended answer

Approach #1: We could do something like this -

import numpy as np

# Get the column indices of the non-zero entries of the input sparse matrix
C = sp.find(A)[1]

# Use np.in1d to create a mask of non-zero columns. 
# So, we invert it and convert to int dtype for desired output.
out = (~np.in1d(np.arange(A.shape[1]),C)).astype(int)

Alternatively, to make the code shorter, we can use subtraction -

out = 1-np.in1d(np.arange(A.shape[1]),C)

Step-by-step run -

1) Input array and the sparse matrix built from it :

In [137]: arr             # Regular dense array
Out[137]: 
array([[3, 0, 3, 0],
       [0, 0, 2, 0],
       [2, 5, 1, 0],
       [0, 0, 0, 0]])

In [138]: A = sp.coo_matrix(arr) # Convert to a sparse matrix, used as the input from here on

2) Get the column indices of the non-zero entries :

In [139]: C = sp.find(A)[1]

In [140]: C
Out[140]: array([0, 2, 2, 0, 1, 2], dtype=int32)

3) Use np.in1d to get a mask of the non-zero columns :

In [141]: np.in1d(np.arange(A.shape[1]),C)
Out[141]: array([ True,  True,  True, False], dtype=bool)

4) Invert it :

In [142]: ~np.in1d(np.arange(A.shape[1]),C)
Out[142]: array([False, False, False,  True], dtype=bool)

5) Finally, convert to int dtype :

In [143]: (~np.in1d(np.arange(A.shape[1]),C)).astype(int)
Out[143]: array([0, 0, 0, 1])

Alternatively, with subtraction :

In [145]: 1-np.in1d(np.arange(A.shape[1]),C)
Out[145]: array([0, 0, 0, 1])
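
The steps above can be collected into one small function. This is a minimal sketch rather than the answer's own code: the name `zero_columns_find` is made up for illustration, and it uses `np.isin` in place of `np.in1d`, on the assumption that a NumPy version providing `np.isin` is installed (newer NumPy releases recommend it over `np.in1d`).

```python
import numpy as np
import scipy.sparse as sp

def zero_columns_find(A):
    """Return a 0/1 int array that flags the all-zero columns of A.

    Hypothetical helper name; the logic follows Approach #1.
    """
    # sp.find returns (row indices, column indices, values) of the
    # non-zero entries; only the column indices are needed here.
    cols = sp.find(A)[1]
    # True for every column index that holds at least one non-zero entry
    has_nonzero = np.isin(np.arange(A.shape[1]), cols)
    # Invert the mask and convert to int to get the requested format
    return (~has_nonzero).astype(int)

arr = np.array([[3, 0, 3, 0],
                [0, 0, 2, 0],
                [2, 5, 1, 0],
                [0, 0, 0, 0]])
A = sp.coo_matrix(arr)
result = zero_columns_find(A)
print(result)  # [0 0 0 1]
```

The matrix stays in COO format throughout, which matches the constraint in the question that A should not be converted to another type.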

Approach #2: Here's another way, and possibly a faster one, using matrix-multiplication -

out = 1-np.ones(A.shape[0],dtype=bool)*A.astype(bool)
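
To make the idea behind the one-liner more explicit: multiplying a row vector of ones by an indicator matrix of the non-zero entries yields one value per column, and that value is zero only for an all-zero column. The sketch below spells this out with integer counts instead of booleans. The name `zero_columns_matmul` is hypothetical, and the code assumes A contains no duplicate COO entries that cancel each other out.

```python
import numpy as np
import scipy.sparse as sp

def zero_columns_matmul(A):
    """Flag all-zero columns with a ones-vector product.

    Hypothetical helper name; written in the spirit of Approach #2.
    """
    # Mark every non-zero entry as 1 so the product counts entries per column.
    # Assumption: A has no duplicate entries that sum to zero.
    indicator = A.astype(bool).astype(np.int64)
    ones = np.ones(A.shape[0], dtype=np.int64)
    # indicator.T.dot(ones) is the same product as ones @ indicator:
    # one non-zero count per column.
    counts = indicator.T.dot(ones)
    return (counts == 0).astype(int)

A = sp.coo_matrix(np.array([[3, 0, 3, 0],
                            [0, 0, 2, 0],
                            [2, 5, 1, 0],
                            [0, 0, 0, 0]]))
result = zero_columns_matmul(A)
print(result)  # [0 0 0 1]
```

Counting with integers is a design choice made here for readability; the boolean product in the one-liner carries the same information, since only zero versus non-zero matters.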


Runtime test

Let's test out all the posted approaches on a big and really sparse matrix -

In [29]: A = sp.coo_matrix((np.random.rand(4000,4000)>0.998).astype(int))

In [30]: %timeit 1-np.in1d(np.arange(A.shape[1]),sp.find(A)[1])
100 loops, best of 3: 4.12 ms per loop # Approach1

In [31]: %timeit 1-np.ones(A.shape[0],dtype=bool)*A.astype(bool)
1000 loops, best of 3: 771 µs per loop # Approach2

In [32]: %timeit 1 - (A.col==np.arange(A.shape[1])[:,None]).any(axis=1)
1 loops, best of 3: 236 ms per loop # @hpaulj's soln

In [33]: %timeit (A!=0).sum(axis=0)==0
1000 loops, best of 3: 1.03 ms per loop  # @jez's soln

In [34]: %timeit (np.sum(np.absolute(A.toarray()), 0) == 0) * 1
10 loops, best of 3: 86.4 ms per loop  # @wwii's soln 
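
Timings like these only mean something if the approaches return the same answer, so a quick agreement check on a smaller matrix can be useful before benchmarking. The sketch below is an addition, not part of the original answer: the 400x400 size and the seed are arbitrary choices, `np.isin` stands in for `np.in1d`, and the dense reference is only affordable because the matrix is small.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
dense = (rng.random((400, 400)) > 0.998).astype(int)
A = sp.coo_matrix(dense)

# Approach #1: look up which column indices hold non-zero entries
via_find = (~np.isin(np.arange(A.shape[1]), sp.find(A)[1])).astype(int)

# Dense reference computed from the original array
reference = (dense.sum(axis=0) == 0).astype(int)

agree = np.array_equal(via_find, reference)
print(agree)  # True
```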
