Pandas: find the first NaN value in each row and return its column name

Problem description

I have a dataframe like this:

>>> import pandas as pd
>>> df1 = pd.DataFrame({'A': ['1', '2', '3', '4', '5'],
...                     'B': ['1', '1', '1', '1', '1'],
...                     'C': ['c', 'A1', None, 'c3', None],
...                     'D': ['d0', 'B1', 'B2', None, 'B4'],
...                     'E': ['A', None, 'S', None, 'S'],
...                     'F': ['3', '4', '5', '6', '7'],
...                     'G': ['2', '2', None, '2', '2']})
>>> df1

   A  B     C     D     E  F     G
0  1  1     c    d0     A  3     2
1  2  1    A1    B1  None  4     2
2  3  1  None    B2     S  5  None
3  4  1    c3  None  None  6     2
4  5  1  None    B4     S  7     2

Then I drop the rows that contain NaN values with df2 = df1.dropna(). These are the rows that get dropped:

   A  B     C     D     E  F     G   
1  2  1    A1    B1  None  4     2
2  3  1  None    B2     S  5  None
3  4  1    c3  None  None  6     2
4  5  1  None    B4     S  7     2

These rows were dropped because they contain NaN values. However, I want to know why each one was dropped: which column holds the first NaN value that caused the row to be dropped? I need a drop reason for a report.

The output should be

['E','C','D','C']

I know I could run dropna on each column separately and record that column as the reason, but that is really inefficient.

Is there a more efficient way to solve this problem? Thank you.

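Recommended answer

I think you can create a boolean DataFrame with DataFrame.isnull, then use boolean indexing to keep only the rows that contain at least one True (found with any), and finally call idxmax. Because True is the maximum value in a boolean row and idxmax returns the first occurrence of the maximum, you get the column name of the first True value in each row: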

booldf = df1.isnull()
print (booldf)
       A      B      C      D      E      F      G
0  False  False  False  False  False  False  False
1  False  False  False  False   True  False  False
2  False  False   True  False  False  False   True
3  False  False  False   True   True  False  False
4  False  False   True  False  False  False  False

print (booldf.any(axis=1))
0    False
1     True
2     True
3     True
4     True
dtype: bool

print (booldf[booldf.any(axis=1)].idxmax(axis=1))
1    E
2    C
3    D
4    C
dtype: object
