Pandas: find the first NaN value by row and return the column name
Problem description
I have a DataFrame like this:
>>> import pandas as pd
>>> df1 = pd.DataFrame({'A': ['1', '2', '3', '4', '5'],
...                     'B': ['1', '1', '1', '1', '1'],
...                     'C': ['c', 'A1', None, 'c3', None],
...                     'D': ['d0', 'B1', 'B2', None, 'B4'],
...                     'E': ['A', None, 'S', None, 'S'],
...                     'F': ['3', '4', '5', '6', '7'],
...                     'G': ['2', '2', None, '2', '2']})
>>> df1
   A  B     C     D     E  F     G
0  1  1     c    d0     A  3     2
1  2  1    A1    B1  None  4     2
2  3  1  None    B2     S  5  None
3  4  1    c3  None  None  6     2
4  5  1  None    B4     S  7     2
Then I drop the rows that contain NaN values with df2 = df1.dropna(), which removes these rows:
   A  B     C     D     E  F     G
1  2  1    A1    B1  None  4     2
2  3  1  None    B2     S  5  None
3  4  1    c3  None  None  6     2
4  5  1  None    B4     S  7     2
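Note that df1.dropna() itself keeps only row 0; the rows listed above are its complement. A minimal sketch, assuming the df1 from the question, that recovers those dropped rows by index difference (the name dropped is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['1', '2', '3', '4', '5'],
                    'B': ['1', '1', '1', '1', '1'],
                    'C': ['c', 'A1', None, 'c3', None],
                    'D': ['d0', 'B1', 'B2', None, 'B4'],
                    'E': ['A', None, 'S', None, 'S'],
                    'F': ['3', '4', '5', '6', '7'],
                    'G': ['2', '2', None, '2', '2']})

# dropna() keeps only the rows that have no missing values at all
df2 = df1.dropna()

# The dropped rows are the ones whose index label is not in df2
dropped = df1.loc[df1.index.difference(df2.index)]

print(df2.index.tolist())      # [0]
print(dropped.index.tolist())  # [1, 2, 3, 4]
```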
Each of these rows was dropped because it contains NaN values. However, I want to know why each one was dropped: which column holds the "first NaN value" that caused the row to be dropped? I need a drop reason for a report.
The output should be
['E','C','D','C']
I know I can run dropna on each column in turn and record that column as the reason, but that is really inefficient.
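For comparison, a minimal sketch of that per-column approach, assuming the df1 from the question. The helper first_nan_column_loop is hypothetical: it walks the columns left to right, calls dropna(subset=[col]) on the rows still remaining, and records col as the reason for every row that this call removes. It needs one dropna call per column plus a Python loop over the removed labels, which is the overhead the question wants to avoid.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['1', '2', '3', '4', '5'],
                    'B': ['1', '1', '1', '1', '1'],
                    'C': ['c', 'A1', None, 'c3', None],
                    'D': ['d0', 'B1', 'B2', None, 'B4'],
                    'E': ['A', None, 'S', None, 'S'],
                    'F': ['3', '4', '5', '6', '7'],
                    'G': ['2', '2', None, '2', '2']})


def first_nan_column_loop(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: per-column dropna, recording the first culprit."""
    reasons = {}
    remaining = df
    for col in df.columns:
        # Rows missing a value in this column are removed at this step
        kept = remaining.dropna(subset=[col])
        for idx in remaining.index.difference(kept.index):
            reasons[idx] = col
        # Later columns only see rows that have not been assigned a reason yet
        remaining = kept
    return pd.Series(reasons, dtype=object).sort_index()


print(first_nan_column_loop(df1).tolist())  # ['E', 'C', 'D', 'C']
```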
Is there a more efficient way to solve this problem? Thank you.
Recommended answer
I think you can create a boolean DataFrame with DataFrame.isnull, then filter it by boolean indexing with a mask, built with any, that keeps only the rows containing at least one True, and finally call idxmax: it returns the column name of the first True value in each row:
booldf = df1.isnull()
print (booldf)
       A      B      C      D      E      F      G
0  False  False  False  False  False  False  False
1  False  False  False  False   True  False  False
2  False  False   True  False  False  False   True
3  False  False  False   True   True  False  False
4  False  False   True  False  False  False  False
print (booldf.any(axis=1))
0 False
1 True
2 True
3 True
4 True
dtype: bool
print (booldf[booldf.any(axis=1)].idxmax(axis=1))
1 E
2 C
3 D
4 C
dtype: object
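The steps above can be collected into one self-contained script. This is a sketch assuming the df1 from the question; the names has_nan, reasons, unfiltered and report are illustrative. It also shows why the any(axis=1) filter matters: on a row with no True at all, idxmax still returns the first column label, so row 0 would wrongly be reported as 'A'. The final assign call is one possible way to attach the reason to the dropped rows for a report.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['1', '2', '3', '4', '5'],
                    'B': ['1', '1', '1', '1', '1'],
                    'C': ['c', 'A1', None, 'c3', None],
                    'D': ['d0', 'B1', 'B2', None, 'B4'],
                    'E': ['A', None, 'S', None, 'S'],
                    'F': ['3', '4', '5', '6', '7'],
                    'G': ['2', '2', None, '2', '2']})

# True wherever a cell is missing (None or NaN)
booldf = df1.isnull()

# Rows that dropna() would remove: at least one missing value
has_nan = booldf.any(axis=1)

# idxmax returns the label of the first maximum, i.e. the first True column
reasons = booldf[has_nan].idxmax(axis=1)

# Without the mask, an all-False row still gets the first column label
unfiltered = booldf.idxmax(axis=1)

# Attach the reason to the dropped rows (aligned on the index)
report = df1[has_nan].assign(reason=reasons)

print(reasons.tolist())   # ['E', 'C', 'D', 'C']
print(unfiltered.loc[0])  # A
```

Every step here is a vectorized DataFrame operation, so the whole frame is scanned once instead of calling dropna once per column.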