How to find which columns contain any NaN value in a Pandas DataFrame
Problem description
Given a pandas DataFrame that may contain NaN values scattered here and there.
Question: How do I determine which columns contain NaN values? In particular, can I get a list of the names of the columns that contain NaNs?
Recommended answer
UPDATE: using Pandas 0.22.0
Newer Pandas versions provide the methods 'DataFrame.isna()' and 'DataFrame.notna()':
In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool
As a list of column names:
In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']
To select those columns (the ones containing at least one NaN value):
In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
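The session above can be reproduced as a standalone script. This is a minimal sketch: the frame is rebuilt by hand from the values printed above, and the variable names (has_nan, nan_columns, complete_columns, nan_subset) are illustrative, not part of the original answer. It also uses notna(), which the answer mentions but does not demonstrate, to get the complementary list of columns without any NaN. It assumes a pandas version that has isna() and notna().

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the values printed above.
df = pd.DataFrame({
    "a": [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    "b": [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    "c": [0, 4, 4, 0, 9, 9, 9, 4, 9, 1],
})

# Boolean Series indexed by column name: True where the column has any NaN.
has_nan = df.isna().any()

# Names of the columns with at least one NaN.
nan_columns = df.columns[has_nan].tolist()

# Complement via notna(): columns in which every value is present.
complete_columns = df.columns[df.notna().all()].tolist()

# Sub-frame restricted to the columns that contain NaN.
nan_subset = df.loc[:, has_nan]

print(nan_columns)       # ['a', 'b']
print(complete_columns)  # ['c']
```

Keeping the boolean mask in its own variable avoids recomputing isna() for the name list and for the column selection.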
OLD answer:
Try using isnull():
In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool
Or, as @root proposed, a clearer version:
In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool

In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']
To select a subset containing all columns with at least one NaN value:
In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
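The old and new spellings should give the same result, since isnull() is documented as an alias of isna(). The sketch below checks that on a shortened, hand-built version of the example frame (only the first three rows; the variable names are illustrative). It also shows that the sum() variant yields a per-column NaN count as an intermediate result, which can be useful in its own right. It assumes a pandas version that has isna().

```python
import numpy as np
import pandas as pd

# First three rows of the example frame, enough to cover every case.
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0],
    "b": [7.0, np.nan, np.nan],
    "c": [0, 4, 4],
})

# sum() over a boolean frame counts the True entries, i.e. NaNs per column.
nan_counts = df.isnull().sum()

via_sum = nan_counts > 0        # old answer: count, then compare with zero
via_any = df.isnull().any()     # @root's version
via_isna = df.isna().any()      # newer spelling

print(nan_counts.to_dict())     # {'a': 1, 'b': 2, 'c': 0}
print(via_sum.equals(via_any) and via_any.equals(via_isna))  # True
```

any() states the intent more directly than summing and comparing; the sum() form is worth keeping only when the counts themselves are needed.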