Filter out rows with more than certain number of NaN
Question
In a Pandas dataframe, I would like to filter out all the rows that have more than 2 NaNs.
Essentially, I have 4 columns and I would like to keep only those rows where at least 2 columns have finite values.
Can somebody advise on how to achieve this?
Recommended answer
The following should work:
df.dropna(thresh=2)
See the online documentation for DataFrame.dropna.
What we are doing here is dropping any row that does not have at least 2 non-NaN values; rows with 2 or more non-NaN values are kept.
Example:
In [25]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, np.nan, 4, 5], 'b': [np.nan, 2, np.nan, 4, 5], 'c': [1, 2, np.nan, np.nan, np.nan], 'd': [1, 2, 3, np.nan, 5]})
df
Out[25]:
     a    b    c    d
0    1  NaN    1    1
1    2    2    2    2
2  NaN  NaN  NaN    3
3    4    4  NaN  NaN
4    5    5  NaN    5
[5 rows x 4 columns]
In [26]:
df.dropna(thresh=2)
Out[26]:
     a    b    c    d
0    1  NaN    1    1
1    2    2    2    2
3    4    4  NaN  NaN
4    5    5  NaN    5
[4 rows x 4 columns]
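To make it clearer what thresh=2 is checking, the same filter can be written as an explicit mask on the per-row count of non-NaN values. This is only a sketch rebuilt on the example frame above; the variable names non_nan_per_row, masked and via_dropna are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same example frame as above
df = pd.DataFrame({
    'a': [1, 2, np.nan, 4, 5],
    'b': [np.nan, 2, np.nan, 4, 5],
    'c': [1, 2, np.nan, np.nan, np.nan],
    'd': [1, 2, 3, np.nan, 5],
})

# Count the non-NaN values in each row
non_nan_per_row = df.notna().sum(axis=1)

# Keep rows with at least 2 non-NaN values, mirroring dropna(thresh=2)
masked = df[non_nan_per_row >= 2]
via_dropna = df.dropna(thresh=2)
```

Both masked and via_dropna should contain rows 0, 1, 3 and 4; row 2 is dropped because it has only one non-NaN value.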
Edit
This works for the example above, but note that you have to know the number of columns and set the thresh value accordingly. I originally thought thresh meant the number of NaN values, but it actually means the number of non-NaN values. With 4 columns, allowing at most 2 NaN values is the same as requiring at least 4 - 2 = 2 non-NaN values.
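If the goal is stated as a maximum number of NaN values per row, one possible approach is to derive thresh from the column count instead of hard-coding it. The helper below is a hypothetical sketch (the name keep_rows_with_at_most and its max_nan parameter are assumptions made for illustration); it assumes max_nan is not larger than the number of columns.

```python
import numpy as np
import pandas as pd


def keep_rows_with_at_most(df, max_nan):
    """Keep rows that contain at most max_nan NaN values (hypothetical helper)."""
    # thresh counts non-NaN values, so convert the NaN limit accordingly
    thresh = df.shape[1] - max_nan
    return df.dropna(thresh=thresh)


df = pd.DataFrame({
    'a': [1, 2, np.nan, 4, 5],
    'b': [np.nan, 2, np.nan, 4, 5],
    'c': [1, 2, np.nan, np.nan, np.nan],
    'd': [1, 2, 3, np.nan, 5],
})

at_most_two = keep_rows_with_at_most(df, 2)
at_most_one = keep_rows_with_at_most(df, 1)

# Equivalent filter that counts NaN values directly, without thresh
direct = df[df.isna().sum(axis=1) <= 2]
```

Counting NaN values directly with isna() avoids the conversion altogether, which may be easier to read when the number of columns is not fixed.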