Python pandas - remove groups based on NaN count threshold


Question

I have a dataset from several weather stations:

stationID | Time | Temperature | ...
----------+------+-------------+-------
123       |  1   |     30      |
123       |  2   |     31      |
202       |  1   |     24      |
202       |  2   |     24.3    |
202       |  3   |     NaN     |
...

I would like to remove the 'stationID' groups that contain more than a certain number of NaNs. For instance, if I type:

**>>> df.groupby('stationID')**

Then I would like to drop the groups that contain at least a certain number of NaNs (say 30). As I understand it, I cannot use dropna(thresh=30) with groupby:

**>>> df2.groupby('station').dropna(thresh=30)**
*AttributeError: Cannot access callable attribute 'dropna' of 'DataFrameGroupBy' objects...*

So, what would be the best way to do that with pandas?

Answer

IIUC (if I understand correctly), you can do:

`df2.loc[df2.groupby('station')['Temperature'].filter(lambda x: len(x[pd.isnull(x)]) < 30).index]`

This keeps every station with fewer than 30 NaNs in `Temperature` and drops all rows of the other stations.
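Calling `filter` on the grouped DataFrame itself should return the same rows without the `.loc[...].index` round trip, because `DataFrameGroupBy.filter` keeps all rows of each group for which the function returns `True`. A minimal sketch on a small toy frame, where `id` and `val` are assumed stand-ins for `station` and `Temperature` and the threshold is 2 instead of 30:

```python
import numpy as np
import pandas as pd

# Toy data; 'id' and 'val' stand in for 'station' and 'Temperature' (assumption)
df = pd.DataFrame({'id': [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
                   'val': [1, 1, np.nan, 1, np.nan, np.nan, 1, 1, 1, 1]})

# Each group arrives as a sub-DataFrame; the whole group is kept
# only if its 'val' column has fewer than 2 NaNs.
result = df.groupby('id').filter(lambda g: g['val'].isna().sum() < 2)
print(result)
```

For the question's data this would read `df2.groupby('station').filter(lambda g: g['Temperature'].isna().sum() < 30)`, assuming those column names.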

Example:

In [59]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'id':[0,0,0,1,1,1,2,2,2,2], 'val':[1,1,np.nan,1,np.nan,np.nan, 1,1,1,1]})
df

Out[59]:
   id  val
0   0  1.0
1   0  1.0
2   0  NaN
3   1  1.0
4   1  NaN
5   1  NaN
6   2  1.0
7   2  1.0
8   2  1.0
9   2  1.0

In [64]:
df.loc[df.groupby('id')['val'].filter(lambda x: len(x[pd.isnull(x)]) < 2).index]

Out[64]:
   id  val
0   0  1.0
1   0  1.0
2   0  NaN
6   2  1.0
7   2  1.0
8   2  1.0
9   2  1.0

So this filters out the groups that have more than 1 NaN value (here `id` 1, which has 2), while groups with 0 or 1 NaN are kept.
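With many stations, a boolean mask built with `transform('sum')` avoids calling a Python lambda once per group, which is likely faster (not benchmarked here). The sketch below also lets the NaN count span several columns, for the case where the `...` columns in the question should count as well. `drop_nan_heavy_groups` is a hypothetical helper name, not a pandas API; `threshold` is the NaN count at which a group gets dropped:

```python
import numpy as np
import pandas as pd

def drop_nan_heavy_groups(df, key, threshold, cols=None):
    """Hypothetical helper: drop every `key` group with `threshold` or more NaNs.

    `cols` selects the columns whose NaNs are counted; None means all
    columns except the key.
    """
    if cols is None:
        cols = [c for c in df.columns if c != key]
    # Number of NaNs in each row, over the selected columns
    nan_per_row = df[cols].isna().sum(axis=1)
    # Broadcast each group's total NaN count back onto that group's rows
    nan_per_group = nan_per_row.groupby(df[key]).transform('sum')
    return df[nan_per_group < threshold]

df = pd.DataFrame({'id': [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
                   'val': [1, 1, np.nan, 1, np.nan, np.nan, 1, 1, 1, 1]})

# Inspect the per-group NaN counts first: id 0 -> 1, id 1 -> 2, id 2 -> 0
counts = df['val'].isna().groupby(df['id']).sum()
print(counts)

# Same rows as In [64]: groups with 2 or more NaNs are removed
print(drop_nan_heavy_groups(df, 'id', threshold=2))
```

For the weather data this would be something like `drop_nan_heavy_groups(df2, 'station', 30, cols=['Temperature'])`, again assuming those column names.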
