Python: drop all instances of a feature from a DataFrame if a NaN threshold is met


Question

Using df.dropna(thresh=x, inplace=True), I can successfully drop the rows that have fewer than x non-NaN values.

But my df looks like this:

          2001   2002   2003   2004
bob   A    123     31      4     12
bob   B     41      1     56     13
bob   C    nan    nan      4    nan

bill  A    451      8    nan     24
bill  B     32      5     52      6
bill  C    623     12     41     14

# Features (A, B, C) repeat for each index/name

This drops the one row/instance that meets the thresh= condition (bob, C), but leaves the other instances of that feature (e.g. bill, C).
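To reproduce the behaviour described above, here is a minimal sketch that rebuilds the sample frame. It assumes the years are integer column labels and that names and features form an unnamed two-level MultiIndex; the question only shows the printed frame, so this construction is an assumption. A plain row-wise dropna(thresh=2) removes only (bob, C):

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample: level 0 = name, level 1 = feature
idx = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=idx,
    columns=[2001, 2002, 2003, 2004],
)

# Row-wise dropna: keep only rows with at least 2 non-NaN values
row_wise = df.dropna(thresh=2)
print(row_wise)
```

inplace=True is left out here so the result can be compared with the untouched frame; (bill, C) survives because that row on its own has 4 non-NaN values.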

What I want is something that drops the entire feature if the threshold is met for any one row. For example, df.dropna(thresh=2, inplace=True) should give:

          2001   2002   2003   2004
bob   A    123     31      4     12
bob   B     41      1     56     13

bill  A    451      8    nan     24
bill  B     32      5     52      6

# Drops C from the whole df

That is, C is removed from the entire df, not just from the one row under bob where it meets the condition.

Recommended answer

Your sample looks like a DataFrame with a MultiIndex, where index level 0 is the name and index level 1 is the feature (A, B, C). You can use notna and sum to build a mask that identifies rows with fewer than 2 non-NaN values, then take their index level 1 values. Finally, use df.query to slice the rows:

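# Feature labels (index level 1) of rows with fewer than 2 non-NaN values
a = df.notna().sum(axis=1).lt(2).loc[lambda x: x].index.get_level_values(1)
# ilevel_1 refers to the unnamed index level 1; @a pulls in the local variable a
df_final = df.query('ilevel_1 not in @a')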

Out[275]:
         2001  2002  2003  2004
bob  A  123.0  31.0   4.0  12.0
     B   41.0   1.0  56.0  13.0
bill A  451.0   8.0   NaN  24.0
     B   32.0   5.0  52.0   6.0
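If the ilevel_1 name inside df.query feels obscure, the same logic can be written with plain boolean indexing via Index.get_level_values and isin. This is a swapped-in variant, not the answer's own code; the frame construction and the names too_sparse and bad_features are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample: level 0 = name, level 1 = feature
idx = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=idx,
    columns=[2001, 2002, 2003, 2004],
)

thresh = 2
# True for rows with fewer than `thresh` non-NaN values
too_sparse = df.notna().sum(axis=1) < thresh
# Feature labels (level 1) that fail the threshold in at least one row
bad_features = too_sparse[too_sparse].index.get_level_values(1).unique()
# Keep every row whose feature is not in the failing set
df_final = df[~df.index.get_level_values(1).isin(bad_features)]
print(df_final)
```

Both forms select the same rows; the isin version simply avoids the query string.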


Method 2:
Use notna, sum, groupby and transform to build a mask that is True on every row of a feature group in which all rows have at least 2 non-NaN values. Finally, use this mask to slice the rows:

# Per feature (level 1): True on every row if all of its rows have >= 2 non-NaN values
m = df.notna().sum(axis=1).groupby(level=1).transform(lambda x: x.ge(2).all())
df_final = df[m]

Out[296]:
         2001  2002  2003  2004
bob  A  123.0  31.0   4.0  12.0
     B   41.0   1.0  56.0  13.0
bill A  451.0   8.0   NaN  24.0
     B   32.0   5.0  52.0   6.0
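Method 2 can also be wrapped in a small reusable function. drop_sparse_features is a hypothetical helper name, and it swaps the lambda for the built-in "min" aggregation: a feature passes only if its smallest per-row non-NaN count is at least thresh, which is equivalent to checking that all of its rows pass. The sample frame is again an assumed reconstruction:

```python
import numpy as np
import pandas as pd


def drop_sparse_features(df: pd.DataFrame, thresh: int, level: int = 1) -> pd.DataFrame:
    """Hypothetical helper: drop every row of a feature if any row of that
    feature has fewer than `thresh` non-NaN values. `level` is the index
    level that holds the feature label."""
    counts = df.notna().sum(axis=1)
    # Smallest non-NaN count within each feature group, broadcast to every row
    min_per_feature = counts.groupby(level=level).transform("min")
    return df[min_per_feature >= thresh]


# Assumed reconstruction of the sample: level 0 = name, level 1 = feature
idx = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=idx,
    columns=[2001, 2002, 2003, 2004],
)

print(drop_sparse_features(df, thresh=2))
```

The level parameter lets the same helper work if the feature label lives on a different index level; with thresh=4, feature A would also be dropped because (bill, A) has only 3 non-NaN values.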

