Python Drop all instances of Feature from DF if NaN thresh is met

Views: 94  Posted: 2020/5/16 20:54:28  Tags: python pandas dataframe nan


Question

Using df.dropna(thresh=x, inplace=True), I can successfully drop the rows that have fewer than x non-NaN values.

But my df looks like this:

          2001     2002     2003    2004

bob   A   123      31       4        12
bob   B   41        1       56       13
bob   C   nan      nan      4        nan

bill  A   451      8        nan      24
bill  B   32       5        52        6
bill  C   623      12       41       14

#Repeating features (A,B,C) for each index/name

This drops only the one row/instance where the thresh= condition is met, and leaves the other instances of that feature in place.
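The behaviour described above can be reproduced with a small sketch. The two-level index (name, feature) and the integer column labels are assumptions based on how the table is printed in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame; the (name, feature) MultiIndex is an assumption
# inferred from the printed table.
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

# thresh=2 keeps only rows with at least 2 non-NaN values, so the
# (bob, C) row is dropped while (bill, C) stays in the frame.
dropped = df.dropna(thresh=2)
print(dropped.index.tolist())
```

The printed index still contains ('bill', 'C'), which is the leftover instance of feature C that the question wants removed.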

What I want is something that drops the entire feature if the thresh condition is met for any one of its rows, such as:

df.dropna(thresh = 2, inplace=True):

           2001     2002     2003    2004

bob    A    123      31       4        12
bob    B    41        1       56       13

bill   A    451      8        nan      24
bill   B    32       5        52        6

#Drops C from the whole df

Here C is removed from the entire df, not just in the one row under bob where it meets the condition.

Recommended answer

Your sample looks like a DataFrame with a MultiIndex, where index level 1 is the feature (A, B, C) and index level 0 is the name. You can use notna and sum to build a mask identifying the rows whose number of non-NaN values is less than 2, and collect their index level 1 values. Finally, use df.query to keep only the rows whose level 1 value is not among them.

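# Level-1 labels (features) of every row that has fewer than 2 non-NaN values
a = df.notna().sum(1).lt(2).loc[lambda x: x].index.get_level_values(1)
# ilevel_1 refers to the unnamed index level 1 inside query
df_final = df.query('ilevel_1 not in @a')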

Out[275]:
         2001  2002  2003  2004
bob  A  123.0  31.0   4.0  12.0
     B   41.0   1.0  56.0  13.0
bill A  451.0   8.0   NaN  24.0
     B   32.0   5.0  52.0   6.0
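
As a self-contained variation on the same idea, the sketch below swaps df.query for a boolean mask built with Index.isin, which avoids relying on the ilevel_1 name. The sample data and the variable names (thresh, bad_features, keep) are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (index layout is an assumption).
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

thresh = 2
# Number of non-NaN values in each row.
non_nan = df.notna().sum(axis=1)
# Features (index level 1) that have at least one row below the threshold.
bad_features = non_nan[non_nan < thresh].index.get_level_values(1).unique()
# Keep every row whose feature is not in that set.
keep = ~df.index.get_level_values(1).isin(bad_features)
df_final = df[keep]
print(df_final.index.tolist())
```

The mask-based form works whether or not the index levels are named, at the cost of being slightly longer than the query call.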

Method 2:
Use notna, sum, groupby and transform to build a boolean mask that is True for the groups in which every row has at least 2 non-NaN values. Finally, use this mask to slice the rows.

# True for all rows of a feature only if every row of that feature has at least 2 non-NaN values
m = df.notna().sum(1).groupby(level=1).transform(lambda x: x.ge(2).all())
df_final = df[m]

Out[296]:
         2001  2002  2003  2004
bob  A  123.0  31.0   4.0  12.0
     B   41.0   1.0  56.0  13.0
bill A  451.0   8.0   NaN  24.0
     B   32.0   5.0  52.0   6.0
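
Method 2 could also be wrapped in a small helper so that the threshold and the index level are parameters. This is a minimal sketch: the function name drop_sparse_features and its arguments are hypothetical, and it applies the threshold comparison before grouping so that transform("all") operates on booleans.

```python
import numpy as np
import pandas as pd


def drop_sparse_features(df, thresh, level=1):
    """Drop every row of a feature if any of its rows has fewer than
    `thresh` non-NaN values. Hypothetical helper, not a pandas API."""
    # Boolean Series: does this row have enough non-NaN values?
    enough = df.notna().sum(axis=1).ge(thresh)
    # Broadcast "all rows of this feature pass" back to every row.
    mask = enough.groupby(level=level).transform("all")
    return df[mask]


# Sample frame rebuilt from the question (index layout is an assumption).
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

result = drop_sparse_features(df, thresh=2)
print(result.index.tolist())
```

With thresh=2 only feature C is removed; a stricter thresh=4 would also remove feature A, because the (bill, A) row has only three non-NaN values.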
