Python: Drop all instances of a feature from a DataFrame if the NaN thresh is met
Question
Using df.dropna(thresh = x, inplace=True), I can successfully drop the rows that have fewer than x non-NaN values.
But my df looks like this:
          2001  2002  2003  2004
bob  A     123    31     4    12
bob  B      41     1    56    13
bob  C     nan   nan     4   nan
bill A     451     8   nan    24
bill B      32     5    52     6
bill C     623    12    41    14
#Repeating features (A,B,C) for each index/name
As a result, this drops only the one row/instance where the thresh condition is met, but leaves the other instances of that feature.
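The behaviour described above can be reproduced with a minimal sketch. It assumes the sample is a frame with an unnamed two-level MultiIndex (name, feature) and one column per year; the construction below and the names index, dropped and remaining_features are illustrative and not taken from the original post.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample: an unnamed two-level MultiIndex
# (name, feature) and one column per year.
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

# thresh=2 keeps only the rows that have at least 2 non-NaN values.
dropped = df.dropna(thresh=2)

# Features still present after the row-wise drop.
remaining_features = dropped.index.get_level_values(1)
print(dropped)
```

Only the ("bob", "C") row is removed here; the ("bill", "C") row survives, which is the mismatch the question is about.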
What I want is something that drops the entire feature if the thresh condition is met for any one row, such as:
df.dropna(thresh = 2, inplace=True):
          2001  2002  2003  2004
bob  A     123    31     4    12
bob  B      41     1    56    13
bill A     451     8   nan    24
bill B      32     5    52     6
#Drops C from the whole df
wherein C is removed from the entire df, not just the one time it meets the condition under bob.
Recommended answer
Your sample looks like a MultiIndex dataframe where index level 1 is the feature (A, B, C) and index level 0 is the name. You can use notna and sum to create a mask that identifies rows where the number of non-NaN values is less than 2, and get their index level 1 values. Finally, use df.query to slice the rows.
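# Level-1 labels (features) of the rows that have fewer than 2 non-NaN values
a = df.notna().sum(axis=1).lt(2).loc[lambda x: x].index.get_level_values(1)
# Keep only the rows whose feature is not among those labels
df_final = df.query('ilevel_1 not in @a')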
Out[275]:
         2001  2002  2003  2004
bob  A  123.0  31.0   4.0  12.0
     B   41.0   1.0  56.0  13.0
bill A  451.0   8.0   NaN  24.0
     B   32.0   5.0  52.0   6.0
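For readers who want to run this approach end to end, here is a self-contained sketch under stated assumptions. The frame construction is a guess at the sample's shape (unnamed two-level MultiIndex), and the filtering step swaps df.query for a boolean mask built with Index.isin, which avoids relying on the ilevel_1 name. The names non_nan, bad_features and keep are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample frame (not taken from the original post).
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

# Number of non-NaN values in each row.
non_nan = df.notna().sum(axis=1)

# Features (index level 1) that have at least one row below the threshold of 2.
bad_features = non_nan[non_nan < 2].index.get_level_values(1).unique()

# Boolean mask over rows: True where the row's feature is not flagged.
keep = ~df.index.get_level_values(1).isin(bad_features)
df_final = df[keep]
print(df_final)
```

The mask variant behaves the same whether or not the index levels are named, which may matter if the real data names its levels.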
Method 2:
Use notna, sum, groupby and transform to create a mask that is True for groups in which every row has at least 2 non-NaN values. Finally, use this mask to slice the rows.
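# True for every row of a feature group only if all rows in that group have >= 2 non-NaN values
m = df.notna().sum(axis=1).groupby(level=1).transform(lambda x: x.ge(2).all())
# Keep only the rows where the mask is True
df_final = df[m]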
Out[296]:
         2001  2002  2003  2004
bob  A  123.0  31.0   4.0  12.0
     B   41.0   1.0  56.0  13.0
bill A  451.0   8.0   NaN  24.0
     B   32.0   5.0  52.0   6.0
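The groupby idea can also be wrapped in a small reusable helper. The sketch below is one possible variant, not the answer's exact code: it uses transform("min") and compares the group minimum against the threshold instead of the lambda with all(). The function name drop_sparse_groups and its thresh and level parameters are hypothetical, and the frame construction is again an assumed reconstruction of the sample.

```python
import numpy as np
import pandas as pd


def drop_sparse_groups(df, thresh, level=1):
    """Drop every row of a group if any row in it has fewer than `thresh` non-NaN values."""
    # Number of non-NaN values in each row.
    non_nan = df.notna().sum(axis=1)
    # Smallest per-row count within each group, broadcast back to every row.
    group_min = non_nan.groupby(level=level).transform("min")
    return df[group_min >= thresh]


# Assumed reconstruction of the sample frame (not taken from the original post).
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

df_final = drop_sparse_groups(df, thresh=2)
print(df_final)
```

With thresh=2 only feature C is removed; a stricter thresh=4 would also remove feature A, because the bill A row has only three non-NaN values.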