Keep indices in Pandas DataFrame with a certain number of non-NaN entries

Problem description

Let's say I have the following DataFrame:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(data    = [1,np.nan,np.nan,1,1,np.nan,1,1,1], 
                   columns = ['X'], 
                   index   = ['a', 'a', 'a', 
                              'b', 'b', 'b',
                              'c', 'c', 'c'])
print(df1)
     X
a  1.0
a  NaN
a  NaN
b  1.0
b  1.0
b  NaN
c  1.0
c  1.0
c  1.0

I want to keep only the index labels that have 2 or more non-NaN entries. In this case, the 'a' rows have only one non-NaN value, so I want to drop them and get this result:

     X
b  1.0
b  1.0
b  NaN
c  1.0
c  1.0
c  1.0
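The criterion can be checked directly by counting the non-NaN entries per index label. A minimal sketch using the example frame above; it relies on the fact that GroupBy count() excludes NaN, and the names non_nan_counts and labels_to_keep are illustrative only:

```python
import numpy as np
import pandas as pd

# Example frame from the question (duplicate index labels are intentional).
df1 = pd.DataFrame(data=[1, np.nan, np.nan, 1, 1, np.nan, 1, 1, 1],
                   columns=['X'],
                   index=['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'])

# count() skips NaN, so this is the number of non-NaN values of X per label.
non_nan_counts = df1.groupby(level=0)['X'].count()
print(non_nan_counts)  # a -> 1, b -> 2, c -> 3

# Labels that satisfy "2 or more non-NaN entries".
labels_to_keep = list(non_nan_counts[non_nan_counts >= 2].index)
print(labels_to_keep)  # ['b', 'c']
```

This shows why 'a' is dropped: it is the only label whose count falls below 2.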

What is the best way to do this? Ideally I want something that works with Dask too, although usually if it works with Pandas it also works in Dask.

Recommended answer

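Let's try filter, keeping only the groups (index labels) that have at least 2 non-NaN values:

out = df1.groupby(level=0).filter(lambda x: x['X'].notna().sum() >= 2)
print(out)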
     X
b  1.0
b  1.0
b  NaN
c  1.0
c  1.0
c  1.0
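A vectorized variant of the same idea (an alternative, not part of the original answer): groupby(level=0)['X'].transform('count') gives every row the number of non-NaN values in its label's group, because 'count' excludes NaN. Comparing that with the threshold yields a row mask directly, which avoids calling a Python lambda once per group and is usually faster when there are many groups. The threshold name MIN_NON_NAN is an illustrative assumption:

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df1 = pd.DataFrame(data=[1, np.nan, np.nan, 1, 1, np.nan, 1, 1, 1],
                   columns=['X'],
                   index=['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'])

MIN_NON_NAN = 2  # illustrative threshold (assumption): keep labels with >= 2 non-NaN values

# 'count' ignores NaN, so each row receives the number of non-NaN values
# of X within its index label's group (1 for 'a', 2 for 'b', 3 for 'c').
group_counts = df1.groupby(level=0)['X'].transform('count')

# Boolean row mask; .to_numpy() avoids any index alignment on the duplicated labels.
out = df1[(group_counts >= MIN_NON_NAN).to_numpy()]
print(out)
```

The result has the same rows as the filter version: all 'b' and 'c' rows, including the NaN in 'b'.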

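Or use isin with the per-label counts of non-NaN values (the counts come from groupby(level=0), since the level argument of DataFrame.sum was removed in pandas 2.0):

df1[df1.index.isin(df1.notna().groupby(level=0).sum().loc[lambda x: x['X'] >= 2].index)]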
     X
b  1.0
b  1.0
b  NaN
c  1.0
c  1.0
c  1.0
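The isin one-liner can be unpacked into named steps with a configurable threshold. A sketch, where keep_labels_with_min_non_nan and its parameters are hypothetical names, not a pandas API:

```python
import numpy as np
import pandas as pd


def keep_labels_with_min_non_nan(df: pd.DataFrame, column: str, min_count: int) -> pd.DataFrame:
    """Hypothetical helper: return every row whose index label has at least
    `min_count` non-NaN values in `column` (duplicate labels allowed)."""
    # Per-label count of non-NaN entries; summing booleans counts the True values.
    counts = df[column].notna().groupby(level=0).sum()
    # Index labels that meet the threshold.
    keep = counts[counts >= min_count].index
    # isin() yields a NumPy boolean mask over the original, duplicated index.
    return df[df.index.isin(keep)]


# Example frame from the question.
df1 = pd.DataFrame(data=[1, np.nan, np.nan, 1, 1, np.nan, 1, 1, 1],
                   columns=['X'],
                   index=['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'])

print(keep_labels_with_min_non_nan(df1, 'X', 2))  # rows for 'b' and 'c'
print(keep_labels_with_min_non_nan(df1, 'X', 3))  # rows for 'c' only
```

Raising min_count to 3 keeps only 'c', the one label with no NaN at all.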
