Keep indices in a Pandas DataFrame with a certain number of non-NaN entries
Question
Let's say I have the following dataframe:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data=[1, np.nan, np.nan, 1, 1, np.nan, 1, 1, 1],
                   columns=['X'],
                   index=['a', 'a', 'a',
                          'b', 'b', 'b',
                          'c', 'c', 'c'])
print(df1)
     X
a  1.0
a  NaN
a  NaN
b  1.0
b  1.0
b  NaN
c  1.0
c  1.0
c  1.0
I want to keep only the index labels that have 2 or more non-NaN entries. In this case, the 'a' rows have only one non-NaN value, so I want to drop them and have my result be:
     X
b  1.0
b  1.0
b  NaN
c  1.0
c  1.0
c  1.0
What is the best way to do this? Ideally I want something that works with Dask too, although usually if it works with Pandas it also works in Dask.
Recommended answer
Let's try filter:
# count() ignores NaN, so this keeps labels with at least 2 non-NaN values in 'X'
out = df1.groupby(level=0).filter(lambda g: g['X'].count() >= 2)
     X
b  1.0
b  1.0
b  NaN
c  1.0
c  1.0
c  1.0
Or we can use isin:
valid = df1['X'].groupby(level=0).count()  # non-NaN values per index label
df1[df1.index.isin(valid[valid >= 2].index)]
     X
b  1.0
b  1.0
b  NaN
c  1.0
c  1.0
c  1.0
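A third variant, not part of the original answer, builds a row-level boolean mask with transform instead of calling a Python function once per group. The sketch below assumes a single column of interest. The helper name keep_labels_with_min_valid and its min_valid parameter are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd


def keep_labels_with_min_valid(df, column, min_valid=2):
    """Keep rows whose index label has at least `min_valid` non-NaN values in `column`.

    Hypothetical helper for illustration; not from the original answer.
    """
    # count() ignores NaN; transform broadcasts each label's count back to its rows
    valid_per_row = df[column].groupby(level=0).transform("count")
    # Use a plain NumPy mask so the duplicated index labels need no alignment
    mask = (valid_per_row >= min_valid).to_numpy()
    return df[mask]


df1 = pd.DataFrame(
    data=[1, np.nan, np.nan, 1, 1, np.nan, 1, 1, 1],
    columns=["X"],
    index=["a", "a", "a", "b", "b", "b", "c", "c", "c"],
)

out = keep_labels_with_min_valid(df1, "X", min_valid=2)
print(out)
```

Because the test is on the number of non-NaN values rather than the number of NaN values, it should also behave as intended when the groups have different sizes.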