Removing rows with NaN in MultiIndex with duplicates
Question
Updated with a DataFrame that reproduces my exact issue.
I have an issue where NaN appearing in my index leads to non-unique rows (since NaN != NaN). I need to drop all rows where NaN occurs in the index. My previous question had an example DataFrame with a single NaN row, but the original solution did not resolve my issue because it did not meet this poorly advertised requirement:
(Note that the actual data has thousands of such rows, including duplicates; since NaN != NaN, this is permissible in an index.)
(Taken from my original post)
>>> import pandas as pd
>>> import numpy as np
>>> # set_index(["a", "b"]) builds the MultiIndex shown in the output below
>>> df = pd.DataFrame([[1,1,"a"],[1,2,"b"],[1,3,"c"],[1,np.nan,"x"],[1,np.nan,"x"],[1,np.nan,"x"],[2,1,"d"],[2,2,"e"],[np.nan,1,"x"],[np.nan,2,"x"],[np.nan,1,"x"]], columns=["a","b","c"]).set_index(["a","b"])
>>> df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
    NaN  x
    NaN  x
    NaN  x
2.0 1.0  d
    2.0  e
NaN 1.0  x
    2.0  x
    1.0  x
Note the duplicate rows: (1.0, NaN) and (NaN, 1.0).
I've tried something simple like:
>>> df = df[pd.notnull(df.index)]
But this fails because notnull is not implemented for MultiIndex.
Also, one of the early answers suggested:
>>> df = df.reindex(df.index.dropna())
However, this failed with the error:
Exception: cannot handle a non-unique multi-index!
Desired output:
>>> df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
2.0 1.0  d
    2.0  e
(All rows with NaN in the index are dropped, eliminating any non-unique rows.)
Recommended answer
Option 1
reset_index, dropna, and set_index once more.
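# Remember the index level names so the index can be rebuilt afterwards
c = df.index.names
# Move the index levels into columns, drop rows containing NaN, restore the index
df = df.reset_index().dropna().set_index(c)
df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
2.0 1.0  d
    2.0  e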
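One caveat with Option 1: a bare dropna() also discards rows whose data columns contain NaN, not only rows with NaN in the index. The sketch below restricts the check to the index levels through the subset argument of dropna. The sample frame is a shortened version of the question's data with one hypothetical row (a missing value in column c) added for illustration, and the names levels and cleaned are placeholders.

```python
import numpy as np
import pandas as pd

# Shortened sample from the question, plus one hypothetical row
# (1, 3) whose data column "c" is missing.
df = pd.DataFrame(
    [[1, 1, "a"], [1, 2, "b"], [1, 3, None],
     [1, np.nan, "x"], [1, np.nan, "x"],
     [2, 1, "d"],
     [np.nan, 1, "x"], [np.nan, 1, "x"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])

# Names of the index levels; after reset_index they become ordinary columns.
levels = list(df.index.names)

# Only consider the former index columns when deciding which rows to drop,
# so the row with a missing "c" value is kept.
cleaned = df.reset_index().dropna(subset=levels).set_index(levels)
print(cleaned)
```

If rows with missing data values should go as well, the plain dropna() from Option 1 is the simpler choice.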
If your MultiIndex is unique, you can use...
Option 2
df.index.dropna and df.reindex
df = df.reindex(df.index.dropna())
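For a non-unique MultiIndex such as the one in the question, reindex is not available, but a boolean mask built one level at a time should work. This is a minimal sketch and not part of the original answer: it assumes a pandas version where Index.notna is available, and the names keep and result are illustrative.

```python
import numpy as np
import pandas as pd

# DataFrame from the question, indexed by the (non-unique) levels "a" and "b".
df = pd.DataFrame(
    [[1, 1, "a"], [1, 2, "b"], [1, 3, "c"],
     [1, np.nan, "x"], [1, np.nan, "x"], [1, np.nan, "x"],
     [2, 1, "d"], [2, 2, "e"],
     [np.nan, 1, "x"], [np.nan, 2, "x"], [np.nan, 1, "x"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])

# Start with "keep everything", then clear the flag wherever any level is NaN.
keep = np.ones(len(df), dtype=bool)
for level in range(df.index.nlevels):
    keep &= df.index.get_level_values(level).notna()

# Boolean-array indexing is positional, so duplicate index entries are not a problem.
result = df[keep]
print(result)
```

Because the mask is applied by position, this approach does not require the index to be unique, unlike Option 2.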