Pandas: Remove index entry (and all its rows) from a multilevel index when all data in a column is NaN
Problem description
I'd like to clean up some data I have in a dataframe with a multilevel index.
                |  A  |  B  |
----------------+-----+-----+
foo 2019-01-01  |  x  | NaN |
    2019-01-02  |  x  | NaN |
    2019-01-03  | NaN | NaN |
................+.....+.....+
bar 2019-01-01  | NaN |  x  |
    2019-01-02  | NaN |  y  |
    2019-01-03  | NaN |  z  |
................+.....+.....+
baz 2019-01-01  |  x  |  x  |
    2019-01-02  |  x  |  x  |
    2019-01-03  |  x  |  x  |
I'd like to drop the complete group indexed by bar, because all of the data in column A is NaN. I'd like to keep foo, because only some of the data in column A is NaN (column B is not important here, even if it's all NaN). I'd like to keep baz, because not all of column A is NaN. So my result should look like this:
                |  A  |  B  |
----------------+-----+-----+
foo 2019-01-01  |  x  | NaN |
    2019-01-02  |  x  | NaN |
    2019-01-03  | NaN | NaN |
................+.....+.....+
baz 2019-01-01  |  x  |  x  |
    2019-01-02  |  x  |  x  |
    2019-01-03  |  x  |  x  |
What's the best way to do this with pandas and Python? I suppose there is a better way than looping through the data...
Recommended answer
groupby.transform, notna() & any()
We can groupby on your first-level index and then check whether any of the values in column A are not NaN.
We use transform to get a boolean array of the same shape back, so we can use boolean indexing to filter out the correct rows.
m = df['A'].notna().groupby(level=0).transform('any')
df[m]
                  A    B
idx idx2
foo 2019-01-01    x  NaN
    2019-01-02    x  NaN
    2019-01-03  NaN  NaN
baz 2019-01-01    x    x
    2019-01-02    x    x
    2019-01-03    x    x
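For anyone who wants to reproduce this, here is a minimal self-contained sketch. The way the frame is built is an assumption: the question only shows the printed table, so the construction via MultiIndex.from_product and the literal "x", "y", "z" values are illustrative, and the level names idx/idx2 are taken from the output above.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame (the question only shows it printed).
index = pd.MultiIndex.from_product(
    [["foo", "bar", "baz"],
     pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"])],
    names=["idx", "idx2"],
)
df = pd.DataFrame(
    {
        "A": ["x", "x", np.nan, np.nan, np.nan, np.nan, "x", "x", "x"],
        "B": [np.nan, np.nan, np.nan, "x", "y", "z", "x", "x", "x"],
    },
    index=index,
)

# True on every row of a group that has at least one non-NaN value in A.
m = df["A"].notna().groupby(level=0).transform("any")

# Boolean indexing keeps foo and baz and drops every row of bar.
result = df[m]
print(result)
```

Note that df[m] returns a new frame; assign it back (df = df[m]) if the filtered data should replace the original.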
What does m contain?
m = df['A'].notna().groupby(level=0).transform('any')
print(m)
idx  idx2
foo  2019-01-01     True
     2019-01-02     True
     2019-01-03     True
bar  2019-01-01    False
     2019-01-02    False
     2019-01-03    False
baz  2019-01-01     True
     2019-01-02     True
     2019-01-03     True
Name: A, dtype: bool
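To see why transform is used here rather than a plain aggregation, the sketch below compares the two. It is not part of the original answer: the frame is an assumed reconstruction of column A only, and the isin-based variant at the end is an alternative shown for comparison.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of column A from the question's table.
index = pd.MultiIndex.from_product(
    [["foo", "bar", "baz"], ["2019-01-01", "2019-01-02", "2019-01-03"]],
    names=["idx", "idx2"],
)
df = pd.DataFrame({"A": ["x", "x", np.nan] + [np.nan] * 3 + ["x"] * 3}, index=index)

# Aggregating yields one boolean per group (3 values), which cannot be
# used directly as a row mask on the 9-row frame.
per_group = df["A"].notna().groupby(level=0).any()

# transform('any') repeats each group's result on every row of that group,
# so the mask lines up with df's index.
per_row = df["A"].notna().groupby(level=0).transform("any")

# Alternative: keep the per-group result and match it against the first level.
keep = per_group[per_group].index
alt = df[df.index.get_level_values(0).isin(keep)]
```

Both routes select the same rows; transform simply avoids the extra step of mapping group labels back onto the index.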