Pandas: Remove index entry (and all its rows) from multilevel index when all data in a column is NaN


Question

I'd like to clean up some data I have in a dataframe with a multilevel index.

                | A   | B   | 
----------------+-----+-----+
foo  2019-01-01 | x   | NaN |
     2019-01-02 | x   | NaN |
     2019-01-03 | NaN | NaN |
................+.....+.....+
bar  2019-01-01 | NaN | x   |
     2019-01-02 | NaN | y   |
     2019-01-03 | NaN | z   |
................+.....+.....+
baz  2019-01-01 | x   | x   |
     2019-01-02 | x   | x   |
     2019-01-03 | x   | x   |

I'd like to lose the complete group indexed by bar, because all of the data in column A is NaN. I'd like to keep foo, because only some of the data in column A is NaN (column B is not important here, even if it's all NaN). I'd like to keep baz, because not all of column A is NaN. So my result should look like this:

                | A   | B   | 
----------------+-----+-----+
foo  2019-01-01 | x   | NaN |
     2019-01-02 | x   | NaN |
     2019-01-03 | NaN | NaN |
................+.....+.....+
baz  2019-01-01 | x   | x   |
     2019-01-02 | x   | x   |
     2019-01-03 | x   | x   |

What's the best way to do this with pandas and Python? I suppose there is a better way than looping through the data...

Recommended answer

groupby.transform, notna() & any()

We can group by the first index level and then check whether any of the values in column A are not NaN.

We use transform to get back a boolean array with the same shape as the original column, so we can use boolean indexing to select the rows to keep.

m = df['A'].notna().groupby(level=0).transform('any')
df[m]

                  A    B
idx idx2                
foo 2019-01-01    x  NaN
    2019-01-02    x  NaN
    2019-01-03  NaN  NaN
baz 2019-01-01    x    x
    2019-01-02    x    x
    2019-01-03    x    x
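
To try this end to end, the sketch below rebuilds the example frame from the question and applies the same mask. The way the frame is constructed is an assumption (the question does not show how df was created); the level names idx and idx2 are taken from the printed output above.

```python
import numpy as np
import pandas as pd

# Assumed construction of the question's frame; only the values come from the post.
index = pd.MultiIndex.from_product(
    [["foo", "bar", "baz"],
     pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"])],
    names=["idx", "idx2"],
)
df = pd.DataFrame(
    {
        "A": ["x", "x", np.nan, np.nan, np.nan, np.nan, "x", "x", "x"],
        "B": [np.nan, np.nan, np.nan, "x", "y", "z", "x", "x", "x"],
    },
    index=index,
)

# True for every row whose first-level group has at least one non-NaN value in A.
m = df["A"].notna().groupby(level=0).transform("any")

# Boolean indexing keeps foo and baz and drops every row of bar.
result = df[m]
print(result)
```

Because transform broadcasts the per-group result back to every row, m lines up with df row for row and can be used directly as a boolean indexer.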


What does m return?

m = df['A'].notna().groupby(level=0).transform('any')
print(m)

idx  idx2      
foo  2019-01-01     True
     2019-01-02     True
     2019-01-03     True
bar  2019-01-01    False
     2019-01-02    False
     2019-01-03    False
baz  2019-01-01     True
     2019-01-02     True
     2019-01-03     True
Name: A, dtype: bool
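
As an alternative that is not part of the answer above, the same selection can probably be written with groupby.filter, which keeps or drops whole groups based on a function that returns one boolean per group. The sketch below is a minimal illustration under the same assumed frame construction; it also calls MultiIndex.remove_unused_levels, since filtering rows by itself leaves bar in the index's level metadata.

```python
import numpy as np
import pandas as pd

# Assumed construction of the question's frame; only the values come from the post.
index = pd.MultiIndex.from_product(
    [["foo", "bar", "baz"],
     pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"])],
    names=["idx", "idx2"],
)
df = pd.DataFrame(
    {
        "A": ["x", "x", np.nan, np.nan, np.nan, np.nan, "x", "x", "x"],
        "B": [np.nan, np.nan, np.nan, "x", "y", "z", "x", "x", "x"],
    },
    index=index,
)

# Keep a group only if column A has at least one non-NaN value in it.
kept = df.groupby(level=0).filter(lambda g: g["A"].notna().any())

# Row filtering does not prune the stored levels, so "bar" is still listed there.
still_listed = "bar" in kept.index.levels[0]

# Drop level values that no longer appear in any row.
kept.index = kept.index.remove_unused_levels()
print(kept)
```

The transform-based mask is usually the faster choice on large frames because it avoids calling a Python function once per group; filter mainly reads closer to the intent of "drop whole groups".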
