Pandas groupby custom groups

Question

Let's say I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'a', 'b', 'b', 'c', 'c']})
print(df)

   A  B
0  1  a
1  2  a
2  3  b
3  4  b
4  5  c
5  6  c

How can I group by column B so that the groups are cumulative (a, a OR b, and a OR b OR c) rather than just a, b and c? For the sake of the example, let's say that I want to aggregate the results with 'sum'. I would then end up with:

              A
a             3
a OR b        10 
a OR b OR c   21

Recommended answer

I think it really depends on the function you want to use. For a sum, for example, there is a trick with DataFrame.expanding. The idea is to compute the expanding (running) sum over all rows and then, with DataFrame.where, keep only the rows where an entire group has just been completed, that is, the last row of each group:

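# Select the numeric column explicitly: pandas 2.0+ raises on non-numeric columns in expanding().sum()
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1)))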
      A
0   NaN
1   3.0
2   NaN
3  10.0
4   NaN
5  21.0

Dropping the NaN rows leaves one row per cumulative group:

df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1))).dropna()

      A
1   3.0
3  10.0
5  21.0
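To make the mask in the snippets above easier to follow, here is a minimal sketch that builds it step by step. The variable names (`is_group_end`, `running_total`) are illustrative and not part of the original answer, and the approach assumes the rows of each label in B are contiguous (for example, the frame is sorted by B).

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# True on the last row of each consecutive run of equal labels in B:
# the next label (shift(-1)) differs, or is missing on the final row.
is_group_end = df['B'].ne(df['B'].shift(-1))

# Running total of A over all rows seen so far.
running_total = df['A'].expanding().sum()

# Keep the running total only where a group has just been completed.
result = running_total[is_group_end]
print(result)
```

If the labels are interleaved (a, b, a, ...), the mask marks the end of every run rather than of every group, so sorting by B first would be needed.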

Update

We can also combine DataFrame.groupby with DataFrame.expanding:

df.groupby('B').sum().expanding().sum()
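As a small illustration of why this works for a sum, the sketch below shows that the expanding sum of the per-group totals is just their cumulative sum. The intermediate names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# Per-group totals, indexed by the sorted labels of B.
per_group = df.groupby('B')['A'].sum()

# Expanding sum over the group totals (returns floats).
via_expanding = per_group.expanding().sum()

# Same values via cumsum, which keeps the integer dtype.
via_cumsum = per_group.cumsum()

print(via_expanding)
print(via_cumsum)
```

This shortcut relies on the aggregate being additive across groups. For a function such as the mean, aggregating the per-group results would not in general give the mean of the combined rows.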

To get the expected output:

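new_df = (df.groupby('B').sum().expanding().sum()
            .reset_index()
            # Build cumulative labels. removesuffix (pandas 1.4+) drops only the
            # trailing ' or '; rstrip would strip any trailing ' ', 'o' or 'r'.
            .assign(B=lambda x: x.B.add(' or ').cumsum()
                                 .str.removesuffix(' or '))
            .set_index('B'))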
print(new_df)
                A
B                
a             3.0
a or b       10.0
a or b or c  21.0
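Since the answer notes that the approach depends on the aggregation function, here is a hedged sketch of a more general alternative that is not from the original answer: it selects the rows of each growing label set with `Series.isin` and aggregates them directly. The helper name `cumulative_group_agg` is hypothetical, and the sketch assumes string labels and a frame small enough that re-scanning it once per label is acceptable.

```python
import pandas as pd

def cumulative_group_agg(frame, key, value, func):
    """Aggregate `value` over growing unions of the labels in `key`.

    Hypothetical helper for illustration; labels are taken in order of
    first appearance and are assumed to be strings.
    """
    labels = frame[key].drop_duplicates().tolist()
    rows = {}
    for i in range(1, len(labels) + 1):
        included = labels[:i]
        # All rows whose label belongs to the current cumulative set.
        subset = frame.loc[frame[key].isin(included), value]
        rows[' or '.join(included)] = subset.agg(func)
    return pd.Series(rows, name=value)

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

sums = cumulative_group_agg(df, 'B', 'A', 'sum')
means = cumulative_group_agg(df, 'B', 'A', 'mean')
print(sums)
print(means)
```

Because each cumulative set is aggregated from the raw rows, this also works for functions that are not additive across groups, at the cost of one pass over the data per label.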
