How to group by consecutive occurrences of duplicates in pandas


Question

I have a DataFrame with two columns, [Name, In.Cl]. I want to group by Name, but only across consecutive occurrences of the same value. For example, consider the DataFrame below.

Code to generate the DataFrame:

import pandas as pd

df = pd.DataFrame({'Name': ['A','B','B','A','A','B','C','C','C','B','C'], 'In.Cl': [2,1,5,2,4,2,3,1,8,5,7]})

Input:

    In.Cl Name
0       2    A
1       1    B
2       5    B
3       2    A
4       4    A
5       2    B
6       3    C
7       1    C
8       8    C
9       5    B
10      7    C

I want to group the rows in which the same Name repeats consecutively, for example group [B] (rows 1 to 2), [A] (rows 3 to 4), [C] (rows 6 to 8), and so on, and then sum the In.Cl column within each group.

Expected output:

    In.Cl Name col1   col2
0       2    A   A(1)    2
1       1    B   B(2)    6
2       5    B   B(2)    6
3       2    A   A(2)    6
4       4    A   A(2)    6
5       2    B   B(1)    2
6       3    C   C(3)   12
7       1    C   C(3)   12
8       8    C   C(3)   12
9       5    B   B(1)    5
10      7    C   C(1)    7

So far I have tried combining duplicated and groupby, but it didn't work as I expected. I think I need something like groupby plus a notion of consecutive rows, but I don't have an idea for how to solve this.

Any help would be appreciated.

Recommended answer

In [37]: g = df.groupby((df.Name != df.Name.shift()).cumsum())

In [38]: df['col1'] = df['Name'] + '(' + g['In.Cl'].transform('size').astype(str) + ')'

In [39]: df['col2'] = g['In.Cl'].transform('sum')

In [40]: df
Out[40]:
   Name  In.Cl  col1  col2
0     A      2  A(1)     2
1     B      1  B(2)     6
2     B      5  B(2)     6
3     A      2  A(2)     6
4     A      4  A(2)     6
5     B      2  B(1)     2
6     C      3  C(3)    12
7     C      1  C(3)    12
8     C      8  C(3)    12
9     B      5  B(1)     5
10    C      7  C(1)     7
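
The answer gives no explanation of the grouping key, so here is a minimal sketch that splits it into named steps. The variable names (starts_run, run_id, run_size, run_sum, result) are illustrative and not part of the original answer. Comparing Name with its shifted copy marks the first row of each consecutive run, and the cumulative sum of those marks gives every run its own integer label to group on.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'B', 'A', 'A', 'B', 'C', 'C', 'C', 'B', 'C'],
    'In.Cl': [2, 1, 5, 2, 4, 2, 3, 1, 8, 5, 7],
})

# True wherever a row's Name differs from the previous row's Name,
# i.e. at the first row of each consecutive run. The very first row
# compares against NaN, so it is always marked True.
starts_run = df['Name'] != df['Name'].shift()

# Cumulative sum of the booleans: the label increases by one at each
# run start, so all rows of one run share the same integer.
run_id = starts_run.cumsum()

# Group on the run label rather than on Name itself, so that
# non-adjacent runs of the same Name stay separate.
g = df.groupby(run_id)
run_size = g['In.Cl'].transform('size')  # run length, broadcast to each row
run_sum = g['In.Cl'].transform('sum')    # run total, broadcast to each row

result = df.assign(
    col1=df['Name'] + '(' + run_size.astype(str) + ')',
    col2=run_sum,
)
print(result)
```

Because transform returns a result aligned with the original index, the new columns can be assigned directly without a merge. One caveat under this approach: a missing Name (NaN) never compares equal to its neighbour, so consecutive NaN rows would each form their own run.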
