How to group by consecutive occurrences of duplicates in pandas
Question
I have a DataFrame with two columns, [Name, In.Cl]. I want to group by Name, but only across consecutive occurrences of the same value. For example, consider the DataFrame below.
Code to generate the DataFrame:
import pandas as pd
df = pd.DataFrame({'Name': ['A','B','B','A','A','B','C','C','C','B','C'], 'In.Cl': [2,1,5,2,4,2,3,1,8,5,7]})
Input:
    In.Cl  Name
0       2     A
1       1     B
2       5     B
3       2     A
4       4     A
5       2     B
6       3     C
7       1     C
8       8     C
9       5     B
10      7     C
I want to group the rows where the same Name repeats consecutively, for example [B] (rows 1-2), [A] (rows 3-4), [C] (rows 6-8), and so on, and then sum the In.Cl column within each group.
Expected output:
    In.Cl  Name  col1  col2
0       2     A  A(1)     2
1       1     B  B(2)     6
2       5     B  B(2)     6
3       2     A  A(2)     6
4       4     A  A(2)     6
5       2     B  B(1)     2
6       3     C  C(3)    12
7       1     C  C(3)    12
8       8     C  C(3)    12
9       5     B  B(1)     5
10      7     C  C(1)     7
So far I have tried combining duplicated and groupby, but it didn't work as I expected. I think I need something like groupby plus a notion of consecutive rows, but I don't know how to approach it.
Any help would be appreciated.
Recommended answer
In [37]: g = df.groupby((df.Name != df.Name.shift()).cumsum())
In [38]: df['col1'] = df['Name'] + '(' + g['In.Cl'].transform('size').astype(str) + ')'
In [39]: df['col2'] = g['In.Cl'].transform('sum')
In [40]: df
Out[40]:
    Name  In.Cl  col1  col2
0      A      2  A(1)     2
1      B      1  B(2)     6
2      B      5  B(2)     6
3      A      2  A(2)     6
4      A      4  A(2)     6
5      B      2  B(1)     2
6      C      3  C(3)    12
7      C      1  C(3)    12
8      C      8  C(3)    12
9      B      5  B(1)     5
10     C      7  C(1)     7
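The answer is given without commentary, so here is a minimal sketch of why it works, as a standalone script rather than an IPython session. It splits the grouping key into two named steps; the variable names `starts` and `run_id` are illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'B', 'A', 'A', 'B', 'C', 'C', 'C', 'B', 'C'],
    'In.Cl': [2, 1, 5, 2, 4, 2, 3, 1, 8, 5, 7],
})

# True wherever Name differs from the previous row, i.e. where a new run starts.
# The first row compares against NaN, so it is always flagged as a start.
starts = df['Name'] != df['Name'].shift()

# The running total of the start flags gives every consecutive run its own id.
run_id = starts.cumsum()
print(run_id.tolist())  # [1, 2, 2, 3, 3, 4, 5, 5, 5, 6, 7]

# Group on the run id instead of on Name, then broadcast per-run results
# back onto the original rows with transform.
g = df.groupby(run_id)['In.Cl']
df['col1'] = df['Name'] + '(' + g.transform('size').astype(str) + ')'
df['col2'] = g.transform('sum')
print(df)
```

Because the key is a run id rather than the name itself, the separate B runs (rows 1-2, row 5, and row 9) stay in different groups, which a plain groupby on Name would merge.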