Pandas groupby cumulative sum
Question
I would like to add a cumulative sum column to my Pandas dataframe, so that:
name | day | no
-----|-----------|----
Jack | Monday | 10
Jack | Tuesday | 20
Jack | Tuesday | 10
Jack | Wednesday | 50
Jill | Monday | 40
Jill | Wednesday | 110
becomes:
Jack | Monday | 10 | 10
Jack | Tuesday | 30 | 40
Jack | Wednesday | 50 | 90
Jill | Monday | 40 | 40
Jill | Wednesday | 110 | 150
I tried various combinations of df.groupby and df.agg(lambda x: cumsum(x)), to no avail.
Recommended answer
This should do it; it needs groupby() twice:
df.groupby(['name', 'day']).sum() \
.groupby(level=0).cumsum().reset_index()
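For anyone who wants to run the one-liner as-is, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the table in the question; the variable name `result` is only an illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# Step 1: total per (name, day).
# Step 2: running total of those daily totals within each name.
result = (
    df.groupby(["name", "day"]).sum()
      .groupby(level=0).cumsum()
      .reset_index()
)
print(result)
```

Note that groupby sorts the group keys by default, so the days come out in alphabetical order. That happens to match calendar order for this sample; with other day names, passing sort=False to the first groupby keeps the order of first appearance instead.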
Explanation:
print(df)
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110
# sum per name/day
print( df.groupby(['name', 'day']).sum() )
                 no
name day
Jack Monday      10
     Tuesday     30
     Wednesday   50
Jill Monday      40
     Wednesday  110
# cumulative sum of the per-day totals, restarted for each name
print( df.groupby(['name', 'day']).sum() \
         .groupby(level=0).cumsum() )
                 no
name day
Jack Monday      10
     Tuesday     40
     Wednesday   90
Jill Monday      40
     Wednesday  150
The dataframe resulting from the first sum is indexed by 'name' and by 'day'. You can see this by printing
df.groupby(['name', 'day']).sum().index
When computing the cumulative sum, you want to do so per 'name', which corresponds to the first index level (level 0).
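To illustrate what "level 0" refers to: after the first sum, the index levels carry the names 'name' and 'day', so the level can be selected either by position or by label. The sketch below checks that both spellings give the same result on the sample data; the variable names `daily`, `by_position` and `by_label` are illustrative only.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# Daily totals, indexed by a two-level MultiIndex (name, day).
daily = df.groupby(["name", "day"]).sum()
print(daily.index.names)

# Group on the first index level, by position and by label.
by_position = daily.groupby(level=0).cumsum()
by_label = daily.groupby(level="name").cumsum()

# Both select the same index level, so the results are identical.
print(by_position.equals(by_label))
```

Using the label can be easier to read when the index has several levels, though that is a matter of taste.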
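Finally, use reset_index to turn the index levels back into ordinary columns, so that the names are repeated on every row.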
df.groupby(['name', 'day']).sum().groupby(level=0).cumsum().reset_index()
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   40
2  Jack  Wednesday   90
3  Jill     Monday   40
4  Jill  Wednesday  150
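The result above replaces the daily totals with the running totals, whereas the table in the question shows both side by side. One possible variation, not part of the original answer, is to keep the daily sum and attach the running total as an extra column. The column name `cum_no` is a made-up name for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# as_index=False keeps 'name' and 'day' as ordinary columns,
# so no reset_index is needed afterwards.
daily = df.groupby(["name", "day"], as_index=False)["no"].sum()

# Running total of the daily sums within each name.
# 'cum_no' is a hypothetical column name chosen for this sketch.
daily["cum_no"] = daily.groupby("name")["no"].cumsum()
print(daily)
```

This still uses groupby twice, as in the answer; the only difference is that the second step writes into a new column instead of overwriting 'no'.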