Pandas groupby cumulative sum
Question
I would like to add a cumulative sum column to my Pandas dataframe so that:
name | day | no
-----|-----------|----
Jack | Monday | 10
Jack | Tuesday | 20
Jack | Tuesday | 10
Jack | Wednesday | 50
Jill | Monday | 40
Jill | Wednesday | 110
becomes:
Jack | Monday | 10 | 10
Jack | Tuesday | 30 | 40
Jack | Wednesday | 50 | 90
Jill | Monday | 40 | 40
Jill | Wednesday | 110 | 150
I tried various combos of df.groupby and df.agg(lambda x: cumsum(x)) to no avail.
Recommended answer
This should do it; you need groupby() twice:
(df.groupby(['name', 'day']).sum()
   .groupby(level=0).cumsum().reset_index())
Explanation:
print(df)
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110
# sum per name/day
print( df.groupby(['name', 'day']).sum() )
                 no
name day
Jack Monday      10
     Tuesday     30
     Wednesday   50
Jill Monday      40
     Wednesday  110
# cumulative sum of the daily totals within each name
print( df.groupby(['name', 'day']).sum()
         .groupby(level=0).cumsum() )
                 no
name day
Jack Monday      10
     Tuesday     40
     Wednesday   90
Jill Monday      40
     Wednesday  150
The dataframe resulting from the first sum is indexed by 'name' and by 'day'. You can see this by printing:
df.groupby(['name', 'day']).sum().index
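To make the indexing concrete, here is a minimal sketch that rebuilds the sample frame from the question and inspects the index of the first sum. The variable name daily is only illustrative and does not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# 'daily' is a hypothetical name for the result of the first groupby/sum
daily = df.groupby(["name", "day"]).sum()

print(type(daily.index).__name__)  # MultiIndex
print(list(daily.index.names))     # ['name', 'day']
print(daily.index.nlevels)         # 2
```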
When computing the cumulative sum, you want to do so by 'name', which corresponds to the first index level (level 0).
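As a small sketch of what "level 0" means here: since the index levels are named, the level can presumably be referred to by its label as well as by its position. The variable names below are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

daily = df.groupby(["name", "day"]).sum()

# Group on the first index level, by position and by label
by_position = daily.groupby(level=0).cumsum()
by_label = daily.groupby(level="name").cumsum()

print(by_position.equals(by_label))  # True
print(by_label["no"].tolist())       # [10, 40, 90, 40, 150]
```

Referring to the level by name can be easier to read than a bare 0 when the index has several levels.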
Finally, use reset_index to turn the index levels back into columns, so that the names are repeated on every row.
df.groupby(['name', 'day']).sum().groupby(level=0).cumsum().reset_index()
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   40
2  Jack  Wednesday   90
3  Jill     Monday   40
4  Jill  Wednesday  150
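The target table in the question keeps the per-day total next to the running total, whereas the chained call returns only the running total. One possible variation, sketched here under my own assumptions, keeps both columns. The column name cum_no is hypothetical. Note also that groupby sorts the group keys by default, which matches weekday order here only because Monday, Tuesday and Wednesday happen to be alphabetical. Passing sort=False keeps groups in order of first appearance instead.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# Step 1: total per (name, day); as_index=False keeps the keys as columns,
# sort=False preserves the row order in which the groups first appear
out = df.groupby(["name", "day"], sort=False, as_index=False)["no"].sum()

# Step 2: running total of the daily sums within each name
# ('cum_no' is an illustrative column name)
out["cum_no"] = out.groupby("name")["no"].cumsum()

print(out)
```

This assumes the rows of the input are already in chronological order within each name. If they are not, the days would need to be ordered explicitly before taking the cumulative sum.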