Pandas group by cumsum keep columns
Problem description
I have spent a few hours now trying to do a "cumulative group by sum" on a pandas dataframe. I have looked at all the stackoverflow answers and surprisingly none of them can solve my (very elementary) problem:
I have a dataframe:
df1
Out[8]:
   Name        Date  Amount
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29       8
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5
I am trying to
- group by ['Name','Date'] and
- cumsum 'Amount'.
- That's it.
So the desired output is:
df1
Out[10]:
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29      23
2  Jill  2016-01-31      10
3  Jill  2016-02-29      15
I am simplifying the question. With the current answers I still can't get the correct "running" cumsum. Look closely, I want to see the cumulative sum "10, 23, 10, 15". In words, I want to see, at every consecutive date, the total cumulative sum for a person. NB: If there are two entries on one date for the same person, I want to sum those and then add them to the running cumsum and only then print the sum.
Recommended answer
You need to assign the output to a new column and then remove the Amount column with drop:
df1['Cumsum'] = df1.groupby(by=['Name','Date'])['Amount'].cumsum()
df1 = df1.drop('Amount', axis=1)
print(df1)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29      13
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5
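For anyone who wants to run this end to end, here is a minimal self-contained sketch. The construction of df1 is an assumption reconstructed from the question (dates are kept as plain strings for simplicity), and the names per_pair and result are only illustrative. It also shows why rows 1 and 2 come out as 5 and 13: grouping by both Name and Date restarts the running total for every (Name, Date) pair.

```python
import pandas as pd

# Assumed reconstruction of the question's data; dates kept as plain strings.
df1 = pd.DataFrame({
    'Name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'Date': ['2016-01-31', '2016-02-29', '2016-02-29',
             '2016-01-31', '2016-02-29'],
    'Amount': [10, 5, 8, 10, 5],
})

# cumsum restarts inside each (Name, Date) group and returns one value
# per original row, so the row count stays at five.
per_pair = df1.groupby(by=['Name', 'Date'])['Amount'].cumsum()

# Attach the result as a new column and drop the original Amount column.
result = df1.assign(Cumsum=per_pair).drop('Amount', axis=1)
print(result)
```

Because the total restarts per pair, this keeps the columns but does not yet give the running total per person that the question asks for.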
Another solution with assign:
df1 = (df1.assign(Cumsum=df1.groupby(by=['Name','Date'])['Amount'].cumsum())
          .drop('Amount', axis=1))
print(df1)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29      13
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5
Edit by comment:
First group by the columns Name and Date and aggregate with sum, then group by the Name level and aggregate with cumsum.
df = (df1.groupby(by=['Name','Date'])['Amount'].sum()
         .groupby(level='Name').cumsum().reset_index(name='Cumsum'))
print(df)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29      23
2  Jill  2016-01-31      10
3  Jill  2016-02-29      15
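The two steps can also be written out separately, which may make the intermediate result easier to inspect. This is only a sketch: the df1 construction is an assumption reconstructed from the question (dates as ISO-formatted strings), and the names daily and running are illustrative.

```python
import pandas as pd

# Assumed reconstruction of the question's data; dates kept as ISO strings.
df1 = pd.DataFrame({
    'Name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'Date': ['2016-01-31', '2016-02-29', '2016-02-29',
             '2016-01-31', '2016-02-29'],
    'Amount': [10, 5, 8, 10, 5],
})

# Step 1: one total per (Name, Date). The result is a Series indexed by a
# (Name, Date) MultiIndex, so duplicate dates for a person are merged here.
daily = df1.groupby(by=['Name', 'Date'])['Amount'].sum()

# Step 2: running total within each Name, following the index order,
# then turn the index levels back into ordinary columns.
running = daily.groupby(level='Name').cumsum().reset_index(name='Cumsum')
print(running)
```

groupby sorts the group keys by default, so ISO-formatted date strings end up in chronological order. If the dates are stored in another text format, converting them with pd.to_datetime first is the safer choice.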