Pandas group by cumsum keep columns
Question
I have spent a few hours now trying to do a "cumulative group by sum" on a pandas dataframe. I have looked at all the stackoverflow answers and surprisingly none of them can solve my (very elementary) problem:
I have a dataframe:
df1
Out[8]:
Name Date Amount
0 Jack 2016-01-31 10
1 Jack 2016-02-29 5
2 Jack 2016-02-29 8
3 Jill 2016-01-31 10
4 Jill 2016-02-29 5
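To make the example reproducible, here is a minimal sketch that builds the same df1. It assumes Date should be a real datetime column (parsed with pd.to_datetime); the question does not say which dtype it uses, and plain strings would group the same way.

```python
import pandas as pd

# Rebuild the question's df1; parsing Date as datetime is an assumption.
df1 = pd.DataFrame({
    "Name": ["Jack", "Jack", "Jack", "Jill", "Jill"],
    "Date": pd.to_datetime(
        ["2016-01-31", "2016-02-29", "2016-02-29", "2016-01-31", "2016-02-29"]
    ),
    "Amount": [10, 5, 8, 10, 5],
})
print(df1)
```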
I'm trying to
- group by ['Name','Date'] and
- cumsum 'Amount'.
- That's it.
So the desired output is:
df1
Out[10]:
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 23
2 Jill 2016-01-31 10
3 Jill 2016-02-29 15
I am simplifying the question. With the current answers I still can't get the correct "running" cumsum. Look closely, I want to see the cumulative sum "10, 23, 10, 15". In words, I want to see, at every consecutive date, the total cumulative sum for a person. NB: If there are two entries on one date for the same person, I want to sum those and then add them to the running cumsum and only then print the sum.
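The rule in the NB can be spelled out without pandas. A minimal sketch in plain Python, assuming dates are ISO strings (so they sort chronologically): first collapse rows that share a (Name, Date) into one daily total, then take a running total of those daily totals per name.

```python
from itertools import accumulate, groupby

rows = [
    ("Jack", "2016-01-31", 10),
    ("Jack", "2016-02-29", 5),
    ("Jack", "2016-02-29", 8),
    ("Jill", "2016-01-31", 10),
    ("Jill", "2016-02-29", 5),
]

# Step 1: sum all entries for the same person on the same date.
daily = {}
for name, date, amount in rows:
    daily[(name, date)] = daily.get((name, date), 0) + amount

# Step 2: within each name (in date order), add each daily total to the running sum.
result = []
for name, group in groupby(sorted(daily.items()), key=lambda kv: kv[0][0]):
    group = list(group)
    running_totals = accumulate(amount for _, amount in group)
    for ((_, date), _), running in zip(group, running_totals):
        result.append((name, date, running))

print(result)
```

Jack's two February entries (5 and 8) become a single daily total of 13 before being added to his January 10, which is where the 23 comes from.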
Recommended answer
You need to assign the output to a new column and then remove the Amount column with drop:
df1['Cumsum'] = df1.groupby(by=['Name','Date'])['Amount'].cumsum()
df1 = df1.drop('Amount', axis=1)
print (df1)
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 5
2 Jack 2016-02-29 13
3 Jill 2016-01-31 10
4 Jill 2016-02-29 5
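The output above (5 and 13 for Jack in February) is not the requested running total, because cumsum restarts inside every (Name, Date) group. A short sketch comparing it with grouping by Name alone, which carries the total forward but row by row rather than date by date. Date is kept as plain strings here for brevity, which is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Jack", "Jack", "Jack", "Jill", "Jill"],
    "Date": ["2016-01-31", "2016-02-29", "2016-02-29", "2016-01-31", "2016-02-29"],
    "Amount": [10, 5, 8, 10, 5],
})

# Restarts for each (Name, Date): January is never carried into February.
per_name_date = df1.groupby(["Name", "Date"])["Amount"].cumsum()
print(per_name_date.tolist())  # [10, 5, 13, 10, 5]

# Carries across dates, but gives one value per row, not one per date.
per_name_row = df1.groupby("Name")["Amount"].cumsum()
print(per_name_row.tolist())  # [10, 15, 23, 10, 15]
```

Neither gives 10, 23, 10, 15; the same-date rows must be summed first, then accumulated per name.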
Another solution with assign:
# Parentheses let the method chain continue on the next line.
df1 = (df1.assign(Cumsum=df1.groupby(by=['Name','Date'])['Amount'].cumsum())
          .drop('Amount', axis=1))
print (df1)
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 5
2 Jack 2016-02-29 13
3 Jill 2016-01-31 10
4 Jill 2016-02-29 5
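If the assign version is part of a longer method chain, the new column can also be computed from a callable, so it does not refer to the df1 variable by name. This is a sketch of that variant (it produces the same per-(Name, Date) result as above, not the running total); `drop(columns=...)` is used instead of `axis=1`, which is equivalent.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Jack", "Jack", "Jack", "Jill", "Jill"],
    "Date": ["2016-01-31", "2016-02-29", "2016-02-29", "2016-01-31", "2016-02-29"],
    "Amount": [10, 5, 8, 10, 5],
})

# assign() calls the lambda with the frame at that point in the chain.
out = (
    df1.assign(Cumsum=lambda d: d.groupby(["Name", "Date"])["Amount"].cumsum())
       .drop(columns="Amount")
)
print(out)
```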
EDIT, based on the comments:
First group by the columns Name and Date and aggregate with sum, then group the result by the index level Name and take the cumsum.
# 1) one daily total per (Name, Date); 2) running total of those totals per Name
df = (df1.groupby(by=['Name','Date'])['Amount'].sum()
         .groupby(level='Name').cumsum()
         .reset_index(name='Cumsum'))
print (df)
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 23
2 Jill 2016-01-31 10
3 Jill 2016-02-29 15
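The same two steps can be written with `as_index=False`, which keeps Name and Date as ordinary columns throughout instead of building a MultiIndex and calling `reset_index` at the end. A sketch under the same data (Date as plain strings, an assumption):

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Jack", "Jack", "Jack", "Jill", "Jill"],
    "Date": ["2016-01-31", "2016-02-29", "2016-02-29", "2016-01-31", "2016-02-29"],
    "Amount": [10, 5, 8, 10, 5],
})

# Step 1: one row per (Name, Date) holding the daily total.
daily = df1.groupby(["Name", "Date"], as_index=False)["Amount"].sum()

# Step 2: running total of the daily totals within each Name.
daily["Cumsum"] = daily.groupby("Name")["Amount"].cumsum()
out = daily.drop(columns="Amount")
print(out)
```

Both versions rely on the groups coming out sorted by Date within each Name, which `groupby` does by default (`sort=True`).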