pandas 按暨保持列分组 [英] Pandas group by cumsum keep columns

查看:69
本文介绍了 pandas 按暨保持列分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我现在已经花了几个小时来尝试对熊猫数据框进行总和累积".我查看了所有的stackoverflow答案,令人惊讶的是,它们都无法解决我的(非常基础的)问题:

I have spent a few hours now trying to do a "cumulative group by sum" on a pandas dataframe. I have looked at all the stackoverflow answers and surprisingly none of them can solve my (very elementary) problem:

我有一个数据框:

df1 Out[8]: Name Date Amount 0 Jack 2016-01-31 10 1 Jack 2016-02-29 5 2 Jack 2016-02-29 8 3 Jill 2016-01-31 10 4 Jill 2016-02-29 5

df1 Out[8]: Name Date Amount 0 Jack 2016-01-31 10 1 Jack 2016-02-29 5 2 Jack 2016-02-29 8 3 Jill 2016-01-31 10 4 Jill 2016-02-29 5

我正在尝试

  1. 按['名称','日期']和
  2. 分组
  3. 金额"总和.
  4. 就是这样.

所以所需的输出是:

df1 Out[10]: Name Date Cumsum 0 Jack 2016-01-31 10 1 Jack 2016-02-29 23 2 Jill 2016-01-31 10 3 Jill 2016-02-29 15

df1 Out[10]: Name Date Cumsum 0 Jack 2016-01-31 10 1 Jack 2016-02-29 23 2 Jill 2016-01-31 10 3 Jill 2016-02-29 15

我正在简化这个问题.用当前的答案,我仍然无法获得正确的运行中"的总和.仔细观察,我想看到累积总和"10、23、10、15".换句话说,我想在每个连续的日期查看一个人的累计总金额.注意:如果同一个人在同一日期有两个条目,我想对它们进行求和,然后将它们添加到正在运行的总和中,然后才打印总和.

I am simplifying the question. With the current answers I still can't get the correct "running" cumsum. Look closely, I want to see the cumulative sum "10, 23, 10, 15". In words, I want to see, at every consecutive date, the total cumulative sum for a person. NB: If there are two entries on one date for the same person, I want to sum those and then add them to the running cumsum and only then print the sum.

推荐答案

您需要将输出分配给新列,然后通过

You need assign output to new column and then remove Amount column by drop:

df1['Cumsum'] = df1.groupby(by=['Name','Date'])['Amount'].cumsum()
df1 = df1.drop('Amount', axis=1)
print (df1)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29      13
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5

使用 assign 的另一种解决方案:

Another solution with assign:

df1 = df1.assign(Cumsum=df1.groupby(by=['Name','Date'])['Amount'].cumsum())
         .drop('Amount', axis=1)
print (df1)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29      13
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5

通过评论

首先是groupbyNameDate并汇总sum,然后是groupby,依次是level Name和汇总cumsum.

First groupby columns Name and Date and aggregate sum, then groupby by level Name and aggregate cumsum.

df = df1.groupby(by=['Name','Date'])['Amount'].sum()
        .groupby(level='Name').cumsum().reset_index(name='Cumsum')
print (df)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29      23
2  Jill  2016-01-31      10
3  Jill  2016-02-29      15

这篇关于 pandas 按暨保持列分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆