Pandas group by cumsum keep columns

Problem description

I have spent a few hours now trying to do a "cumulative group by sum" on a pandas dataframe. I have looked at all the stackoverflow answers and surprisingly none of them can solve my (very elementary) problem:

I have a dataframe:

df1
Out[8]:
   Name        Date  Amount
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29       8
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5

I am trying to

  1. group by ['Name','Date'] and
  2. cumsum 'Amount'.
  3. That is it.

So the desired output is:

df1
Out[10]:
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29      23
2  Jill  2016-01-31      10
3  Jill  2016-02-29      15

I am simplifying the question. With the current answers I still can't get the correct "running" cumsum. Look closely, I want to see the cumulative sum "10, 23, 10, 15". In words, I want to see, at every consecutive date, the total cumulative sum for a person. NB: If there are two entries on one date for the same person, I want to sum those and then add them to the running cumsum and only then print the sum.

Recommended answer

You need to assign the output to a new column and then remove the Amount column with drop:

df1['Cumsum'] = df1.groupby(by=['Name','Date'])['Amount'].cumsum()
df1 = df1.drop('Amount', axis=1)
print (df1)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29      13
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5

Another solution with assign:

# Parentheses are required so the chained call can continue on the next line
df1 = (df1.assign(Cumsum=df1.groupby(by=['Name','Date'])['Amount'].cumsum())
          .drop('Amount', axis=1))
print (df1)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29      13
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5
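
The two variants above can be checked side by side with a small self-contained sketch. It assumes df1 is rebuilt from the question's table with Date kept as plain strings; the names out_drop and out_assign are only illustrative. Because the grouping key includes Date, the cumulative sum restarts for every (Name, Date) pair, which is why Jack's rows show 10, 5, 13 rather than a running total.

```python
import pandas as pd

# Sample data rebuilt from the question's table (Date kept as plain strings).
df1 = pd.DataFrame({
    'Name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'Date': ['2016-01-31', '2016-02-29', '2016-02-29', '2016-01-31', '2016-02-29'],
    'Amount': [10, 5, 8, 10, 5],
})

# Variant 1: add the new column, then drop Amount (on a copy, so df1 stays intact).
out_drop = df1.copy()
out_drop['Cumsum'] = out_drop.groupby(by=['Name', 'Date'])['Amount'].cumsum()
out_drop = out_drop.drop('Amount', axis=1)

# Variant 2: the same steps written as one parenthesised method chain.
out_assign = (df1.assign(Cumsum=df1.groupby(by=['Name', 'Date'])['Amount'].cumsum())
                 .drop('Amount', axis=1))

print(out_assign)
```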

EDIT by comment:

First group by the Name and Date columns and aggregate with sum, then group by the Name index level and apply cumsum.

# Parentheses are required so the chained call can continue on the next line
df = (df1.groupby(by=['Name','Date'])['Amount'].sum()
         .groupby(level='Name').cumsum().reset_index(name='Cumsum'))
print (df)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29      23
2  Jill  2016-01-31      10
3  Jill  2016-02-29      15
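
As a runnable sketch of this two-step approach, the snippet below splits the chain into named steps. It assumes the question's data with Date stored as ISO-formatted strings; the intermediate names daily and running are hypothetical and used only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question's table (Date kept as plain strings).
df1 = pd.DataFrame({
    'Name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'Date': ['2016-01-31', '2016-02-29', '2016-02-29', '2016-01-31', '2016-02-29'],
    'Amount': [10, 5, 8, 10, 5],
})

# Step 1: total per (Name, Date); this merges Jack's two entries on 2016-02-29.
daily = df1.groupby(by=['Name', 'Date'])['Amount'].sum()

# Step 2: running total per person, grouping on the Name level of the index,
# then turning the index levels back into ordinary columns.
running = daily.groupby(level='Name').cumsum().reset_index(name='Cumsum')

print(running)
```

The running total relies on the dates being in order within each name. Here groupby sorts the group keys by default, and ISO-formatted date strings sort chronologically; with other date formats, converting the column with pd.to_datetime first would be the safer choice.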
