Pandas group by cumsum keep columns

Views: 41 | Published: 2021/12/27 8:09:25 | Tags: pandas, group-by, cumsum

Problem description

I have spent a few hours now trying to do a "cumulative group by sum" on a pandas dataframe. I have looked at all the stackoverflow answers and surprisingly none of them can solve my (very elementary) problem:

I have a dataframe:

df1
Out[8]:
   Name        Date  Amount
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29       8
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5

I am trying to

group by ['Name','Date'] and
cumsum 'Amount'.
That is it.

So the desired output is:

df1
Out[10]:
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29      23
2  Jill  2016-01-31      10
3  Jill  2016-02-29      15

I am simplifying the question. With the current answers I still can't get the correct "running" cumsum. Look closely, I want to see the cumulative sum "10, 23, 10, 15". In words, I want to see, at every consecutive date, the total cumulative sum for a person. NB: If there are two entries on one date for the same person, I want to sum those and then add them to the running cumsum and only then print the sum.

Recommended answer

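You need to assign the output to a new column and then remove the Amount column with drop. Note that grouping by both Name and Date accumulates only within each (Name, Date) pair: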

df1['Cumsum'] = df1.groupby(by=['Name','Date'])['Amount'].cumsum()
df1 = df1.drop('Amount', axis=1)
print (df1)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29      13
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5
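
To reproduce the output above, here is a minimal self-contained sketch. The construction of df1 is an assumption, rebuilt from the table in the question with the dates kept as plain strings, and the name per_date is only illustrative. The groupby call itself is the one from the answer.

```python
import pandas as pd

# Assumed reconstruction of the question's dataframe (dates kept as strings).
df1 = pd.DataFrame({
    'Name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'Date': ['2016-01-31', '2016-02-29', '2016-02-29', '2016-01-31', '2016-02-29'],
    'Amount': [10, 5, 8, 10, 5],
})

# Running total inside each (Name, Date) group. The result keeps df1's index,
# so it lines up row by row and can be attached as a new column.
cumsum_per_pair = df1.groupby(['Name', 'Date'])['Amount'].cumsum()

# per_date is a hypothetical name; the original df1 is left untouched.
per_date = df1.assign(Cumsum=cumsum_per_pair).drop(columns='Amount')
print(per_date)
```

Because the grouping key includes Date, the total restarts at every new date, so only Jack's two entries on 2016-02-29 are accumulated together.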

Another solution with assign:

df1 = (df1.assign(Cumsum=df1.groupby(by=['Name','Date'])['Amount'].cumsum())
          .drop('Amount', axis=1))
print (df1)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29       5
2  Jack  2016-02-29      13
3  Jill  2016-01-31      10
4  Jill  2016-02-29       5

Edit by comment:

First group by the columns Name and Date and aggregate with sum, then group by the index level Name and apply cumsum:

df = (df1.groupby(by=['Name','Date'])['Amount'].sum()
         .groupby(level='Name').cumsum().reset_index(name='Cumsum'))
print (df)
   Name        Date  Cumsum
0  Jack  2016-01-31      10
1  Jack  2016-02-29      23
2  Jill  2016-01-31      10
3  Jill  2016-02-29      15
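
The same two steps can also be written so that the intermediate result stays an ordinary dataframe. The sketch below is an assumed variant, not the answer's exact code. It rebuilds df1 from the question's table, uses as_index=False for the per-date sum, and then takes a running total per Name. The names daily and result are illustrative. It relies on groupby sorting the keys, so ISO-formatted date strings come out in chronological order.

```python
import pandas as pd

# Assumed reconstruction of the question's dataframe (dates kept as strings).
df1 = pd.DataFrame({
    'Name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'Date': ['2016-01-31', '2016-02-29', '2016-02-29', '2016-01-31', '2016-02-29'],
    'Amount': [10, 5, 8, 10, 5],
})

# Step 1: one row per (Name, Date), with same-day amounts added together.
daily = df1.groupby(['Name', 'Date'], as_index=False)['Amount'].sum()

# Step 2: running total per person across the sorted dates.
daily['Cumsum'] = daily.groupby('Name')['Amount'].cumsum()

# Keep only the columns shown in the desired output.
result = daily.drop(columns='Amount')
print(result)
```

If the dates were stored in a format that does not sort chronologically as text, converting the column with pd.to_datetime before grouping would be the safer choice.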
