Conditional Running Sum in Pandas for All Previous Values Only

Views: 105 | Published: 2020/5/24 3:45:32 | Tags: python pandas grouping cumulative-sum


Problem description

Suppose I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
                   'Date': ['2019-01-01', '2019-02-01', '2019-03-01', '2019-03-01', '2019-02-15', 
                             '2019-03-15', '2019-04-05', '2019-04-05', '2019-04-15', '2019-06-10'],
                   'Sale': [100, 200, 150, 200, 150, 100, 300, 250, 500, 400]})
df['Date'] = pd.to_datetime(df['Date'])
df

Event         Date  Sale
    A   2019-01-01   100
    B   2019-02-01   200
    A   2019-03-01   150
    A   2019-03-01   200
    B   2019-02-15   150
    C   2019-03-15   100
    B   2019-04-05   300
    B   2019-04-05   250
    A   2019-04-15   500
    C   2019-06-10   400

I would like to obtain the following result:

Event         Date  Sale   Total_Previous_Sale
    A   2019-01-01   100                     0
    B   2019-02-01   200                     0
    A   2019-03-01   150                   100
    A   2019-03-01   200                   100
    B   2019-02-15   150                   200
    C   2019-03-15   100                     0
    B   2019-04-05   300                   350
    B   2019-04-05   250                   350
    A   2019-04-15   500                   450
    C   2019-06-10   400                   100

where df['Total_Previous_Sale'] is the sum of df['Sale'] over all rows with the same df['Event'] whose df['Date'] is strictly earlier than the current row's date. For instance,

the total sales of event A before 2019-01-01 is 0,
the total sales of event A before 2019-03-01 is 100, and
the total sales of event A before 2019-04-15 is 100 + 150 + 200 = 450.

Basically, it is almost the same as a conditional cumulative sum, but only over the previous values: the current row and any other rows sharing its date are excluded. I am able to obtain the desired result using this line:

df['Sale_Total'] = [df.loc[(df['Event'] == df.loc[i, 'Event']) & (df['Date'] < df.loc[i, 'Date']), 
                           'Sale'].sum() for i in range(len(df))]

It works fine, but it is slow. I believe there is a better and faster way to do this. I have tried these lines:

df['Total_Previous_Sale'] = df[df['Date'] < df['Date']].groupby(['Event'])['Sale'].cumsum()

or

df['Total_Previous_Sale'] = df.groupby(['Event'])['Sale'].shift(1).cumsum().fillna(0)

but they either produce NaNs or give an unwanted result.
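
For context on why those two attempts misbehave: df['Date'] < df['Date'] compares each row with itself, so the mask is False everywhere and the assignment yields only NaN. In the second attempt, shift(1) is applied per event, but the cumsum() that follows runs on the resulting plain Series and so accumulates across events; rows that share a date also end up counting each other. One possible faster alternative, offered as a sketch rather than a definitive answer, is to total the sales per (Event, Date) first, take a per-event running sum over those daily totals, subtract each date's own total, and merge the result back. The names daily and result are illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
    'Date': pd.to_datetime(['2019-01-01', '2019-02-01', '2019-03-01', '2019-03-01',
                            '2019-02-15', '2019-03-15', '2019-04-05', '2019-04-05',
                            '2019-04-15', '2019-06-10']),
    'Sale': [100, 200, 150, 200, 150, 100, 300, 250, 500, 400],
})

# Step 1: one row per (Event, Date) holding that date's total sales,
# ordered by date within each event.
daily = (df.groupby(['Event', 'Date'], as_index=False)['Sale'].sum()
           .sort_values(['Event', 'Date']))

# Step 2: per-event running total minus the current date's own total
# leaves the sum over strictly earlier dates only.
daily['Total_Previous_Sale'] = daily.groupby('Event')['Sale'].cumsum() - daily['Sale']

# Step 3: attach the per-date value to every original row.
# A left merge keeps the original row order of df.
result = df.merge(daily[['Event', 'Date', 'Total_Previous_Sale']],
                  on=['Event', 'Date'], how='left')

print(result)
```

Aggregating by date before accumulating is what keeps rows with the same event and date (such as the two A rows on 2019-03-01) from counting each other, which a plain shifted cumulative sum would not handle.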
