Reset cumulative sum based on a condition in Pandas


Question

I have a data frame like:

customer spend hurdle 
A         20    50      
A         31    50      
A         20    50      
B         50    100     
B         51    100    
B         30    100     

I want to calculate an additional column, Cumulative, holding the running sum of spend within each customer. It should reset once the cumulative sum is greater than or equal to the hurdle: the row that reaches the hurdle keeps its total, and the next row starts again from its own spend, as in the following:

customer spend hurdle Cumulative 
A         20    50      20
A         31    50      51
A         20    50      20
B         50    100     50
B         51    100    101
B         30    100     30

我在熊猫中使用了cumsumgroupby,但是我不知道如何根据情况将其重置.

I used the cumsum and groupby in pandas to but I do not know how to reset it base on the condition.

The following is the code I am currently using:

df1['cum_sum'] = df1.groupby('customer')['spend'].cumsum()

I know this is just a normal cumulative sum. I would really appreciate your help.
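To make the problem concrete, here is a minimal sketch that rebuilds the sample frame and runs the plain per-customer cumulative sum. The third A row and the third B row keep accumulating (71 and 131) instead of restarting at 20 and 30, which is exactly the reset that is missing.

```python
import pandas as pd

# The sample frame from the question
df1 = pd.DataFrame({
    'customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'spend':    [20, 31, 20, 50, 51, 30],
    'hurdle':   [50, 50, 50, 100, 100, 100],
})

# Plain running total per customer: never resets at the hurdle
df1['cum_sum'] = df1.groupby('customer')['spend'].cumsum()
print(df1['cum_sum'].tolist())  # [20, 51, 71, 50, 101, 131]
```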

Answer

There could be a faster, more efficient way. Here is one inefficient way to do it with apply:

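In [3270]: def custcum(x):
      ...:     total = 0
      ...:     for i, v in x.iterrows():
      ...:         total += v.spend           # add this row's spend to the running total
      ...:         x.loc[i, 'cum'] = total    # record the running total for this row
      ...:         if total >= v.hurdle:      # hurdle reached: next row starts from zero
      ...:             total = 0
      ...:     return x
      ...:

In [3271]: df.groupby('customer', group_keys=False).apply(custcum)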
Out[3271]:
  customer  spend  hurdle    cum
0        A     20      50   20.0
1        A     31      50   51.0
2        A     20      50   20.0
3        B     50     100   50.0
4        B     51     100  101.0
5        B     30     100   30.0

您可以考虑使用cythonnumba加快custcum

You may consider using cython or numba to speed up the custcum
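As a sketch of what a Numba-friendly rewrite might look like (not the answer's code): the loop below works on plain NumPy arrays instead of iterrows and handles all customers in a single pass. It assumes each customer's rows are contiguous, as in the sample; `reset_cumsum` is a hypothetical name. It runs as ordinary Python; whether `numba.njit` compiles it unchanged is an assumption, not something shown here.

```python
import numpy as np
import pandas as pd

def reset_cumsum(group_ids, spend, hurdle):
    # Hypothetical helper: running sum of `spend` that restarts at each new
    # group and after any row whose running sum reaches `hurdle`.
    # Assumes rows of the same group are contiguous.
    out = np.empty(len(spend), dtype=np.float64)
    total = 0.0
    for k in range(len(spend)):
        if k > 0 and group_ids[k] != group_ids[k - 1]:
            total = 0.0  # new customer: start over
        total += spend[k]
        out[k] = total
        if total >= hurdle[k]:
            total = 0.0  # hurdle reached: next row starts from zero
    return out

df = pd.DataFrame({
    'customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'spend':    [20, 31, 20, 50, 51, 30],
    'hurdle':   [50, 50, 50, 100, 100, 100],
})
codes, _ = pd.factorize(df['customer'])  # integer code per customer
df['Cumulative'] = reset_cumsum(codes, df['spend'].to_numpy(), df['hurdle'].to_numpy())
print(df['Cumulative'].tolist())  # [20.0, 51.0, 20.0, 50.0, 101.0, 30.0]
```

Passing integer codes instead of strings keeps every argument a numeric array, which is the kind of input Numba's nopython mode handles.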

[Update]

An improved version of Ido s's answer.

In [3276]: s = df.groupby('customer').spend.cumsum()

In [3277]: np.where(s > df.hurdle.shift(-1), s, df.spend)
Out[3277]: array([ 20,  51,  20,  50, 101,  30], dtype=int64)
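The np.where one-liner reproduces the expected column for this sample, but it compares each running sum with the next row's hurdle and never carries a reset forward. A quick check with one hypothetical extra row for customer A (spend 10) compares it against an explicit running-total loop:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the sample plus a fourth row for customer A
df = pd.DataFrame({
    'customer': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'spend':    [20, 31, 20, 10, 50, 51, 30],
    'hurdle':   [50, 50, 50, 50, 100, 100, 100],
})

# The one-liner from the update
s = df.groupby('customer').spend.cumsum()
trick = np.where(s > df.hurdle.shift(-1), s, df.spend)

# Reference: explicit running total that resets per customer and at the hurdle
expected = []
total, prev = 0, None
for cust, spend, hurdle in zip(df.customer, df.spend, df.hurdle):
    if cust != prev:
        total = 0
    total += spend
    expected.append(total)
    if total >= hurdle:
        total = 0
    prev = cust

print(trick.tolist())  # [20, 51, 71, 10, 50, 101, 30]
print(expected)        # [20, 51, 20, 30, 50, 101, 30]
```

Here the one-liner gives 71 and 10 for the last two A rows, where the reset rule gives 20 and 30.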
