Reset cumulative sum based on a condition in pandas
Question
I have a data frame like this:
customer  spend  hurdle
A            20      50
A            31      50
A            20      50
B            50     100
B            51     100
B            30     100
I want to add a Cumulative column that holds a running sum of spend per customer and resets once that running sum is greater than or equal to the hurdle, like this:
customer  spend  hurdle  Cumulative
A            20      50          20
A            31      50          51
A            20      50          20
B            50     100          50
B            51     100         101
B            30     100          30
I used cumsum and groupby in pandas, but I do not know how to reset the sum based on the condition.
This is the code I am currently using:
df1['cum_sum'] = df1.groupby(['customer'])['spend'].apply(lambda x: x.cumsum())
I know this is just a normal cumulative sum. I would greatly appreciate your help.
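For reference, the sample frame can be rebuilt as below to see what the plain grouped cumulative sum produces. This is only a sketch: it calls cumsum directly on the grouped column, which is assumed here to be equivalent to the apply/lambda form in the question. The third row for customer A comes out as 71 rather than the desired 20.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "customer": ["A", "A", "A", "B", "B", "B"],
    "spend": [20, 31, 20, 50, 51, 30],
    "hurdle": [50, 50, 50, 100, 100, 100],
})

# Plain per-customer running total; nothing resets it at the hurdle.
df1["cum_sum"] = df1.groupby("customer")["spend"].cumsum()
print(df1)
```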
Recommended answer
There may be a faster, more efficient way. Here is one inefficient approach using apply:
In [3270]: def custcum(x):
      ...:     total = 0
      ...:     for i, v in x.iterrows():
      ...:         total += v.spend
      ...:         x.loc[i, 'cum'] = total
      ...:         if total >= v.hurdle:
      ...:             total = 0
      ...:     return x
      ...:
In [3271]: df.groupby('customer').apply(custcum)
Out[3271]:
  customer  spend  hurdle    cum
0        A     20      50   20.0
1        A     31      50   51.0
2        A     20      50   20.0
3        B     50     100   50.0
4        B     51     100  101.0
5        B     30     100   30.0
You may consider using cython or numba to speed up custcum.
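As a rough illustration of the numba suggestion, the same loop can be written over plain NumPy arrays so that it can be compiled. This is a minimal sketch, not a benchmarked implementation: reset_cumsum is a hypothetical helper name, the rows of each customer are assumed to be contiguous, and if numba is not installed the fallback simply runs the loop as ordinary Python.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):  # fallback: leave the function uncompiled
        return func


@njit
def reset_cumsum(spend, hurdle, codes):
    """Running sum that restarts on a new customer code or after the hurdle is reached."""
    out = np.empty(spend.shape[0], dtype=np.float64)
    total = 0.0
    for i in range(spend.shape[0]):
        if i > 0 and codes[i] != codes[i - 1]:
            total = 0.0  # a new customer starts here
        total += spend[i]
        out[i] = total
        if total >= hurdle[i]:
            total = 0.0  # hurdle reached: restart on the next row
    return out


df = pd.DataFrame({
    "customer": ["A", "A", "A", "B", "B", "B"],
    "spend": [20, 31, 20, 50, 51, 30],
    "hurdle": [50, 50, 50, 100, 100, 100],
})

# Integer code per customer; assumes each customer's rows are contiguous.
codes, _ = pd.factorize(df["customer"])
df["cum"] = reset_cumsum(
    df["spend"].to_numpy(dtype=np.float64),
    df["hurdle"].to_numpy(dtype=np.float64),
    codes,
)
print(df)
```

Passing arrays rather than the DataFrame keeps the compiled function free of pandas objects, which numba does not handle directly.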
[Update]
An improved version of Ido s's answer.
In [3276]: s = df.groupby('customer').spend.cumsum()
In [3277]: np.where(s > df.hurdle.shift(-1), s, df.spend)
Out[3277]: array([ 20, 51, 20, 50, 101, 30], dtype=int64)
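The np.where expression above matches the sample data, but it compares each row against the per-customer cumsum that never resets, so it is worth checking on input where a customer reaches the hurdle more than once. The sketch below does that with a hypothetical helper, reset_cumsum, and an extra fourth row for customer A that is not part of the original data. On this extended input the two results differ for customer A's third and fourth rows.

```python
import numpy as np
import pandas as pd


def reset_cumsum(group):
    """Running sum of spend that restarts after reaching the row's hurdle."""
    total = 0
    out = []
    for spend, hurdle in zip(group["spend"], group["hurdle"]):
        total += spend
        out.append(total)
        if total >= hurdle:
            total = 0
    return pd.Series(out, index=group.index)


# Sample data plus one extra row for customer A (an assumption for this check).
df = pd.DataFrame({
    "customer": ["A", "A", "A", "A", "B", "B", "B"],
    "spend": [20, 31, 20, 40, 50, 51, 30],
    "hurdle": [50, 50, 50, 50, 100, 100, 100],
})

# Explicit loop per group; index alignment puts the values back in place.
df["cum"] = pd.concat([reset_cumsum(g) for _, g in df.groupby("customer")])

# The vectorized expression from the update, applied to the same frame.
s = df.groupby("customer")["spend"].cumsum()
vectorized = np.where(s > df["hurdle"].shift(-1), s, df["spend"])

print(df.assign(vectorized=vectorized))
```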