Rolling windows excluding current rows
Question
Here is a sample dataframe:
import pandas as pd

days = ['2019-07-04 17:02:03', '2019-07-04 17:03:03',
        '2019-07-04 18:04:03', '2019-07-04 19:05:03',
        '2019-07-04 21:06:03', '2019-07-04 21:36:03',
        '2019-07-04 21:50:03', '2019-07-04 22:10:03']
ddf = pd.DataFrame({'Val': [0, 1, 2, 1, 4, 1, 3, 1],
                    'Cat': ["A", "A", "A", "A", "B", "B", "B", "B"]},
                   index=days)
# Convert the string index to a DatetimeIndex so time-based windows work
ddf.index = pd.to_datetime(ddf.index)
Val Cat
2019-07-04 17:02:03 0 A
2019-07-04 17:03:03 1 A
2019-07-04 18:04:03 2 A
2019-07-04 19:05:03 1 A
2019-07-04 21:06:03 4 B
2019-07-04 21:36:03 1 B
2019-07-04 21:50:03 3 B
2019-07-04 22:10:03 1 B
If I apply a rolling sum with a 1-hour window, I get this:
ddf.groupby("Cat")["Val"].rolling("1h").sum().rename('sum_last_hour')
Cat
A 2019-07-04 17:02:03 0.0
2019-07-04 17:03:03 1.0
2019-07-04 18:04:03 2.0
2019-07-04 19:05:03 1.0
B 2019-07-04 21:06:03 4.0
2019-07-04 21:36:03 5.0
2019-07-04 21:50:03 8.0
2019-07-04 22:10:03 5.0
Name: sum_last_hour, dtype: float64
But I would like to get this:
Cat
A 2019-07-04 17:02:03 NaN
2019-07-04 17:03:03 0.0
2019-07-04 18:04:03 NaN
2019-07-04 19:05:03 NaN
B 2019-07-04 21:06:03 NaN
2019-07-04 21:36:03 4.0
2019-07-04 21:50:03 5.0
2019-07-04 22:10:03 4.0
Name: sum_last_hour, dtype: float64
So I basically want to exclude the current row from the rolling sum, if that makes sense. I tried using shift(), but without success so far. Thanks for your help!
Recommended answer
Actually, I just found out how to do it. You need to use the closed parameter of the rolling() function and set it to "left". Something like this gives me the expected result:
ddf.groupby("Cat").rolling("1h", closed= "left")["Val"].sum().rename('sum_last_hour')
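For anyone who wants to check this end to end, here is a minimal self-contained sketch that rebuilds the sample dataframe from the question and compares the default window with closed="left". It assumes a reasonably recent pandas version. The names with_current and without_current are only illustrative and do not come from the original post.

```python
import pandas as pd

# Sample data from the question, indexed by timestamp
days = pd.to_datetime([
    "2019-07-04 17:02:03", "2019-07-04 17:03:03",
    "2019-07-04 18:04:03", "2019-07-04 19:05:03",
    "2019-07-04 21:06:03", "2019-07-04 21:36:03",
    "2019-07-04 21:50:03", "2019-07-04 22:10:03",
])
ddf = pd.DataFrame(
    {"Val": [0, 1, 2, 1, 4, 1, 3, 1],
     "Cat": ["A", "A", "A", "A", "B", "B", "B", "B"]},
    index=days,
)

# Default for time-based windows is closed="right": the window is (t - 1h, t],
# so the row at time t is part of its own sum.
with_current = ddf.groupby("Cat")["Val"].rolling("1h").sum()

# closed="left" makes the window [t - 1h, t): the row at time t is excluded,
# and a window with no earlier rows yields NaN.
without_current = (
    ddf.groupby("Cat")["Val"]
    .rolling("1h", closed="left")
    .sum()
    .rename("sum_last_hour")
)

print(without_current)
```

With closed="left", the lower bound of the window becomes inclusive, so a row exactly one hour earlier would now be counted. If that is not wanted, closed="neither" excludes both endpoints.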