Pandas taking Cumulative Sum with Reset
Question
I'm trying to keep a running total of consecutive timestamps (minute frequency). I currently have a way of taking a cumulative sum and resetting it whenever two columns do not match, but it's done with a for loop. I was wondering if there is a way to do this without the loop.
cb_arbitrage['shift'] = cb_arbitrage.index.shift(1, freq='T')
Which returns:
cccccccc bbbbbbbb cb_spread shift
timestamp
2017-07-07 18:23:00 2535.002000 2524.678462 10.323538 2017-07-07 18:24:00
2017-07-07 18:24:00 2535.007826 2523.297619 11.710207 2017-07-07 18:25:00
2017-07-07 18:25:00 2535.004167 2524.391000 10.613167 2017-07-07 18:26:00
2017-07-07 18:26:00 2534.300000 2521.838667 12.461333 2017-07-07 18:27:00
2017-07-07 18:27:00 2530.231429 2520.195625 10.035804 2017-07-07 18:28:00
2017-07-07 18:28:00 2529.444667 2518.782143 10.662524 2017-07-07 18:29:00
2017-07-07 18:29:00 2528.988000 2518.802963 10.185037 2017-07-07 18:30:00
2017-07-07 18:59:00 2514.403367 2526.473333 12.069966 2017-07-07 19:00:00
2017-07-07 19:01:00 2516.410000 2528.980000 12.570000 2017-07-07 19:02:00
Then I do the following:
cb_arbitrage['shift'] = cb_arbitrage['shift'].shift(1)
cb_arbitrage.loc[cb_arbitrage.index[0], 'shift'] = cb_arbitrage.index[0]
cb_arbitrage['count'] = 0
Which returns:
cccccccc bbbbbbbb cb_spread shift count
timestamp
2017-07-07 18:23:00 2535.002000 2524.678462 10.323538 2017-07-07 18:23:00 0
2017-07-07 18:24:00 2535.007826 2523.297619 11.710207 2017-07-07 18:24:00 0
2017-07-07 18:25:00 2535.004167 2524.391000 10.613167 2017-07-07 18:25:00 0
2017-07-07 18:26:00 2534.300000 2521.838667 12.461333 2017-07-07 18:26:00 0
2017-07-07 18:27:00 2530.231429 2520.195625 10.035804 2017-07-07 18:27:00 0
2017-07-07 18:28:00 2529.444667 2518.782143 10.662524 2017-07-07 18:28:00 0
2017-07-07 18:29:00 2528.988000 2518.802963 10.185037 2017-07-07 18:29:00 0
2017-07-07 18:59:00 2514.403367 2526.473333 12.069966 2017-07-07 18:30:00 0
2017-07-07 19:01:00 2516.410000 2528.980000 12.570000 2017-07-07 19:00:00 0
Then, the loop to calculate the cumulative sum, with reset:
count = 0
for i, row in cb_arbitrage.iterrows():
    # The streak continues when the row's timestamp equals its 'shift' value
    if i == row['shift']:
        count += 1
    else:
        count = 1
    # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell
    cb_arbitrage.at[i, 'count'] = count
Which gives me my expected result:
cccccccc bbbbbbbb cb_spread shift count
timestamp
2017-07-07 18:23:00 2535.002000 2524.678462 10.323538 2017-07-07 18:23:00 1
2017-07-07 18:24:00 2535.007826 2523.297619 11.710207 2017-07-07 18:24:00 2
2017-07-07 18:25:00 2535.004167 2524.391000 10.613167 2017-07-07 18:25:00 3
2017-07-07 18:26:00 2534.300000 2521.838667 12.461333 2017-07-07 18:26:00 4
2017-07-07 18:27:00 2530.231429 2520.195625 10.035804 2017-07-07 18:27:00 5
2017-07-07 18:28:00 2529.444667 2518.782143 10.662524 2017-07-07 18:28:00 6
2017-07-07 18:29:00 2528.988000 2518.802963 10.185037 2017-07-07 18:29:00 7
2017-07-07 18:59:00 2514.403367 2526.473333 12.069966 2017-07-07 18:30:00 1
2017-07-07 19:01:00 2516.410000 2528.980000 12.570000 2017-07-07 19:00:00 1
2017-07-07 21:55:00 2499.904560 2510.814000 10.909440 2017-07-07 19:02:00 1
2017-07-07 21:56:00 2500.134615 2510.812857 10.678242 2017-07-07 21:56:00 2
Recommended answer
You can use the diff method, which finds the difference between the current row and the previous row. You can then check whether this difference is equal to one minute. From there, it takes some trickery to reset the streaks within the data.
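As a minimal sketch of that diff-and-compare step, assuming the timestamps live in a DatetimeIndex as in the question (the five-row index below is a hypothetical sample built from a few of the question's timestamps):

```python
import pandas as pd

# Hypothetical sample: a handful of the question's timestamps.
idx = pd.DatetimeIndex([
    "2017-07-07 18:23", "2017-07-07 18:24", "2017-07-07 18:25",
    "2017-07-07 18:59", "2017-07-07 19:01",
], name="timestamp")

# to_series() gives a Series indexed by the timestamps themselves,
# so the result lines up with a frame that uses the same index.
gaps = idx.to_series().diff()            # first entry is NaT (no previous row)

# True only where the previous row is exactly one minute earlier.
is_consecutive = gaps == pd.Timedelta(minutes=1)

print(is_consecutive.tolist())           # [False, True, True, False, False]
```

The first row compares NaT against one minute, which is False, so it naturally starts a new streak.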
We first take the cumulative sum of the boolean Series, which gets us close to what we want. To reset the series, we multiply this cumulative sum by the original boolean Series, since False evaluates as 0. The difference of that product is negative exactly where a streak breaks, so we keep only those negative values, forward-fill them, and add them back to the cumulative sum as an offset.
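# Assumes timestamp is a regular column (for example after
# cb_arbitrage = cb_arbitrage.reset_index()), which matches the integer index in the output
s = cb_arbitrage.timestamp.diff() == pd.Timedelta('1 minute')
s1 = s.cumsum()
s.mul(s1).diff().where(lambda x: x < 0).ffill().add(s1, fill_value=0) + 1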
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
6 7.0
7 1.0
8 1.0
9 1.0
10 2.0
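For comparison, here is a sketch of a different, commonly used idiom (not the approach shown above): flag the rows where a streak starts, use the cumulative sum of those flags as a streak label, and number the rows within each label using groupby and cumcount. The frame is rebuilt from the timestamps in the question's expected-result table, and the names starts and streak_id are purely illustrative.

```python
import pandas as pd

# Timestamps copied from the question's expected-result table.
idx = pd.DatetimeIndex([
    "2017-07-07 18:23", "2017-07-07 18:24", "2017-07-07 18:25",
    "2017-07-07 18:26", "2017-07-07 18:27", "2017-07-07 18:28",
    "2017-07-07 18:29", "2017-07-07 18:59", "2017-07-07 19:01",
    "2017-07-07 21:55", "2017-07-07 21:56",
], name="timestamp")
df = pd.DataFrame(index=idx)

# True wherever a new streak starts: the gap to the previous row is not
# exactly one minute (the first row's NaT gap also compares as "not equal").
starts = idx.to_series().diff() != pd.Timedelta(minutes=1)

# The running sum of the start flags gives every streak its own label.
streak_id = starts.cumsum()

# cumcount numbers the rows of each streak from 0, so add 1.
df["count"] = starts.groupby(streak_id).cumcount() + 1

print(df["count"].tolist())   # [1, 2, 3, 4, 5, 6, 7, 1, 1, 1, 2]
```

This variant works directly on the DatetimeIndex and keeps the counts as integers rather than floats.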