Pandas taking Cumulative Sum with Reset

Question

I'm trying to keep a running total of consecutive timestamps (minute frequency). I currently take a cumulative sum and reset it whenever two columns do not match, but it's done with a for loop. Is there a way to do this without the loop?

cb_arbitrage['shift'] = cb_arbitrage.index.shift(1, freq='T')

Returns:

                        cccccccc     bbbbbbbb  cb_spread         shift
timestamp                                                                   
2017-07-07 18:23:00  2535.002000  2524.678462  10.323538 2017-07-07 18:24:00
2017-07-07 18:24:00  2535.007826  2523.297619  11.710207 2017-07-07 18:25:00
2017-07-07 18:25:00  2535.004167  2524.391000  10.613167 2017-07-07 18:26:00
2017-07-07 18:26:00  2534.300000  2521.838667  12.461333 2017-07-07 18:27:00
2017-07-07 18:27:00  2530.231429  2520.195625  10.035804 2017-07-07 18:28:00
2017-07-07 18:28:00  2529.444667  2518.782143  10.662524 2017-07-07 18:29:00
2017-07-07 18:29:00  2528.988000  2518.802963  10.185037 2017-07-07 18:30:00
2017-07-07 18:59:00  2514.403367  2526.473333  12.069966 2017-07-07 19:00:00
2017-07-07 19:01:00  2516.410000  2528.980000  12.570000 2017-07-07 19:02:00

Then I do the following:

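cb_arbitrage['shift'] = cb_arbitrage['shift'].shift(1)
# set the first row positionally; chained assignment (['shift'][0] = ...) may not write through
cb_arbitrage.iloc[0, cb_arbitrage.columns.get_loc('shift')] = cb_arbitrage.index[0]
cb_arbitrage['count'] = 0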

Which returns:

                        cccccccc     bbbbbbbb  cb_spread               shift  count
timestamp                                                                          
2017-07-07 18:23:00  2535.002000  2524.678462  10.323538 2017-07-07 18:23:00      0
2017-07-07 18:24:00  2535.007826  2523.297619  11.710207 2017-07-07 18:24:00      0
2017-07-07 18:25:00  2535.004167  2524.391000  10.613167 2017-07-07 18:25:00      0
2017-07-07 18:26:00  2534.300000  2521.838667  12.461333 2017-07-07 18:26:00      0
2017-07-07 18:27:00  2530.231429  2520.195625  10.035804 2017-07-07 18:27:00      0
2017-07-07 18:28:00  2529.444667  2518.782143  10.662524 2017-07-07 18:28:00      0
2017-07-07 18:29:00  2528.988000  2518.802963  10.185037 2017-07-07 18:29:00      0
2017-07-07 18:59:00  2514.403367  2526.473333  12.069966 2017-07-07 18:30:00      0
2017-07-07 19:01:00  2516.410000  2528.980000  12.570000 2017-07-07 19:00:00      0

Then, the loop to calculate the cumulative sum, with reset:

count = 0
for i, row in cb_arbitrage.iterrows():
    # extend the streak when this timestamp is one minute after the previous row
    if i == row['shift']:
        count += 1
    else:
        count = 1
    # DataFrame.set_value was removed in pandas 1.0; .at is the scalar setter
    cb_arbitrage.at[i, 'count'] = count

Which gives me my expected result:

                        cccccccc     bbbbbbbb  cb_spread               shift  count
timestamp                                                                          
2017-07-07 18:23:00  2535.002000  2524.678462  10.323538 2017-07-07 18:23:00      1
2017-07-07 18:24:00  2535.007826  2523.297619  11.710207 2017-07-07 18:24:00      2
2017-07-07 18:25:00  2535.004167  2524.391000  10.613167 2017-07-07 18:25:00      3
2017-07-07 18:26:00  2534.300000  2521.838667  12.461333 2017-07-07 18:26:00      4
2017-07-07 18:27:00  2530.231429  2520.195625  10.035804 2017-07-07 18:27:00      5
2017-07-07 18:28:00  2529.444667  2518.782143  10.662524 2017-07-07 18:28:00      6
2017-07-07 18:29:00  2528.988000  2518.802963  10.185037 2017-07-07 18:29:00      7
2017-07-07 18:59:00  2514.403367  2526.473333  12.069966 2017-07-07 18:30:00      1
2017-07-07 19:01:00  2516.410000  2528.980000  12.570000 2017-07-07 19:00:00      1
2017-07-07 21:55:00  2499.904560  2510.814000  10.909440 2017-07-07 19:02:00      1
2017-07-07 21:56:00  2500.134615  2510.812857  10.678242 2017-07-07 21:56:00      2

Recommended answer

You can use the diff method, which finds the difference between the current row and the previous row. You can then check whether this difference is equal to one minute. From here, it takes a bit of trickery to reset streaks within the data.

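We first take the cumulative sum of the boolean Series, which gets us close to what we want. To reset the series we multiply this cumulative sum by the original boolean, since False evaluates as 0. The product drops to 0 at each break, so a negative diff marks where a streak ends. Forward-filling that negative offset and adding it back to the cumulative sum restarts the count, and adding 1 makes each streak start at 1.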

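# assumes pandas is imported as pd and timestamp is a regular column,
# e.g. after cb_arbitrage = cb_arbitrage.reset_index()
s = cb_arbitrage.timestamp.diff() == pd.Timedelta('1 minute')
s1 = s.cumsum()
s.mul(s1).diff().where(lambda x: x < 0).ffill().add(s1, fill_value=0) + 1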

0     1.0
1     2.0
2     3.0
3     4.0
4     5.0
5     6.0
6     7.0
7     1.0
8     1.0
9     1.0
10    2.0
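
For comparison, here is a minimal self-contained sketch of a different idiom, not the method used in the answer above: label each streak with the cumulative sum of the "break" flags, then number the rows inside each group with groupby(...).cumcount(). The timestamps are copied from the question's expected-result table; the names df, ts, new_streak and streak_id are hypothetical, and it assumes timestamp stays as the DatetimeIndex, as in the question.

```python
import pandas as pd

# Timestamps copied from the question's expected-result table
idx = pd.DatetimeIndex([
    "2017-07-07 18:23", "2017-07-07 18:24", "2017-07-07 18:25",
    "2017-07-07 18:26", "2017-07-07 18:27", "2017-07-07 18:28",
    "2017-07-07 18:29", "2017-07-07 18:59", "2017-07-07 19:01",
    "2017-07-07 21:55", "2017-07-07 21:56",
], name="timestamp")
df = pd.DataFrame(index=idx)  # hypothetical stand-in for cb_arbitrage

# True where a row does NOT follow the previous row by exactly one minute,
# i.e. where a new streak starts (the first row compares against NaT, so it is True)
ts = idx.to_series()
new_streak = ts.diff() != pd.Timedelta(minutes=1)

# Every True bumps the label, so rows in the same streak share one id
streak_id = new_streak.cumsum()

# Number the rows within each streak, starting from 1
df["count"] = ts.groupby(streak_id).cumcount() + 1

print(df)
```

This variant yields integer counts rather than floats, because no NaN values are introduced along the way.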
