计算 pandas 事件之间的时差 [英] Calculating time difference between events in pandas

查看:102
本文介绍了计算 pandas 事件之间的时差的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

+--------------------------------------------------------------+
| 2014-08-12T10:30:14.6938893+10:00     Reading received START |
| 2014-08-12T10:30:14.6938893+10:00       Reading received ADD |
| 2014-08-12T10:30:14.7094893+10:00    Reading received UPDATE |
| 2014-08-12T10:30:14.7094893+10:00    Reading received COMMIT |
| 2014-08-12T10:30:14.7094893+10:00               Commit start |
| 2014-08-12T10:30:14.7406893+10:00                 Commit end |
| 2014-08-12T10:30:14.7406893+10:00    Reading received FINISH |
| 2014-08-12T10:30:23.3206893+10:00     Reading received START |
| 2014-08-12T10:30:23.3206893+10:00       Reading received ADD |
| 2014-08-12T10:30:23.3362893+10:00    Reading received UPDATE |
| 2014-08-12T10:30:23.3362893+10:00    Reading received COMMIT |
| 2014-08-12T10:30:23.3362893+10:00               Commit start |
| 2014-08-12T10:30:23.3674893+10:00                 Commit end |
| 2014-08-12T10:30:23.3674893+10:00    Reading received FINISH |
+--------------------------------------------------------------+

鉴于该值描述一个事件的时间序列,如何计算重复发生的事件之间的时间差,例如阅读已收到START 和随后的阅读已完成之间的平均差异?

Given a time series where the value describes an event, how can I calculate delta times between recurring events, e.g. the average difference between Reading received START and the subsequent Reading received FINISH?

有没有比这更好的方法了?

Is there a better way than then e.g.

left = df[df.Event == 'Reading received START']
right = df[df.Event == 'Reading received FINISH']
left.index = range(len(left))
right.index = range(len(right))
delta = (right.Time - left.Time)

推荐答案

为明确起见,我假设您正在从较大的数据框中显示索引和一列(称为事件").那是对的吗? 怎么样:

To be explicit, I'm assuming that you are showing the index and one column (called 'Event') from a larger dataframe. Is that correct? How about the following:

relevant_df = df[df.Event.isin(['Reading received START','Reading received START'])
relevant_ts_as_series = pd.Series(relevant_df.index)
diff = relevant_ts_as_series - relevant_ts_as_series.shift()

然后,您可以根据需要选择diff.mean().

Then you can take diff.mean() if you like.

我敢打赌,除了将索引转换为系列以外,还有一种更优雅的方法,但这应该对您有用.

I bet there's a more elegant way than turning the index into a Series, but this should work for you.

这篇关于计算 pandas 事件之间的时差的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆