计算时间序列python中事件的持续时间 [英] Calculating the duration an event in a time series python

查看:366
本文介绍了计算时间序列python中事件的持续时间的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个如下所示的数据框:

I have a dataframe as show below:

index                value
2003-01-01 00:00:00  14.5
2003-01-01 01:00:00  15.8
2003-01-01 02:00:00     0
2003-01-01 03:00:00     0
2003-01-01 04:00:00  13.6
2003-01-01 05:00:00   4.3
2003-01-01 06:00:00  13.7
2003-01-01 07:00:00  14.4
2003-01-01 08:00:00     0
2003-01-01 09:00:00     0
2003-01-01 10:00:00     0
2003-01-01 11:00:00  17.2
2003-01-01 12:00:00     0
2003-01-01 13:00:00   5.3
2003-01-01 14:00:00     0
2003-01-01 15:00:00   2.0
2003-01-01 16:00:00   4.0
2003-01-01 17:00:00     0
2003-01-01 18:00:00     0
2003-01-01 19:00:00   3.9
2003-01-01 20:00:00   7.2
2003-01-01 21:00:00   1.0
2003-01-01 22:00:00   1.0
2003-01-01 23:00:00  10.0

索引是日期时间,并有列记录每小时的降雨量(单位:mm),我想计算平均湿sp持续时间,这意味着一天中存在值(不为零)的连续小时平均
,因此计算方式为

The index is datetime and have column record the rainfall value(unit:mm) in each hour,I would like to calculate the "Average wet spell duration", which means the average of continuous hours that exist values (not zero) in a day, so the calculation is

2 + 4 + 1 + 1 + 2 + 5 / 6 (events) = 2.5 (hr)

和平均湿拼写数量,即一天中连续几个小时的总和。

and the "average wet spell amount", which means the average of sum of the values in continuous hours in a day.

{ (14.5 + 15.8) + ( 13.6 + 4.3 + 13.7 + 14.4 ) + (17.2) + (5.3) + (2 + 4)+ (3.9 + 7.2 + 1 + 1 + 10) } /  6 (events) = 21.32 (mm)

上面的datafame只是一个例子,我拥有更多的dataframe较长的时间序列(例如,超过一年),如何编写函数,以便可以更好地计算上述两个值?

The datafame above is just a example, the dataframe which I have have more longer time series (more than one year for example), how can I write a function so it could calculate the two value mentioned above in a better way? thanks in advance!

P.S。值可能是NaN,我只想忽略它。

P.S. the values may be NaN, and I would like to just ignore it.

推荐答案

我相信这就是您想要的。我已经为每个步骤的代码添加了解释。

I believe this is what you are looking for. I have added explanations to the code for each step.

# create helper columns defining contiguous blocks and day
df['block'] = (df['value'].astype(bool).shift() != df['value'].astype(bool)).cumsum()
df['day'] = df['index'].dt.normalize()

# group by day to get unique block count and value count
session_map = df[df['value'].astype(bool)].groupby('day')['block'].nunique()
hour_map = df[df['value'].astype(bool)].groupby('day')['value'].count()

# map to original dataframe
df['sessions'] = df['day'].map(session_map)
df['hours'] = df['day'].map(hour_map)

# calculate result
res = df.groupby(['day', 'hours', 'sessions'], as_index=False)['value'].sum()
res['duration'] = res['hours'] / res['sessions']
res['amount'] = res['value'] / res['sessions']

结果

         day  sessions  duration  value     amount
0 2003-01-01         6       2.5  127.9  21.316667

这篇关于计算时间序列python中事件的持续时间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆