Calculating the duration of an event in a time series in Python
Question
I have a dataframe as shown below:
index value
2003-01-01 00:00:00 14.5
2003-01-01 01:00:00 15.8
2003-01-01 02:00:00 0
2003-01-01 03:00:00 0
2003-01-01 04:00:00 13.6
2003-01-01 05:00:00 4.3
2003-01-01 06:00:00 13.7
2003-01-01 07:00:00 14.4
2003-01-01 08:00:00 0
2003-01-01 09:00:00 0
2003-01-01 10:00:00 0
2003-01-01 11:00:00 17.2
2003-01-01 12:00:00 0
2003-01-01 13:00:00 5.3
2003-01-01 14:00:00 0
2003-01-01 15:00:00 2.0
2003-01-01 16:00:00 4.0
2003-01-01 17:00:00 0
2003-01-01 18:00:00 0
2003-01-01 19:00:00 3.9
2003-01-01 20:00:00 7.2
2003-01-01 21:00:00 1.0
2003-01-01 22:00:00 1.0
2003-01-01 23:00:00 10.0
The index is a datetime, and the column records the rainfall (unit: mm) for each hour. I would like to calculate the "average wet spell duration", which means the average number of consecutive hours with a value (not zero) in a day, so the calculation is
(2 + 4 + 1 + 1 + 2 + 5) / 6 (events) = 2.5 (hr)
and the "average wet spell amount", which means the average of the sums of the values over each run of consecutive wet hours in a day.
{ (14.5 + 15.8) + ( 13.6 + 4.3 + 13.7 + 14.4 ) + (17.2) + (5.3) + (2 + 4)+ (3.9 + 7.2 + 1 + 1 + 10) } / 6 (events) = 21.32 (mm)
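For reference, the two hand calculations above can be reproduced with a few lines of plain Python. This is only a sketch for the single sample day; the names `values`, `spells`, `durations` and `amounts` are illustrative and not part of the original data.

```python
from itertools import groupby

# Hourly rainfall (mm) for 2003-01-01, copied from the table above
values = [14.5, 15.8, 0, 0, 13.6, 4.3, 13.7, 14.4, 0, 0, 0, 17.2,
          0, 5.3, 0, 2.0, 4.0, 0, 0, 3.9, 7.2, 1.0, 1.0, 10.0]

# Split the day into runs of consecutive wet (non-zero) hours
spells = [list(run) for is_wet, run in groupby(values, key=lambda v: v != 0) if is_wet]

durations = [len(s) for s in spells]  # hours per wet spell
amounts = [sum(s) for s in spells]    # rainfall per wet spell (mm)

avg_duration = sum(durations) / len(spells)
avg_amount = sum(amounts) / len(spells)
print(durations, avg_duration, round(avg_amount, 2))
```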
The dataframe above is just an example; my actual dataframe covers a much longer time series (more than one year, for example). How can I write a function that calculates the two values mentioned above in a better way? Thanks in advance!
P.S. The values may be NaN, and I would like to just ignore them.
Recommended answer
I believe this is what you are looking for. I have added comments to the code explaining each step.
# assumes the timestamps are in a column named 'index' (e.g. after df.reset_index())
# wet hours are non-zero values; NaN is treated as dry (0) so that it is ignored
wet = df['value'].fillna(0).astype(bool)
# create helper columns defining contiguous blocks and day
df['block'] = (wet.shift() != wet).cumsum()
df['day'] = df['index'].dt.normalize()
# group by day to get unique block count and wet-hour count
session_map = df[wet].groupby('day')['block'].nunique()
hour_map = df[wet].groupby('day')['value'].count()
# map to original dataframe
df['sessions'] = df['day'].map(session_map)
df['hours'] = df['day'].map(hour_map)
# calculate result: total rainfall per day, then averages per wet spell
res = df.groupby(['day', 'hours', 'sessions'], as_index=False)['value'].sum()
res['duration'] = res['hours'] / res['sessions']
res['amount'] = res['value'] / res['sessions']
Result
day sessions duration value amount
0 2003-01-01 6 2.5 127.9 21.316667
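Since the question describes the timestamps as the dataframe's index and asks for NaN to be ignored, a variant wrapped in a function might look like the sketch below. The function name `wet_spell_stats` and the output column names are hypothetical. It assumes an hourly Series with a unique `DatetimeIndex`, treats NaN as a dry hour, and splits a spell that crosses midnight into one spell per day; whether that matches the intended definition is a judgment call.

```python
import pandas as pd

def wet_spell_stats(rain: pd.Series) -> pd.DataFrame:
    """Per-day average wet spell duration (hours) and amount (mm).

    Assumes `rain` is an hourly Series with a DatetimeIndex; NaN counts as dry.
    """
    wet = rain.fillna(0).ne(0)
    day = pd.Series(rain.index.normalize(), index=rain.index)
    # A new spell id starts whenever the wet/dry state or the day changes
    spell_id = (wet.ne(wet.shift()) | day.ne(day.shift())).cumsum()
    # One row per wet spell: its length in hours and its total rainfall
    spells = (
        pd.DataFrame({"day": day[wet], "spell": spell_id[wet], "value": rain[wet]})
        .groupby(["day", "spell"])["value"]
        .agg(hours="count", total="sum")
    )
    # Average over the spells of each day
    return spells.groupby(level="day").agg(
        spells=("hours", "count"),
        duration=("hours", "mean"),
        amount=("total", "mean"),
    )

# Usage with the sample day from the question
idx = pd.date_range("2003-01-01", periods=24, freq="h")
sample = pd.Series(
    [14.5, 15.8, 0, 0, 13.6, 4.3, 13.7, 14.4, 0, 0, 0, 17.2,
     0, 5.3, 0, 2.0, 4.0, 0, 0, 3.9, 7.2, 1.0, 1.0, 10.0],
    index=idx,
)
print(wet_spell_stats(sample))
```

Because each spell is aggregated before the daily average is taken, the same intermediate table could also be used for other per-spell statistics, such as the longest spell per day.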