Finding start-time and end-time of events in a day - Pandas timeseries - such that end time does not fall into next day

Published: 2021/6/13 20:55:41. Tags: python, pandas, dataframe, time-series, python-datetime


Problem description

I have a meteorological timeseries df:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date':['11/10/2017 0:00','11/10/2017 03:00','11/10/2017 06:00','11/10/2017 09:00','11/10/2017 12:00',
                           '11/11/2017 0:00','11/11/2017 03:00','11/11/2017 06:00','11/11/2017 09:00','11/11/2017 12:00',
                           '11/12/2017 00:00','11/12/2017 03:00','11/12/2017 06:00','11/12/2017 09:00','11/12/2017 12:00'],
                   'value':[850,np.nan,np.nan,np.nan,np.nan,500,650,780,np.nan,800,350,690,780,np.nan,np.nan]})
df['date'] = pd.to_datetime(df.date.astype(str), format='%m/%d/%Y %H:%M', errors='coerce')
df.index = pd.DatetimeIndex(df.date)

With this dataframe, I am trying to find the start time and end time of each event, where an event is defined as:

(df["value"] < 1000)

I used a solution similar to the one in "How to find the start time and end time of an event in python?", with revised code:

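current_event = None
result = []
for event, time in zip((df["value"] < 1000), df.index):
    if event != current_event:
        if current_event is not None:
            # The previous run is closed at the start of the new run's window,
            # so its EndTime can jump across the overnight gap (e.g. to 22:30)
            result.append([current_event, start_time, time - pd.DateOffset(hours=1, minutes=30)])
        current_event, start_time = event, time - pd.DateOffset(hours=1, minutes=30)
# Time gaps between days are never checked, and the final run is never appended
attempt = pd.DataFrame(result, columns=['Event', 'StartTime', 'EndTime'])
attempt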

Output is:

   Event           StartTime             EndTime
0   True 2017-11-09 22:30:00 2017-11-10 01:30:00
1  False 2017-11-10 01:30:00 2017-11-10 22:30:00
2   True 2017-11-10 22:30:00 2017-11-11 07:30:00
3  False 2017-11-11 07:30:00 2017-11-11 10:30:00
4   True 2017-11-11 10:30:00 2017-11-12 07:30:00

But the desired output differs from the output above:

EndTime in the second row (index 1) should be 2017-11-10 13:30:00
EndTime in the fifth row (index 4) should be 2017-11-11 13:30:00
There should be new rows: a sixth row (index 5) and a seventh row (index 6)

Logic:

Since the timestamps are 3 h apart, an event is assumed to start 1 h 30 min before its timestamp and to end 1 h 30 min after it.
If consecutive timestamps have the same event state, their windows merge: the event runs from 1 h 30 min before the first timestamp to 1 h 30 min after the last one.
The StartTime of the first event of the day (at 00:00) should always be 1 h 30 min before the 00:00 timestamp, i.e. 22:30 of the previous day.
The EndTime of the last event of the day (at 12:00) should always be 1 h 30 min after the 12:00 timestamp, i.e. 13:30 of the same day.
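The rules above can be sketched as a single pass over the rows. This is an editorial sketch, not the asker's code or the accepted answer: it assumes a plain `RangeIndex`, uses `pd.Timedelta` for the fixed 1 h 30 min shift, and extends the current run only when the state is unchanged and the new window starts exactly where the previous one ended, so the overnight gap (13:30 to 22:30) always starts a new row.

```python
import numpy as np
import pandas as pd

# Sample data from the question
dates = ['11/10/2017 0:00', '11/10/2017 03:00', '11/10/2017 06:00', '11/10/2017 09:00', '11/10/2017 12:00',
         '11/11/2017 0:00', '11/11/2017 03:00', '11/11/2017 06:00', '11/11/2017 09:00', '11/11/2017 12:00',
         '11/12/2017 00:00', '11/12/2017 03:00', '11/12/2017 06:00', '11/12/2017 09:00', '11/12/2017 12:00']
values = [850, np.nan, np.nan, np.nan, np.nan, 500, 650, 780, np.nan, 800, 350, 690, 780, np.nan, np.nan]
df = pd.DataFrame({'date': pd.to_datetime(dates, format='%m/%d/%Y %H:%M'), 'value': values})

half = pd.Timedelta(hours=1, minutes=30)  # each timestamp covers +/- 1 h 30 min

rows = []  # [Event, StartTime, EndTime] for each merged run
for event, ts in zip(df['value'] < 1000, df['date']):
    start, end = ts - half, ts + half
    # Extend the current run only if the state is unchanged AND the windows touch
    if rows and rows[-1][0] == event and rows[-1][2] == start:
        rows[-1][2] = end
    else:
        rows.append([event, start, end])

events = pd.DataFrame(rows, columns=['Event', 'StartTime', 'EndTime'])
print(events)
```

Unlike the loop in the question, each run is closed by its own last window rather than by the next run's start, and no final "flush" step is needed because the run being built is already stored in `rows`.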

Any prompt help on this issue would be highly appreciated. I have tried hard to fix it, but with no luck so far.

Thanks a lot!

Solution

Create an output dataframe in which each timestamp becomes a window from 1 h 30 min before it to 1 h 30 min after it:

out = pd.DataFrame({"Event": df["value"] < 1000,
                    "StartTime": df["date"] - pd.DateOffset(hours=1, minutes=30),
                    "EndTime": df["date"] + pd.DateOffset(hours=1, minutes=30)},
                   index=df.index)

>>> out
    Event           StartTime             EndTime
0    True 2017-11-09 22:30:00 2017-11-10 01:30:00  # Group 0
1   False 2017-11-10 01:30:00 2017-11-10 04:30:00  # Group 1
2   False 2017-11-10 04:30:00 2017-11-10 07:30:00
3   False 2017-11-10 07:30:00 2017-11-10 10:30:00
4   False 2017-11-10 10:30:00 2017-11-10 13:30:00
5    True 2017-11-10 22:30:00 2017-11-11 01:30:00  # Group 2
6    True 2017-11-11 01:30:00 2017-11-11 04:30:00
7    True 2017-11-11 04:30:00 2017-11-11 07:30:00
8   False 2017-11-11 07:30:00 2017-11-11 10:30:00  # Group 3
9    True 2017-11-11 10:30:00 2017-11-11 13:30:00  # Group 4
10   True 2017-11-11 22:30:00 2017-11-12 01:30:00  # Group 5
11   True 2017-11-12 01:30:00 2017-11-12 04:30:00
12   True 2017-11-12 04:30:00 2017-11-12 07:30:00
13  False 2017-11-12 07:30:00 2017-11-12 10:30:00  # Group 6
14  False 2017-11-12 10:30:00 2017-11-12 13:30:00
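Each row therefore covers its timestamp plus or minus half of the 3-hour sampling interval. For naive timestamps like these, a fixed `pd.Timedelta` expresses the same shift as the `pd.DateOffset(hours=1, minutes=30)` used above; a quick check of the two boundary cases from the question's rules (00:00 starts at 22:30 of the previous day, 12:00 ends at 13:30 of the same day):

```python
import pandas as pd

half = pd.Timedelta(hours=1, minutes=30)  # half of the 3-hour sampling interval

first = pd.Timestamp("2017-11-10 00:00")  # first timestamp of a day
last = pd.Timestamp("2017-11-10 12:00")   # last timestamp of a day

print(first - half)  # 2017-11-09 22:30:00
print(last + half)   # 2017-11-10 13:30:00
```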

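Define two helper groupings: event_group increments whenever the Event value changes, and time_group increments whenever a row's StartTime does not coincide with the previous row's EndTime, i.e. at a time gap such as the overnight gap between 13:30 and 22:30: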

event_group = out["Event"].ne(out["Event"].shift(fill_value=0)).cumsum()
time_group = (out["StartTime"] 
              - out["EndTime"].shift(fill_value=out["StartTime"].iloc[0])
              != pd.Timedelta(0)).cumsum()

>>> out[["Event"]].assign(EventGroup=event_group,
                          TimeGroup=time_group,
                          Groups=event_group + time_group)
    Event  EventGroup  TimeGroup  Groups
0    True           1          0       1  # Group 0
1   False           2          0       2  # Group 1
2   False           2          0       2
3   False           2          0       2
4   False           2          0       2
5    True           3          1       4  # Group 2
6    True           3          1       4
7    True           3          1       4
8   False           4          1       5  # Group 3
9    True           5          1       6  # Group 4
10   True           5          2       7  # Group 5
11   True           5          2       7
12   True           5          2       7
13  False           6          2       8  # Group 6
14  False           6          2       8
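As a variation on adding the two counters (an editorial sketch, not part of the original answer), both break conditions can be combined with a logical OR before a single `cumsum`: a new group starts whenever the Event value changes or a row's StartTime does not meet the previous row's EndTime. Shown here on rows 5 to 10 of `out` above, where rows 9 and 10 share the same state but are split by the overnight gap:

```python
import pandas as pd

# Rows 5-10 of `out` from the answer above
out = pd.DataFrame({
    "Event": [True, True, True, False, True, True],
    "StartTime": pd.to_datetime(["2017-11-10 22:30", "2017-11-11 01:30", "2017-11-11 04:30",
                                 "2017-11-11 07:30", "2017-11-11 10:30", "2017-11-11 22:30"]),
    "EndTime": pd.to_datetime(["2017-11-11 01:30", "2017-11-11 04:30", "2017-11-11 07:30",
                               "2017-11-11 10:30", "2017-11-11 13:30", "2017-11-12 01:30"]),
})

state_changed = out["Event"].ne(out["Event"].shift())    # True where the state flips
gap = out["StartTime"].ne(out["EndTime"].shift())        # True where windows do not touch
group_id = (state_changed | gap).cumsum()                # one label per merged run
print(group_id.tolist())  # [1, 1, 1, 2, 3, 4]
```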

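Reduce the output dataframe: group by event_group + time_group (both counters only ever increase, so their sum increments whenever either one does and uniquely labels each run of rows with the same Event and no time gap), then keep the first Event and StartTime and the last EndTime of each group: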

out = pd.DataFrame(out.groupby(event_group + time_group)
                      .apply(lambda g: (g["Event"].iloc[0],
                                        g["StartTime"].iloc[0], 
                                        g["EndTime"].iloc[-1]))
                      .tolist(), columns=["Event", "StartTime", "EndTime"])

>>> out
   Event           StartTime             EndTime
0   True 2017-11-09 22:30:00 2017-11-10 01:30:00
1  False 2017-11-10 01:30:00 2017-11-10 13:30:00
2   True 2017-11-10 22:30:00 2017-11-11 07:30:00
3  False 2017-11-11 07:30:00 2017-11-11 10:30:00
4   True 2017-11-11 10:30:00 2017-11-11 13:30:00
5   True 2017-11-11 22:30:00 2017-11-12 07:30:00
6  False 2017-11-12 07:30:00 2017-11-12 13:30:00
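The same reduction can also be sketched with named aggregation instead of the `apply(...).tolist()` round trip. This is an alternative formulation, not the original answer's code; it assumes a plain `RangeIndex`, uses `pd.Timedelta` for the fixed shift, and builds the helper counters with `shift()` without a `fill_value`, which only changes their starting values, not where they increment.

```python
import numpy as np
import pandas as pd

# Sample data from the question
dates = ['11/10/2017 0:00', '11/10/2017 03:00', '11/10/2017 06:00', '11/10/2017 09:00', '11/10/2017 12:00',
         '11/11/2017 0:00', '11/11/2017 03:00', '11/11/2017 06:00', '11/11/2017 09:00', '11/11/2017 12:00',
         '11/12/2017 00:00', '11/12/2017 03:00', '11/12/2017 06:00', '11/12/2017 09:00', '11/12/2017 12:00']
values = [850, np.nan, np.nan, np.nan, np.nan, 500, 650, 780, np.nan, 800, 350, 690, 780, np.nan, np.nan]
df = pd.DataFrame({'date': pd.to_datetime(dates, format='%m/%d/%Y %H:%M'), 'value': values})

# One window per timestamp: +/- 1 h 30 min
half = pd.Timedelta(hours=1, minutes=30)
out = pd.DataFrame({"Event": df["value"] < 1000,
                    "StartTime": df["date"] - half,
                    "EndTime": df["date"] + half})

# Counters that increase on a state change and on a time gap, respectively
event_group = out["Event"].ne(out["Event"].shift()).cumsum()
time_group = out["StartTime"].ne(out["EndTime"].shift()).cumsum()

# Named aggregation: first Event/StartTime and last EndTime of each run
result = (out.groupby(event_group + time_group)
             .agg(Event=("Event", "first"),
                  StartTime=("StartTime", "first"),
                  EndTime=("EndTime", "last"))
             .reset_index(drop=True))
print(result)
```

Named aggregation keeps the column dtypes (bool and datetime64) directly, so there is no need to rebuild the frame from a list of tuples.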
