How to check if a condition lasted for more than 15 minutes?
Question
Below is a sample of the dataset:
Date | Value |
---|---|
2020-01-01 01:35 | 50 |
2020-01-01 01:41 | 49 |
2020-01-01 01:46 | 50 |
I want to check whether 'Value' was equal to 50 for a continuous period of 15 minutes. If it was, I want to extract the date on which that happened. To illustrate what I mean by a continuous period, assume I want to check whether the value is equal to 50 for a continuous period of 5 minutes (instead of 15). Data that satisfies this condition would look like this:
Date | Value |
---|---|
2020-01-01 01:35 | 50 |
2020-01-01 01:36 | 50 |
2020-01-01 01:37 | 50 |
2020-01-01 01:38 | 50 |
2020-01-01 01:39 | 50 |
I would then want to extract the date 2020-01-01 into a list, because the value above was equal to 50 for a continuous period of 5 minutes (or more).
Recommended answer
The code below uses 5 minutes so that the output matches your desired output. Change 300 to 900 for 15 minutes.
Steps:
- Convert df['Date'] to datetime so that two dates can be subtracted to get the time difference between them.
- Group df by date and call f on each group.
- In f, max_continuous_range gives the length of the longest segment where the value is 50 (assuming a single such segment per day). f returns True if that length is 5 minutes or more. If f returns True, the date is appended to the list.
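Use:
import numpy as np
import pandas as pd
from datetime import timedelta

def f(g):
    mask = (g['Value'] == 50)  # True where the value is 50
    # Total the gaps between matching rows; +1 minute counts the first row
    max_continuous_range = (np.max(np.cumsum(g['Date'].where(mask).diff()))
                            + timedelta(minutes=1))
    # NaT (fewer than two adjacent matches) compares as False
    return max_continuous_range >= timedelta(seconds=300)

df['Date'] = pd.to_datetime(df['Date'])
groups = df.groupby(df['Date'].dt.date)
final_list = [str(idx) for idx, g in groups if f(g)]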
Input:
Date Value
0 2020-01-01 01:35 40
1 2020-01-01 01:36 50
2 2020-01-01 01:37 50
3 2020-01-01 01:38 50
4 2020-01-01 01:39 50
5 2020-01-01 01:40 50
6 2020-01-01 01:41 40
7 2020-01-01 01:42 40
Output:
>>> final_list
['2020-01-01']
Inside f(g):
mask: True where the value is 50.
0 False
1 True
2 True
3 True
4 True
5 True
6 False
7 False
df['Date'].where(mask) puts NaT wherever mask is not True:
0 NaT
1 2020-01-01 01:36:00
2 2020-01-01 01:37:00
3 2020-01-01 01:38:00
4 2020-01-01 01:39:00
5 2020-01-01 01:40:00
6 NaT
7 NaT
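To try the masking step in isolation, here is a minimal sketch that rebuilds the answer's sample input by hand. The construction with pd.date_range and the variable name kept are assumptions made for illustration; only the mask and the where call come from the answer.

```python
import pandas as pd

# Sample input from the answer: eight rows, one minute apart.
df = pd.DataFrame({
    "Date": pd.date_range("2020-01-01 01:35", periods=8, freq="min"),
    "Value": [40, 50, 50, 50, 50, 50, 40, 40],
})

mask = df["Value"] == 50          # boolean Series, True where the value is 50
kept = df["Date"].where(mask)     # timestamps survive where mask is True, NaT elsewhere

print(kept)
```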
.diff gives the difference between two consecutive elements. It gives NaT if either of the two values is NaT. Result after df['Date'].where(mask).diff():
0 NaT
1 NaT
2 0 days 00:01:00
3 0 days 00:01:00
4 0 days 00:01:00
5 0 days 00:01:00
6 NaT
7 NaT
Now the cumulative sum of the differences between consecutive times gives the total time elapsed. After np.cumsum(...):
0 NaT
1 NaT
2 0 days 00:01:00
3 0 days 00:02:00
4 0 days 00:03:00
5 0 days 00:04:00
6 NaT
7 NaT
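The diff and cumulative-sum stages can be reproduced outside f with a short sketch. It calls the Series methods directly instead of np.cumsum and np.max, which is a stylistic assumption; the names gaps, elapsed and longest are illustrative only.

```python
import pandas as pd

dates = pd.Series(pd.date_range("2020-01-01 01:35", periods=8, freq="min"))
values = pd.Series([40, 50, 50, 50, 50, 50, 40, 40])

# Gap to the previous row; NaT whenever this row or the previous one was masked out.
gaps = dates.where(values == 50).diff()

# Running total of the gaps; NaT entries are skipped and stay NaT in the result.
elapsed = gaps.cumsum()

# Five rows one minute apart have only four gaps, hence the extra minute.
longest = elapsed.max() + pd.Timedelta(minutes=1)

print(elapsed)
print(longest)
```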
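np.max gives the longest elapsed time. One minute is added to handle the boundary condition: five rows one minute apart produce only four one-minute differences. Note that the cumulative sum does not reset at a NaT, so the result equals the longest run only when the day contains a single run of 50.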
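One thing to watch with the cumulative sum: it skips NaT entries instead of restarting, so two separate runs of 50 on the same day are added together. The sketch below is an alternative, not the answer's method. It labels each unbroken run and measures it on its own. The function name days_with_long_run and its parameters are hypothetical, and it assumes the frame is sorted by Date and sampled once per step_minutes.

```python
import pandas as pd

def days_with_long_run(df, target=50, min_minutes=5, step_minutes=1):
    """Dates (as strings) with one unbroken run of `target` lasting >= min_minutes."""
    step = pd.Timedelta(minutes=step_minutes)
    dates = pd.to_datetime(df["Date"])
    hit = df["Value"].eq(target)
    if not hit.any():
        return []
    day = dates.dt.normalize()
    # A new run starts when the flag flips, the day changes, or samples are missing.
    new_run = hit.ne(hit.shift()) | day.ne(day.shift()) | (dates.diff() > step)
    run_id = new_run.cumsum()
    # First and last timestamp of every run of matching rows.
    spans = dates[hit].groupby(run_id[hit]).agg(["min", "max"])
    # Count the last sample as covering one step, like the +1 minute above.
    duration = spans["max"] - spans["min"] + step
    long_runs = spans.loc[duration >= pd.Timedelta(minutes=min_minutes), "min"]
    return sorted({str(ts.date()) for ts in long_runs})

sample = pd.DataFrame({
    "Date": pd.date_range("2020-01-01 01:35", periods=8, freq="min"),
    "Value": [40, 50, 50, 50, 50, 50, 40, 40],
})
print(days_with_long_run(sample))                  # ['2020-01-01']
print(days_with_long_run(sample, min_minutes=15))  # []
```

Passing min_minutes=15 gives the 15-minute check from the question. Runs that cross midnight are split at the day boundary, which mirrors the per-day grouping in the answer.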