Calculate the time difference between two rows with conditions


Problem description

I have a sample dataframe (df) like the one below:

              Date_Time      Open      High       Low     Close   UOD  VWB
20  2020-07-01 10:30:00  10298.85  10299.90  10287.85  10299.90    UP    3
21  2020-07-01 10:35:00  10301.40  10310.00  10299.15  10305.75    UP    3
22  2020-07-01 10:40:00  10305.75  10305.75  10285.50  10290.00  DOWN    3
24  2020-07-01 10:45:00  10290.00  10291.20  10277.65  10282.65  DOWN    0
25  2020-07-01 10:50:00  10282.30  10289.80  10278.00  10282.00  DOWN    3
26  2020-07-01 10:55:00  10280.10  10295.00  10279.80  10291.50    UP    3
27  2020-07-01 11:00:00  10290.00  10299.95  10287.30  10297.55    UP    3
28  2020-07-01 11:05:00  10296.70  10306.30  10294.50  10299.40    UP    3
29  2020-07-01 11:10:00  10299.95  10301.10  10291.50  10292.00  DOWN    0
30  2020-07-01 11:15:00  10293.05  10298.70  10286.00  10291.55  DOWN    3
31  2020-07-01 11:20:00  10292.00  10298.70  10286.00  10351.45  DOWN    1
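
To experiment with this data, the frame can be rebuilt roughly as follows. This is only a sketch: it assumes the leftmost number is the index label and that Date_Time should be parsed into a datetime64 column (which the time subtraction later requires); the name `rows` is purely illustrative.

```python
import pandas as pd

# (index label, Date_Time, Open, High, Low, Close, UOD, VWB), copied from the table above
rows = [
    (20, "2020-07-01 10:30:00", 10298.85, 10299.90, 10287.85, 10299.90, "UP", 3),
    (21, "2020-07-01 10:35:00", 10301.40, 10310.00, 10299.15, 10305.75, "UP", 3),
    (22, "2020-07-01 10:40:00", 10305.75, 10305.75, 10285.50, 10290.00, "DOWN", 3),
    (24, "2020-07-01 10:45:00", 10290.00, 10291.20, 10277.65, 10282.65, "DOWN", 0),
    (25, "2020-07-01 10:50:00", 10282.30, 10289.80, 10278.00, 10282.00, "DOWN", 3),
    (26, "2020-07-01 10:55:00", 10280.10, 10295.00, 10279.80, 10291.50, "UP", 3),
    (27, "2020-07-01 11:00:00", 10290.00, 10299.95, 10287.30, 10297.55, "UP", 3),
    (28, "2020-07-01 11:05:00", 10296.70, 10306.30, 10294.50, 10299.40, "UP", 3),
    (29, "2020-07-01 11:10:00", 10299.95, 10301.10, 10291.50, 10292.00, "DOWN", 0),
    (30, "2020-07-01 11:15:00", 10293.05, 10298.70, 10286.00, 10291.55, "DOWN", 3),
    (31, "2020-07-01 11:20:00", 10292.00, 10298.70, 10286.00, 10351.45, "DOWN", 1),
]

df = pd.DataFrame(
    [r[1:] for r in rows],
    index=[r[0] for r in rows],
    columns=["Date_Time", "Open", "High", "Low", "Close", "UOD", "VWB"],
)

# Parse the timestamps so that subtracting two of them yields a Timedelta
df["Date_Time"] = pd.to_datetime(df["Date_Time"])
```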

I have the following conditions:

  1. Find the rows where (df['VWB'] == 0) & (df['UOD'] == "DOWN") and get the corresponding Open value (10290.00 in my example).
  2. Then find the first occurrence of a Close value greater than this Open value (10290.00) after that row.
  3. Put the time difference between the two rows, the one matching condition 1 ((df['VWB'] == 0) & (df['UOD'] == "DOWN")) and the one matching condition 2 (first occurrence), into another column (TD).
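
Read literally, the three conditions can be sketched as a plain loop, which is handy as a reference to check a vectorized version against. The function name `mark_first_cross` and the small `demo` frame are illustrative, not part of the question. The sketch also makes two assumptions: a new condition-1 row restarts the search even if the previous one never found a match, and the condition-1 row itself is not considered a candidate for condition 2.

```python
import pandas as pd

def mark_first_cross(df):
    """Return a copy of df with Valid (0/1) and TD (seconds) columns."""
    out = df.copy()
    out["Valid"] = 0
    out["TD"] = float("nan")

    start_time = None   # Date_Time of the most recent condition-1 row
    start_open = None   # Open value of that row

    for idx, row in df.iterrows():
        if row["VWB"] == 0 and row["UOD"] == "DOWN":
            # Condition 1: remember this row and start (or restart) the search
            start_time, start_open = row["Date_Time"], row["Open"]
            continue
        if start_time is not None and row["Close"] > start_open:
            # Condition 2: first Close above the remembered Open
            out.loc[idx, "Valid"] = 1
            # Condition 3: elapsed time in seconds
            out.loc[idx, "TD"] = (row["Date_Time"] - start_time).total_seconds()
            start_time = None  # stop until the next condition-1 row
    return out

# Four rows taken from the sample data (index labels 24 to 27)
demo = pd.DataFrame(
    {
        "Date_Time": pd.to_datetime(
            ["2020-07-01 10:45:00", "2020-07-01 10:50:00",
             "2020-07-01 10:55:00", "2020-07-01 11:00:00"]
        ),
        "Open": [10290.00, 10282.30, 10280.10, 10290.00],
        "Close": [10282.65, 10282.00, 10291.50, 10297.55],
        "UOD": ["DOWN", "DOWN", "UP", "UP"],
        "VWB": [0, 3, 3, 3],
    },
    index=[24, 25, 26, 27],
)
result = mark_first_cross(demo)
print(result[["Close", "Valid", "TD"]])
```

On these four rows, only row 26 is flagged, with a TD of 600 seconds (10:45 to 10:55). A loop like this is slow on large frames, so it is better suited to validating results than to production use.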

I want my desired output as below, with a Valid column:

              Date_Time      Open      High       Low     Close   UOD  VWB  Valid    TD
20  2020-07-01 10:30:00  10298.85  10299.90  10287.85  10299.90    UP    3      0
21  2020-07-01 10:35:00  10301.40  10310.00  10299.15  10305.75    UP    3      0
22  2020-07-01 10:40:00  10305.75  10305.75  10285.50  10290.00  DOWN    3      0
24  2020-07-01 10:45:00  10290.00  10291.20  10277.65  10282.65  DOWN    0      0
25  2020-07-01 10:50:00  10282.30  10289.80  10278.00  10282.00  DOWN    3      0
26  2020-07-01 10:55:00  10280.10  10295.00  10279.80  10291.50    UP    3      1    600 <<= first occurrence
27  2020-07-01 11:00:00  10290.00  10299.95  10287.30  10297.55    UP    3      0
28  2020-07-01 11:05:00  10296.70  10306.30  10294.50  10299.40    UP    3      0
29  2020-07-01 11:10:00  10299.95  10301.10  10291.50  10292.00  DOWN    0      0
30  2020-07-01 11:15:00  10293.05  10298.70  10286.00  10291.55  DOWN    3      0
31  2020-07-01 11:20:00  10292.00  10298.70  10286.00  10351.45  DOWN    1      1    600 <<= first occurrence

Solution

Here is one approach. I am not sure it is the best way, and it can probably be optimized (comments inline):

import numpy as np

# Open value of each row that satisfies condition 1
open_val = df.loc[(df['VWB'] == 0) & (df['UOD'] == "DOWN"), 'Open']

# Flag rows whose Close is greater than the most recent condition-1 Open
# (forward-filled), and label each row with the group it belongs to
c = df['Close'].gt(open_val.reindex(df.index, method='ffill'))
a = np.digitize(df.index, open_val.index)

# Get the first flagged index in each group and set the Valid column.
# idxmax returns a group's first row when nothing is flagged, but c is
# False there, so that row stays 0.
valid_idx = c.groupby(a).idxmax()
df['Valid'] = c.loc[valid_idx].reindex(df.index, fill_value=False).astype(int)

# Seconds since the most recent condition-1 row, kept only where Valid == 1
# (Date_Time must be a datetime64 column)
TD = (df['Date_Time'] -
      df.loc[open_val.index, 'Date_Time'].reindex(df.index, method='ffill')).dt.total_seconds()
df['TD'] = TD.where(df['Valid'].eq(1))


print(df[['Date_Time','Open','Close','UOD','VWB','Valid','TD']])

             Date_Time      Open     Close   UOD  VWB  Valid     TD
20 2020-07-01 10:30:00  10298.85  10299.90    UP    3      0    NaN
21 2020-07-01 10:35:00  10301.40  10305.75    UP    3      0    NaN
22 2020-07-01 10:40:00  10305.75  10290.00  DOWN    3      0    NaN
24 2020-07-01 10:45:00  10290.00  10282.65  DOWN    0      0    NaN
25 2020-07-01 10:50:00  10282.30  10282.00  DOWN    3      0    NaN
26 2020-07-01 10:55:00  10280.10  10291.50    UP    3      1  600.0
27 2020-07-01 11:00:00  10290.00  10297.55    UP    3      0    NaN
28 2020-07-01 11:05:00  10296.70  10299.40    UP    3      0    NaN
29 2020-07-01 11:10:00  10299.95  10292.00  DOWN    0      0    NaN
30 2020-07-01 11:15:00  10293.05  10291.55  DOWN    3      0    NaN
31 2020-07-01 11:20:00  10292.00  10351.45  DOWN    1      1  600.0
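
The grouping step may be the least obvious part. With increasing bins, np.digitize returns, for each value x, the number i such that bins[i-1] <= x < bins[i], so every row gets the count of condition-1 rows at or before it. A small sketch using the index labels from the sample (the array names are illustrative):

```python
import numpy as np

index_labels = np.array([20, 21, 22, 24, 25, 26, 27, 28, 29, 30, 31])
condition_rows = np.array([24, 29])  # index labels where VWB == 0 and UOD == "DOWN"

# 0 = before any condition-1 row, 1 = from row 24 onward, 2 = from row 29 onward
groups = np.digitize(index_labels, condition_rows)
print(groups)  # [0 0 0 1 1 1 1 1 2 2 2]
```

This relies on the index being numeric and sorted in increasing order; the same holds for reindex(..., method='ffill'), which requires a monotonic index. With a different kind of index, the group labels would presumably need to be built another way, for example from a cumulative sum of the condition mask.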
