Calculate the time difference between two rows with conditions
Problem description
I have a sample DataFrame (df) like the one below:
              Date_Time      Open      High       Low     Close   UOD  VWB
20  2020-07-01 10:30:00  10298.85  10299.90  10287.85  10299.90    UP    3
21  2020-07-01 10:35:00  10301.40  10310.00  10299.15  10305.75    UP    3
22  2020-07-01 10:40:00  10305.75  10305.75  10285.50  10290.00  DOWN    3
24  2020-07-01 10:45:00  10290.00  10291.20  10277.65  10282.65  DOWN    0
25  2020-07-01 10:50:00  10282.30  10289.80  10278.00  10282.00  DOWN    3
26  2020-07-01 10:55:00  10280.10  10295.00  10279.80  10291.50    UP    3
27  2020-07-01 11:00:00  10290.00  10299.95  10287.30  10297.55    UP    3
28  2020-07-01 11:05:00  10296.70  10306.30  10294.50  10299.40    UP    3
29  2020-07-01 11:10:00  10299.95  10301.10  10291.50  10292.00  DOWN    0
30  2020-07-01 11:15:00  10293.05  10298.70  10286.00  10291.55  DOWN    3
31  2020-07-01 11:20:00  10292.00  10298.70  10286.00  10351.45  DOWN    1
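For anyone who wants to reproduce the sample, the frame can be rebuilt roughly as follows. This is only a sketch: the helper name build_sample_df is made up for illustration, and it assumes Date_Time should be parsed to a datetime column and that the index labels (20 to 31, with 23 missing) are kept exactly as printed.

```python
import pandas as pd


def build_sample_df():
    """Rebuild the sample frame shown above (hypothetical helper)."""
    rows = [
        (20, "2020-07-01 10:30:00", 10298.85, 10299.90, 10287.85, 10299.90, "UP", 3),
        (21, "2020-07-01 10:35:00", 10301.40, 10310.00, 10299.15, 10305.75, "UP", 3),
        (22, "2020-07-01 10:40:00", 10305.75, 10305.75, 10285.50, 10290.00, "DOWN", 3),
        (24, "2020-07-01 10:45:00", 10290.00, 10291.20, 10277.65, 10282.65, "DOWN", 0),
        (25, "2020-07-01 10:50:00", 10282.30, 10289.80, 10278.00, 10282.00, "DOWN", 3),
        (26, "2020-07-01 10:55:00", 10280.10, 10295.00, 10279.80, 10291.50, "UP", 3),
        (27, "2020-07-01 11:00:00", 10290.00, 10299.95, 10287.30, 10297.55, "UP", 3),
        (28, "2020-07-01 11:05:00", 10296.70, 10306.30, 10294.50, 10299.40, "UP", 3),
        (29, "2020-07-01 11:10:00", 10299.95, 10301.10, 10291.50, 10292.00, "DOWN", 0),
        (30, "2020-07-01 11:15:00", 10293.05, 10298.70, 10286.00, 10291.55, "DOWN", 3),
        (31, "2020-07-01 11:20:00", 10292.00, 10298.70, 10286.00, 10351.45, "DOWN", 1),
    ]
    cols = ["idx", "Date_Time", "Open", "High", "Low", "Close", "UOD", "VWB"]
    df = pd.DataFrame(rows, columns=cols).set_index("idx")
    df.index.name = None  # the printed sample has an unnamed index
    # Parse the timestamps so that time differences can be computed later.
    df["Date_Time"] = pd.to_datetime(df["Date_Time"])
    return df


df = build_sample_df()
print(df)
```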
I have the following conditions:
- Find the rows where (df['VWB'] == 0) & (df['UOD'] == "DOWN") and take the corresponding Open value (10290.00 in my example).
- Then find the first row after that one whose Close value is greater than this Open value (10290.00).
- Put the time difference between the two rows, i.e. the row matching Condition 1 ((df['VWB'] == 0) & (df['UOD'] == "DOWN")) and the row matching Condition 2 (the first occurrence), into another column (TD).
My desired output is shown below, with the added Valid and TD columns:
              Date_Time      Open      High       Low     Close   UOD  VWB  Valid   TD
20  2020-07-01 10:30:00  10298.85  10299.90  10287.85  10299.90    UP    3      0
21  2020-07-01 10:35:00  10301.40  10310.00  10299.15  10305.75    UP    3      0
22  2020-07-01 10:40:00  10305.75  10305.75  10285.50  10290.00  DOWN    3      0
24  2020-07-01 10:45:00  10290.00  10291.20  10277.65  10282.65  DOWN    0      0
25  2020-07-01 10:50:00  10282.30  10289.80  10278.00  10282.00  DOWN    3      0
26  2020-07-01 10:55:00  10280.10  10295.00  10279.80  10291.50    UP    3      1  600  <<= first occurrence
27  2020-07-01 11:00:00  10290.00  10299.95  10287.30  10297.55    UP    3      0
28  2020-07-01 11:05:00  10296.70  10306.30  10294.50  10299.40    UP    3      0
29  2020-07-01 11:10:00  10299.95  10301.10  10291.50  10292.00  DOWN    0      0
30  2020-07-01 11:15:00  10293.05  10298.70  10286.00  10291.55  DOWN    3      0
31  2020-07-01 11:20:00  10292.00  10298.70  10286.00  10351.45  DOWN    1      1  600  <<= first occurrence
Solution
Here is one approach. I am not sure it is the best way, and it can probably be optimized (comments inline):
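import numpy as np
import pandas as pd

# Date_Time must be a datetime column, and the index must be sorted for ffill/digitize
df['Date_Time'] = pd.to_datetime(df['Date_Time'])
# Open value of every row that meets condition 1
open_val = df.loc[(df['VWB'] == 0) & (df['UOD'] == "DOWN"), 'Open']
# True where Close is greater than the Open of the most recent condition-1 row (forward-filled)
c = df['Close'].gt(open_val.reindex(df.index, method='ffill'))
# group label per row: 0 before the first condition-1 row, then 1, 2, ... starting at each such row
a = np.digitize(df.index, open_val.index)
# first True in each group; a group with no True yields its first label, where c is False
valid_idx = c.groupby(a).idxmax()
df['Valid'] = c.loc[valid_idx].reindex(df.index, fill_value=False).astype(int)
# seconds elapsed since the most recent condition-1 row, kept only where Valid == 1
TD = (df['Date_Time'] -
      df.loc[open_val.index, 'Date_Time'].reindex(df.index, method='ffill')).dt.total_seconds()
df['TD'] = TD.where(df['Valid'].eq(1))
print(df[['Date_Time', 'Open', 'Close', 'UOD', 'VWB', 'Valid', 'TD']])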
              Date_Time      Open     Close   UOD  VWB  Valid     TD
20  2020-07-01 10:30:00  10298.85  10299.90    UP    3      0    NaN
21  2020-07-01 10:35:00  10301.40  10305.75    UP    3      0    NaN
22  2020-07-01 10:40:00  10305.75  10290.00  DOWN    3      0    NaN
24  2020-07-01 10:45:00  10290.00  10282.65  DOWN    0      0    NaN
25  2020-07-01 10:50:00  10282.30  10282.00  DOWN    3      0    NaN
26  2020-07-01 10:55:00  10280.10  10291.50    UP    3      1  600.0
27  2020-07-01 11:00:00  10290.00  10297.55    UP    3      0    NaN
28  2020-07-01 11:05:00  10296.70  10299.40    UP    3      0    NaN
29  2020-07-01 11:10:00  10299.95  10292.00  DOWN    0      0    NaN
30  2020-07-01 11:15:00  10293.05  10291.55  DOWN    3      0    NaN
31  2020-07-01 11:20:00  10292.00  10351.45  DOWN    1      1  600.0
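As a cross-check on the vectorized version, the three conditions can also be written as a plain loop. This is a minimal sketch, not the answer's method: the function name first_close_above_open is made up here, the search starts on the row after each condition-1 row (as the question words it), and each search is assumed to stop at the next condition-1 row, which matches how the grouped version behaves. It will be slower on large frames but is easier to step through.

```python
import numpy as np
import pandas as pd


def first_close_above_open(df):
    """For each row with VWB == 0 and UOD == 'DOWN', flag the first later row
    whose Close exceeds that row's Open, and record the elapsed seconds."""
    out = df.copy()
    out["Valid"] = 0
    out["TD"] = np.nan
    trigger = (out["VWB"] == 0) & (out["UOD"] == "DOWN")
    positions = np.flatnonzero(trigger.to_numpy())
    # Assumption: each search window ends just before the next trigger row.
    bounds = list(positions[1:]) + [len(out)]
    valid_col = out.columns.get_loc("Valid")
    td_col = out.columns.get_loc("TD")
    for start, stop in zip(positions, bounds):
        open_val = out["Open"].iloc[start]
        t0 = out["Date_Time"].iloc[start]
        for pos in range(start + 1, stop):
            if out["Close"].iloc[pos] > open_val:
                out.iloc[pos, valid_col] = 1
                elapsed = out["Date_Time"].iloc[pos] - t0
                out.iloc[pos, td_col] = elapsed.total_seconds()
                break  # only the first occurrence counts
    return out


# Sample data from the question (High and Low omitted, they are not needed).
df = pd.DataFrame(
    {
        "Date_Time": pd.date_range("2020-07-01 10:30:00", periods=11, freq="5min"),
        "Open": [10298.85, 10301.40, 10305.75, 10290.00, 10282.30, 10280.10,
                 10290.00, 10296.70, 10299.95, 10293.05, 10292.00],
        "Close": [10299.90, 10305.75, 10290.00, 10282.65, 10282.00, 10291.50,
                  10297.55, 10299.40, 10292.00, 10291.55, 10351.45],
        "UOD": ["UP", "UP", "DOWN", "DOWN", "DOWN", "UP",
                "UP", "UP", "DOWN", "DOWN", "DOWN"],
        "VWB": [3, 3, 3, 0, 3, 3, 3, 3, 0, 3, 1],
    },
    index=[20, 21, 22, 24, 25, 26, 27, 28, 29, 30, 31],
)

result = first_close_above_open(df)
print(result[["Date_Time", "Open", "Close", "UOD", "VWB", "Valid", "TD"]])
```

On the sample data this flags index 26 and index 31, each 600 seconds after its condition-1 row, in agreement with the output above.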