Find time to nearest occurrence of particular value for each row
Question
Say I have a data table:
library(data.table)
dt <- data.table(
  datetime = seq(as.POSIXct("2016-01-01 00:00:00"), as.POSIXct("2016-01-01 10:00:00"), by = "1 hour"),
  ObType = c("A","A","B","B","B","B","A","A","B","A","A")
)
dt
datetime ObType
1: 2016-01-01 00:00:00 A
2: 2016-01-01 01:00:00 A
3: 2016-01-01 02:00:00 B
4: 2016-01-01 03:00:00 B
5: 2016-01-01 04:00:00 B
6: 2016-01-01 05:00:00 B
7: 2016-01-01 06:00:00 A
8: 2016-01-01 07:00:00 A
9: 2016-01-01 08:00:00 B
10: 2016-01-01 09:00:00 A
11: 2016-01-01 10:00:00 A
What I need to do is, wherever the ObType is "B", find the time to the nearest ObType "A" on either side. So the result should look like this (in hours):
datetime ObType timeLag timeLead
1: 2016-01-01 00:00:00 A NA NA
2: 2016-01-01 01:00:00 A NA NA
3: 2016-01-01 02:00:00 B 1 4
4: 2016-01-01 03:00:00 B 2 3
5: 2016-01-01 04:00:00 B 3 2
6: 2016-01-01 05:00:00 B 4 1
7: 2016-01-01 06:00:00 A NA NA
8: 2016-01-01 07:00:00 A NA NA
9: 2016-01-01 08:00:00 B 1 1
10: 2016-01-01 09:00:00 A NA NA
11: 2016-01-01 10:00:00 A NA NA
I usually use data.table, but non-data.table solutions are also fine.
Thanks!
Lyss
Recommended answer
I suggest an approach using roll=:
X = dt[ObType=="A"]
X
datetime ObType
1: 2016-01-01 00:00:00 A
2: 2016-01-01 01:00:00 A
3: 2016-01-01 06:00:00 A
4: 2016-01-01 07:00:00 A
5: 2016-01-01 09:00:00 A
6: 2016-01-01 10:00:00 A
dt[ObType=="B", Lag:=X[.SD,on="datetime",roll=Inf,i.datetime-x.datetime]]
dt[ObType=="B", Lead:=X[.SD,on="datetime",roll=-Inf,x.datetime-i.datetime]]
dt[ObType=="B", Nearest:=X[.SD,on="datetime",roll="nearest",x.datetime-i.datetime]]
dt
datetime ObType Lag Lead Nearest
1: 2016-01-01 00:00:00 A NA hours NA hours NA hours
2: 2016-01-01 01:00:00 A NA hours NA hours NA hours
3: 2016-01-01 02:00:00 B 1 hours 4 hours -1 hours
4: 2016-01-01 03:00:00 B 2 hours 3 hours -2 hours
5: 2016-01-01 04:00:00 B 3 hours 2 hours 2 hours
6: 2016-01-01 05:00:00 B 4 hours 1 hours 1 hours
7: 2016-01-01 06:00:00 A NA hours NA hours NA hours
8: 2016-01-01 07:00:00 A NA hours NA hours NA hours
9: 2016-01-01 08:00:00 B 1 hours 1 hours -1 hours
10: 2016-01-01 09:00:00 A NA hours NA hours NA hours
11: 2016-01-01 10:00:00 A NA hours NA hours NA hours
One advantage of roll= is that you can apply a staleness limit just by changing the Inf to the limit of time you wish to join within. It's the time difference that the limit applies to, not the number of rows. Inf just means don't limit. The sign of roll= indicates whether to look forwards or backwards (lead or lag).
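As a rough sketch of the staleness limit described above, the lag join can be repeated with a finite roll value. The 2.5-hour limit, the column name LagLimited, and the explicit tz = "UTC" are assumptions made for this illustration and are not part of the original answer. Because POSIXct values are stored as seconds, the limit is expressed in seconds.

```r
library(data.table)

# Same data as in the question; tz = "UTC" is an assumption added here
# so the example does not depend on the local time zone.
dt <- data.table(
  datetime = seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"),
                 as.POSIXct("2016-01-01 10:00:00", tz = "UTC"),
                 by = "1 hour"),
  ObType = c("A","A","B","B","B","B","A","A","B","A","A")
)

# Lookup table holding only the "A" observations
X <- dt[ObType == "A"]

# Hypothetical staleness limit: 2.5 hours, given in seconds
limit <- 2.5 * 3600

# Roll the previous "A" forward onto each "B" row, but only within the limit;
# the difference is converted to plain numeric hours
dt[ObType == "B",
   LagLimited := X[.SD, on = "datetime", roll = limit,
                   as.numeric(difftime(i.datetime, x.datetime, units = "hours"))]]
dt
```

With this limit, the "B" rows at 04:00 and 05:00, whose previous "A" lies 3 and 4 hours back, should come out as NA, while the other "B" rows keep their lag. Using a negative value for roll would apply the same kind of limit in the lead direction.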
Another advantage is that roll= is fast.