Incorporate a time constraint by matching with data.table
This question is a follow-up to a previous question.
Two data.frames are provided. Since this question focuses on a more specific part of the problem, the example data is reduced.
tc <- textConnection('
ID Track4 Time Loc
4 50 40 1
5 55 50 1
6 55 60 1
')
MATCHINGS <- read.table(tc, header=TRUE)
tc <- textConnection('
ID Track4 Time Loc
"" 50 40 1
"" 55 10 1
"" 55 40 1
"" 55 59 1 ')
INVOLVED <- read.table(tc, header=TRUE)
In the previous question a solution was found to this problem: the goal is to place the least recent ID from MATCHINGS into INVOLVED by matching on Track4 and Loc. An extra condition is that the Time of the matching INVOLVED entry may not be higher than the Time of the entry in MATCHINGS. This was achieved with the current approach (see below).
A new constraint is that the Time of the INVOLVED entry may not be more than 30 seconds lower than the Time of the MATCHINGS entry (the Time column is in seconds). Right now the following output is achieved:
ID Track4 Time Loc
4 50 40 1
5 55 10 1
5 55 40 1
6 55 59 1
The expected result, however, is:
ID Track4 Time Loc
4 50 40 1
"" 55 10 1
5 55 40 1
6 55 59 1
The second row should stay unmatched because the Time of that INVOLVED entry (10) is more than 30 seconds lower than the Time of the MATCHINGS entry (50) that matches on Track4 and Loc. I do not see how to incorporate this in my current solution. According to Matthew Dowle, a feature request in the data.table package is related to this issue, but it should already be possible to incorporate. Does anyone know how?
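To make the rule concrete, here is a minimal base R sketch of the matching as I read it from the description above. It assumes that an INVOLVED row takes the ID of the earliest MATCHINGS row on the same Track4 and Loc whose Time is at or after the INVOLVED Time and at most 30 seconds ahead. The helper `match_id` and its `max_gap` parameter are hypothetical names used only for illustration, and the data frames are rebuilt inline without the empty ID column of INVOLVED.

```r
# Example data from the question (the ID column of INVOLVED is filled in below).
MATCHINGS <- data.frame(ID     = c(4L, 5L, 6L),
                        Track4 = c(50L, 55L, 55L),
                        Time   = c(40L, 50L, 60L),
                        Loc    = c(1L, 1L, 1L))
INVOLVED <- data.frame(Track4 = c(50L, 55L, 55L, 55L),
                       Time   = c(40L, 10L, 40L, 59L),
                       Loc    = c(1L, 1L, 1L, 1L))

# Hypothetical helper: ID of the earliest MATCHINGS row at or after `time`
# on the same Track4/Loc, provided it is at most `max_gap` seconds ahead.
match_id <- function(track, time, loc, max_gap = 30) {
  gap <- MATCHINGS$Time - time
  ok  <- MATCHINGS$Track4 == track & MATCHINGS$Loc == loc &
    gap >= 0 & gap <= max_gap
  if (!any(ok)) return(NA_integer_)   # nothing within the allowed window
  cand <- MATCHINGS[ok, ]
  cand$ID[which.min(cand$Time)]
}

INVOLVED$ID <- mapply(match_id, INVOLVED$Track4, INVOLVED$Time, INVOLVED$Loc)
print(INVOLVED[, c("ID", "Track4", "Time", "Loc")])
# ID column: 4, NA, 5, 6
```

This row-by-row loop is only a reference for what the result should be; it does not scale the way a keyed data.table join does.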
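The current approach (without taking the time constraint into account):
library(data.table)
M = as.data.table(MATCHINGS)
I = as.data.table(INVOLVED)
# Negate Time so that roll=TRUE (last observation carried forward) picks the
# nearest MATCHINGS entry whose Time is at or after the INVOLVED Time
M[,Time:=-Time]
I[,Time:=-Time]
setkey(M,Loc,Track4,Time)
# Look up each INVOLVED row in M, then restore the original sign of Time
I[,ID:={i=list(Loc,Track4,Time);M[i,ID,roll=TRUE,mult="first"]}][,Time:=-Time]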
Update: the roll argument in data.table has accepted finite roll-backs and roll-forwards for a long time. Just updating this post so that we can close #615.
# dt1 = MATCHINGS, dt2 = INVOLVED
# make sure dt2 doesn't have `ID` column, or if it has, it is of integer type
require(data.table) # v1.9.6+
dt2[dt1, ID := i.ID, on=c("Track4", "Time"), roll=30]
# Track4 Time Loc ID
# 1: 50 40 1 4
# 2: 55 10 1 NA
# 3: 55 40 1 5
# 4: 55 59 1 6
Also using the on= argument implemented in v1.9.6.
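Note that the join above matches on Track4 and Time only, which is enough for the example because every Loc is 1. A possible variant, sketched here as an assumption rather than as part of the original answer, runs the join from the INVOLVED side, adds Loc to `on`, and uses a negative finite `roll` so that the next MATCHINGS observation is carried backward by at most 30 seconds. The tables `M` and `I` are rebuilt inline, and `I` deliberately has no ID column.

```r
library(data.table)  # on= requires v1.9.6+

M <- data.table(ID = c(4L, 5L, 6L), Track4 = c(50L, 55L, 55L),
                Time = c(40L, 50L, 60L), Loc = 1L)
I <- data.table(Track4 = c(50L, 55L, 55L, 55L),
                Time = c(40L, 10L, 40L, 59L), Loc = 1L)

# For each row of I, look up M on Loc and Track4 and roll on Time, which must
# be the last column in `on`. roll = -30 carries the next M observation
# backward by at most 30 seconds, so an M entry further ahead is not matched.
res <- M[I, on = c("Loc", "Track4", "Time"), roll = -30]
print(res)
# ID column: 4, NA, 5, 6
```

The result has one row per INVOLVED entry, so no update by reference is needed and the ID column type of INVOLVED does not matter.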
See history for the older answer if necessary.