Incorporate a time constraint by matching with data.table

Problem description

This question is a follow-up to a previous question.

That question provided two data.frames; since this question focuses on a more specific part, the example data is reduced.

tc <- textConnection('
ID  Track4  Time    Loc
4   50      40      1   
5   55      50      1   
6   55      60      1   

')

MATCHINGS <- read.table(tc, header=TRUE)

tc <- textConnection('
ID  Track4  Time    Loc
""  50      40      1   
""  55      10      1
""  55      40      1   
""  55      59      1     ')  

INVOLVED <- read.table(tc, header=TRUE)

In the previous question a solution was found to this problem: the goal is to place the least recent ID from MATCHINGS into INVOLVED by matching on Track4 and Loc. An extra condition is that the Time of the matching INVOLVED entry may not be higher than the Time of the entry in MATCHINGS. This was achieved with the current approach (see below).

A new constraint is that the Time of the INVOLVED entry may not be more than 30 seconds lower than the Time of the MATCHINGS entry (the Time column is in seconds). Right now the following output is achieved:

ID Track4 Time Loc
4     50   40   1
5     55   10   1
5     55   40   1
6     55   59   1

The expected result, however, is:

ID Track4 Time Loc
4     50   40   1
""    55   10   1
5     55   40   1
6     55   59   1

The ID is missing for the second row because the Time of that INVOLVED entry is more than 30 seconds lower than that of the MATCHINGS entry that matches on Track4 and Loc. I do not see how to incorporate this in my current solution. According to Matthew Dowle, a feature request in the data.table package is related to this issue, but it should already be possible to incorporate. Does anyone know how?

The current approach (without taking the time constraint into account):

M = as.data.table(MATCHINGS)
I = as.data.table(INVOLVED)
M[,Time:=-Time]
I[,Time:=-Time]
setkey(M,Loc,Track4,Time)
I[,ID:={i=list(Loc,Track4,Time);M[i,ID,roll=TRUE,mult="first"]}][,Time:=-Time]  

Solution

Update: the roll argument in data.table has accepted finite roll-backs and roll-forwards for a long time. Just updating this post so that we can close #615.

# dt1 = MATCHINGS, dt2 = INVOLVED
# make sure dt2 doesn't have `ID` column, or if it has, it is of integer type
require(data.table) # v1.9.6+
dt2[dt1, ID := i.ID, on=c("Track4", "Time"), roll=30]
#    Track4 Time Loc ID
# 1:     50   40   1  4
# 2:     55   10   1 NA
# 3:     55   40   1  5
# 4:     55   59   1  6

This also uses the on= argument, implemented in v1.9.6.
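The answer does not show how dt1 and dt2 are built, so here is a minimal self-contained sketch of it. It assumes dt1 and dt2 hold the question's example data typed in directly as integers, and that dt2 (INVOLVED) starts without an ID column.

```r
library(data.table)  # assumes v1.9.6 or later, for on=

# Example data from the question; dt1/dt2 follow the answer's naming.
dt1 <- data.table(ID     = 4:6,
                  Track4 = c(50L, 55L, 55L),
                  Time   = c(40L, 50L, 60L),
                  Loc    = 1L)                 # MATCHINGS
dt2 <- data.table(Track4 = c(50L, 55L, 55L, 55L),
                  Time   = c(40L, 10L, 40L, 59L),
                  Loc    = 1L)                 # INVOLVED, no ID column yet

# For each dt1 row, roll to the latest dt2 row with the same Track4 whose
# Time is equal or at most 30 seconds earlier, and write dt1's ID into it.
# dt2 rows that no dt1 row rolls to keep ID = NA.
dt2[dt1, ID := i.ID, on = c("Track4", "Time"), roll = 30]
print(dt2)
```

Note that each dt1 row is assigned to at most one dt2 row here, so with denser data an INVOLVED row that is within 30 seconds could still end up with NA if a later INVOLVED row sits closer to the same MATCHINGS entry.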

See history for the older answer if necessary.
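As a variation that stays closer to the question's original approach, the lookup can also be done from the INVOLVED side with a negative finite roll, which avoids negating Time. This is only a sketch: the names `matchings` and `involved` are illustrative, the data is the question's example typed in directly, and Loc is added to the join columns as in the question's setkey call.

```r
library(data.table)  # assumes v1.9.6 or later, for on=

matchings <- data.table(ID     = 4:6,
                        Track4 = c(50L, 55L, 55L),
                        Time   = c(40L, 50L, 60L),
                        Loc    = 1L)
involved  <- data.table(Track4 = c(50L, 55L, 55L, 55L),
                        Time   = c(40L, 10L, 40L, 59L),
                        Loc    = 1L)

# For each involved row, look up the matchings row with the same Loc and
# Track4 whose Time is equal or the next one later, but at most 30 seconds
# later (roll = -30: next observation carried backward, limited to 30).
ids <- matchings[involved, ID, on = c("Loc", "Track4", "Time"), roll = -30]

# One ID (or NA) per involved row, in the original row order.
involved[, ID := ids]
print(involved)
```

Because the join runs once per INVOLVED row, every INVOLVED entry within the 30-second window receives an ID, even when several of them map to the same MATCHINGS entry.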
