R rolling join two data.tables with error margin on join
Question
Note: this question is a copy of this one but with different wording, and a suggestion for data.table instead of dplyr.
I have two datasets that contain scores for different patients at multiple measuring moments, like so:
dt1 <- data.frame("ID" = c("patient1","patient1","patient1","patient1","patient2","patient3"),
"Days" = c(0,10,25,340,100,538),
"Score" = c(NA,2,3,99,5,6),
stringsAsFactors = FALSE)
dt2 <- data.frame("ID" = c("patient1","patient1","patient1","patient1","patient2","patient2","patient3"),
"Days" = c(0,10,25,353,100,150,503),
"Score" = c(1,10,3,4,5,7,6),
stringsAsFactors = FALSE)
> dt1
        ID Days Score
1 patient1    0    NA
2 patient1   10     2
3 patient1   25     3
4 patient1  340    99
5 patient2  100     5
6 patient3  538     6
> dt2
        ID Days Score
1 patient1    0     1
2 patient1   10    10
3 patient1   25     3
4 patient1  353     4
5 patient2  100     5
6 patient2  150     7
7 patient3  503     6
Column Days is the time measurement. I want to join both datasets based on ID and Days if the value for Days is within threshold <- 30. There are five conditions:
- Consecutive days that are within the threshold within the same df (rows 1 and 2) are not merged.
- In some cases, up to four values for the Days variable exist in the same dataframe; these should not be merged with each other. If one of these values lies within the threshold of a value in the other dataframe, those two have to be merged (row 4).
- Data that does not fall within the threshold should not be merged, but should not be discarded either (see example output rows 7 and 8).
- If there is no corresponding value for Days in either of the data sets, NA should be filled in.
- The dataframes are not of equal length!
I suspect a data.table rolling join can give me the answer, but I can't seem to figure it out. The expected output is as follows:
library(data.table)
setDT(dt1)
setDT(dt2)
setkey(dt1, ID, Days)  # ? (not sure whether these keys are needed)
setkey(dt2, ID, Days)  # ?
# ** do the join **
> dt_joined
        ID Days Score.x Score.y
1 patient1    0      NA       1
2 patient1   10       2      10
3 patient1   25       3       3
4 patient1  353      99       4   <<- merged (days 340 > 353)
5 patient2  100       5       5
6 patient2  150      NA       7   <<- new row added in dt2
7 patient3  503      NA       6
8 patient3  538       6      NA   <<- same score as row 7 but not within threshold
Any help would be greatly appreciated. A data.table solution is not mandatory.
Recommended answer
A data.table answer has already been given by user Uwe here:
https://stackoverflow.com/a/62321710/12079387
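The linked answer is not reproduced here. Purely as an illustration, and not necessarily the approach taken in that answer, the following is a minimal sketch of one way to obtain the expected output: a roll = "nearest" join to find the closest dt2 measurement for each dt1 row, a filter on the threshold, and a final stacking of matched and unmatched rows. The helper columns Days1, Days2 and ok, and the intermediate tables d1, d2, m, matched, only1 and only2, are names made up for this sketch. The tie-breaking rule (if several dt1 rows land on the same dt2 row, only the closest one is merged) is an assumption about the intended behaviour.

```r
library(data.table)

dt1 <- data.table(
  ID    = c("patient1", "patient1", "patient1", "patient1", "patient2", "patient3"),
  Days  = c(0, 10, 25, 340, 100, 538),
  Score = c(NA, 2, 3, 99, 5, 6)
)
dt2 <- data.table(
  ID    = c("patient1", "patient1", "patient1", "patient1", "patient2", "patient2", "patient3"),
  Days  = c(0, 10, 25, 353, 100, 150, 503),
  Score = c(1, 10, 3, 4, 5, 7, 6)
)
threshold <- 30

# Work on renamed copies so the join produces no clashing column names.
# Days1 / Days2 preserve each table's original Days value (hypothetical names).
d1 <- dt1[, .(ID, Days, Days1 = Days, Score.x = Score)]
d2 <- dt2[, .(ID, Days, Days2 = Days, Score.y = Score)]

# For every dt1 row, look up the nearest dt2 row of the same patient.
m <- d2[d1, on = .(ID, Days), roll = "nearest"]

# Keep a match only if it lies within the threshold.
m[, ok := !is.na(Days2) & abs(Days1 - Days2) <= threshold]
# Assumption: if several dt1 rows hit the same dt2 row, merge only the closest.
m[ok == TRUE, ok := seq_len(.N) == which.min(abs(Days1 - Days2)), by = .(ID, Days2)]

# Merged rows take the Days value from dt2, as in the expected output.
matched <- m[ok == TRUE,  .(ID, Days = Days2, Score.x, Score.y)]
# dt1 rows without a partner within the threshold.
only1   <- m[ok == FALSE, .(ID, Days = Days1, Score.x)]
# dt2 rows that were not used by any match (anti-join).
only2   <- d2[!matched, on = .(ID, Days)][, .(ID, Days, Score.y)]

# Stack the three pieces; fill = TRUE supplies NA for the missing score column.
dt_joined <- rbindlist(list(matched, only1, only2), use.names = TRUE, fill = TRUE)
setorder(dt_joined, ID, Days)
print(dt_joined)
```

On the example data this should give the eight rows of the expected output, with the merged patient1 row reported at day 353. Whether the tie-breaking rule suits real data with many closely spaced measurements is worth checking before relying on it.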