data.table 子集的条件 data.table 匹配 [英] conditional data.table match for subset of data.table

查看:46
本文介绍了data.table 子集的条件 data.table 匹配的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

This post is related to the previous post here: match rows of two data.tables to fill subset of a data.table

Not sure how I can integrate them together. I have a situation where other than the NA for one column of DT1, a couple of more conditions should apply for merging, but that doesn't work.

> DT1 <- data.table(colA = c(1,1, 2,2,2,3,3), colB = c('A', NA, 'AA', 'B', NA, 'A', 'C'), timeA = c(2,4,3,4,6,1,4))
> DT1
   colA colB timeA
1:    1    A     2
2:    1 <NA>     4
3:    2   AA     3
4:    2    B     4
5:    2 <NA>     6
6:    3    A     1
7:    3    C     4
> DT2 <- data.table(colC = c(1,1,1,2,2,3), timeB1 = c(1,3,6, 2,4, 1), timeB2 = c(2,5,7,3,5,4), colD = c('Z', 'YY', 'AB', 'JJ', 'F', 'RR'))
> DT2
   colC timeB1 timeB2 colD
1:    1      1      2    Z
2:    1      3      5   YY
3:    1      6      7   AB
4:    2      2      3   JJ
5:    2      4      5    F
6:    3      1      4   RR

Using the same guideline as mentioned above, I'd like to merge ColD of DT2 to colB of DT1 only for NA values of colB in DT1 AND use the values of colD for which timeA in DT1 is between timeB1 and timeB2 in DT2. I tried the following but merge doesn't happen:

 > output <- DT1[DT2, on = .(colA = colC), colB := ifelse(is.na(x.colB) & i.timeB1 <= x.timeA & x.timeA <= i.timeB2, i.colD, x.colB)]
> output
> output
   colA colB timeA
1:    1    A     2
2:    1 <NA>     4
3:    2   AA     3
4:    2    B     4
5:    2 <NA>     6
6:    3    A     1
7:    3    C     4

Nothing changes in output. these is my desired output:

> desired_output
   colA colB timeA
1:    1    A     2
2:    1   YY     4   --> should find a match
3:    2   AA     3
4:    2    B     4
5:    2 <NA>     6   --> shouldn't find a match
6:    3    A     1
7:    3    C     4

why doesn't this work? I'd like to use data.table operations only without using additional packages.

解决方案

An in place update of the colB in DT1 would work as follows:

DT1[is.na(colB), colB := DT2[DT1[is.na(colB)], 
                    on = .(colC = colA, timeB1 <= timeA, timeB2 >= timeA), colD]]
print(DT1)
   colA colB timeA
1:    1    A     2
2:    1   YY     4
3:    2   AA     3
4:    2    B     4
5:    2 <NA>     6
6:    3    A     1
7:    3    C     4

This indexes the values where colB is NA and after a join on the condition, as defined in on= ..., replaces the missing values by the matching values found in colD.

这篇关于data.table 子集的条件 data.table 匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆