Conditional data.table match for a subset of a data.table

Question

This post is related to a previous post: match rows of two data.tables to fill subset of a data.table

I'm not sure how to integrate the two. In my situation, besides colB in DT1 being NA, a couple of additional conditions must hold for the merge, and my attempt doesn't work.

> DT1 <- data.table(colA = c(1,1, 2,2,2,3,3), colB = c('A', NA, 'AA', 'B', NA, 'A', 'C'), timeA = c(2,4,3,4,6,1,4))
> DT1
   colA colB timeA
1:    1    A     2
2:    1 <NA>     4
3:    2   AA     3
4:    2    B     4
5:    2 <NA>     6
6:    3    A     1
7:    3    C     4
> DT2 <- data.table(colC = c(1,1,1,2,2,3), timeB1 = c(1,3,6, 2,4, 1), timeB2 = c(2,5,7,3,5,4), colD = c('Z', 'YY', 'AB', 'JJ', 'F', 'RR'))
> DT2
   colC timeB1 timeB2 colD
1:    1      1      2    Z
2:    1      3      5   YY
3:    1      6      7   AB
4:    2      2      3   JJ
5:    2      4      5    F
6:    3      1      4   RR

Following the same approach as in that post, I'd like to fill colB of DT1 from colD of DT2, but only where colB in DT1 is NA, and only using the colD value for which timeA in DT1 lies between timeB1 and timeB2 in DT2. I tried the following, but the merge doesn't happen:

> output <- DT1[DT2, on = .(colA = colC), colB := ifelse(is.na(x.colB) & i.timeB1 <= x.timeA & x.timeA <= i.timeB2, i.colD, x.colB)]
> output
   colA colB timeA
1:    1    A     2
2:    1 <NA>     4
3:    2   AA     3
4:    2    B     4
5:    2 <NA>     6
6:    3    A     1
7:    3    C     4

Nothing changes in the output. This is my desired output:

> desired_output
   colA colB timeA
1:    1    A     2
2:    1   YY     4   --> should find a match
3:    2   AA     3
4:    2    B     4
5:    2 <NA>     6   --> shouldn't find a match
6:    3    A     1
7:    3    C     4

Why doesn't this work? I'd like to use data.table operations only, without additional packages.

Recommended answer

An in-place update of colB in DT1 works as follows:

DT1[is.na(colB), colB := DT2[DT1[is.na(colB)], 
                    on = .(colC = colA, timeB1 <= timeA, timeB2 >= timeA), colD]]
print(DT1)
   colA colB timeA
1:    1    A     2
2:    1   YY     4
3:    2   AA     3
4:    2    B     4
5:    2 <NA>     6
6:    3    A     1
7:    3    C     4

This subsets the rows where colB is NA, joins them to DT2 on the conditions given in on = ... (same group, and timeA between timeB1 and timeB2), and replaces the missing values with the matching values found in colD. Rows without a match, such as row 5, stay NA.
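
To make the mechanics easier to follow, here is a minimal sketch that splits the one-liner above into three steps on the same example data. The intermediate names missing_rows and fill_values are illustrative and not part of the original answer. The sketch also assumes, as in this example, that the intervals in DT2 do not overlap within a group, so each NA row finds at most one match. With overlapping intervals the join could return more rows than there are NA rows, and the assignment would no longer line up.

```r
library(data.table)

# Example data from the question
DT1 <- data.table(colA  = c(1, 1, 2, 2, 2, 3, 3),
                  colB  = c("A", NA, "AA", "B", NA, "A", "C"),
                  timeA = c(2, 4, 3, 4, 6, 1, 4))
DT2 <- data.table(colC   = c(1, 1, 1, 2, 2, 3),
                  timeB1 = c(1, 3, 6, 2, 4, 1),
                  timeB2 = c(2, 5, 7, 3, 5, 4),
                  colD   = c("Z", "YY", "AB", "JJ", "F", "RR"))

# Step 1: the rows of DT1 that still need a value (hypothetical helper name)
missing_rows <- DT1[is.na(colB)]

# Step 2: non-equi join. For each row of missing_rows, look up the DT2 row
# with the same group and timeB1 <= timeA <= timeB2, and return its colD.
# Rows without a match come back as NA (the default nomatch behaviour).
fill_values <- DT2[missing_rows,
                   on = .(colC = colA, timeB1 <= timeA, timeB2 >= timeA),
                   colD]
print(fill_values)  # "YY" NA

# Step 3: assign by reference, only into the rows selected by is.na(colB)
DT1[is.na(colB), colB := fill_values]
print(DT1)
```

Because the update is restricted to is.na(colB) on the left-hand side, rows that already have a value are never touched, which is why no ifelse() is needed.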