Join data.tables using column names stored in variables

Problem description

I have 2 data.tables:

library(data.table)
dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c("value1", "value1", "value1", "value1",
                           "value2", "value2", "value2", "value3", "value3"),
                  valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
                  valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
                  label = c(101:104, 201:203, 301:302))
> dt1
   id value1 value2 value3
1:  1     11     21     36
2:  2     12     22     37
3:  3     13     23     38
4:  4     14     24     39
5:  5     15     25     40
> dt2
     name valueMin valueMax label
1: value1       10       13   101
2: value1       13       14   102
3: value1       14       18   103
4: value1       18       20   104
5: value2       21       24   201
6: value2       24       25   202
7: value2       25       27   203
8: value3       36       38   301
9: value3       38       42   302

The result I expect is the following: join label from dt2 onto dt1 wherever value1 in dt1 lies between valueMin and valueMax in dt2 (valueMin < value1 <= valueMax) and dt2$name equals "value1". Here is a solution I have (it gives the correct result):

library(dplyr)  # provides %>% and select()
varName <- "value1"
dt2_temp <- dt2[name == varName, ]
dt1[dt2_temp, on = .(value1 > valueMin, value1 <= valueMax), nomatch = 0] %>%
  select(id, label)
   id label
1:  1   101
2:  2   101
3:  3   101
4:  4   102
5:  5   103

I would like to do the same (get the label column) for the remaining columns of dt1 (value2, value3) using a loop, so I need to replace the reference to the column name value1 in the join with the name stored in varName, something like:

dt1[dt2_temp, on = .(varName > valueMin, varName <= valueMax), nomatch = 0]

Unfortunately, none of the following worked: plain varName, eval(varName) or as.name(varName). Do you have an idea how to solve this?

The error message is similar to:

Error in `[.data.table`(dt1, dt2_temp, on = .(varName > valueMin, varName <= valueMax),  : 
  Column(s) [varName,varName] not found in x
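The error shows that names inside .() in on are taken literally as column names, so data.table looks for a column called varName in dt1. A minimal sketch of the loop described above, assuming a data.table version with non-equi join support that also accepts the join conditions in on as a character vector such as "value1>valueMin" (see the on argument in ?data.table). The conditions are built with paste0() and each label is written into dt1 by reference; the lbl_ prefix for the new columns is a hypothetical naming choice.

```r
library(data.table)

dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c("value1", "value1", "value1", "value1",
                           "value2", "value2", "value2", "value3", "value3"),
                  valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
                  valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
                  label = c(101:104, 201:203, 301:302))

for (varName in c("value1", "value2", "value3")) {
  # Assumption: on accepts non-equi conditions as strings, e.g. "value1>valueMin"
  cond <- c(paste0(varName, ">valueMin"), paste0(varName, "<=valueMax"))
  # Update join: add the hypothetical column lbl_<varName> to dt1 by reference;
  # rows of dt1 with no matching interval get NA
  dt1[dt2[name == varName], on = cond, (paste0("lbl_", varName)) := i.label]
}
dt1
```

Because each condition is an ordinary string, the same code works for whatever column name is held in varName.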

Recommended answer

Why not do it all in one go without a loop?

A possible solution:

melt(dt1, id = 1)[dt2, on = .(variable = name, value > valueMin, value <= valueMax), lbl := i.label
                  ][, dcast(.SD, id ~ variable, value.var = c("value","lbl"))]

Which gives:

   id value_value1 value_value2 value_value3 lbl_value1 lbl_value2 lbl_value3
1:  1           11           21           36        101         NA         NA
2:  2           12           22           37        101        201        301
3:  3           13           23           38        101        201        301
4:  4           14           24           39        102        201        302
5:  5           15           25           40        103        202        302
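To make the one-step answer easier to follow, here is a minimal sketch that runs the same three steps separately and keeps the intermediate long table. It assumes variable.factor = FALSE in melt(), so that the variable column is character like dt2$name; the names long and wide are hypothetical.

```r
library(data.table)

dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c("value1", "value1", "value1", "value1",
                           "value2", "value2", "value2", "value3", "value3"),
                  valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
                  valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
                  label = c(101:104, 201:203, 301:302))

# Step 1: long format, one row per (id, original column name, value)
long <- melt(dt1, id.vars = "id", variable.factor = FALSE)

# Step 2: non-equi update join; the column name must equal dt2$name and the
# value must lie in (valueMin, valueMax]; unmatched rows keep lbl = NA
long[dt2, on = .(variable = name, value > valueMin, value <= valueMax),
     lbl := i.label]
long

# Step 3: back to wide, one value_* and one lbl_* column per original column
wide <- dcast(long, id ~ variable, value.var = c("value", "lbl"))
wide
```

The loop-free version does a single join for all columns, at the cost of reshaping dt1 to long format and back.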
