使用存储在变量中的列名联接数据表 [英] Join datatables using column names stored in variables
问题描述
我有2个data.tables:
I have 2 data.tables:
library(data.table)
dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c("value1", "value1", "value1", "value1",
"value2", "value2", "value2", "value3", "value3"),
valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
label = c(101:104, 201:203, 301:302))
> dt1
id value1 value2 value3
1: 1 11 21 36
2: 2 12 22 37
3: 3 13 23 38
4: 4 14 24 39
5: 5 15 25 40
> dt2
name valueMin valueMax label
1: value1 10 13 101
2: value1 13 14 102
3: value1 14 18 103
4: value1 18 20 104
5: value2 21 24 201
6: value2 24 25 202
7: value2 25 27 203
8: value3 36 38 301
9: value3 38 42 302
我期望的结果如下:通过dt1
中的value1
在dt2
和dt2$name
中的valueMin和valueMax之间的事实将标签从dt2
耦合到dt1
).
这是我有的解决方案(给出正确的结果):
The result I expect is the following: joining label from dt2
to dt1
by the fact that value1
in dt1
is between valueMin and valueMax in dt2
and dt2$name
matches to value1
).
Here is a solution I have (gives correct result):
varName <- "value1"
dt2_temp <- dt2[name == varName,]
dt1[dt2_temp, on = .(value1 > valueMin, value1 <= valueMax), nomatch = 0] %>%
select(id, label)
id label
1: 1 101
2: 2 101
3: 3 101
4: 4 102
5: 5 103
我想对dt1
中的所有其余列(value2
,value3
)执行相同的操作(获取label
列)(使用循环),因此需要替换对列名varName
中的名称,例如:
I would like to do the same (get label
columns) for all the rest columns (value2
, value3
) in dt1
(using loop), therefore need to replace reference to column name value1
in join to it's name stored in varName
, something like:
dt1[dt2_temp, on = .(varName > valueMin, varName <= valueMax), nomatch = 0]
不幸的是,我没有成功使用:简单地varName
,eval(varName)
,as.name(varName)
.您有解决方法的想法吗?
Unfortunately, I did not succeed using: simply varName
, eval(varName)
, as.name(varName)
. Do you have an idea how to solve this?
错误消息类似于:
Error in `[.data.table`(dt1, dt2_temp, on = .(varName > valueMin, varName <= valueMax), :
Column(s) [varName,varName] not found in x
推荐答案
为什么不一without而就呢?
Why not do it all in one go without a loop?
可能的解决方案:
melt(dt1, id = 1)[dt2, on = .(variable = name, value > valueMin, value <= valueMax), lbl := i.label
][, dcast(.SD, id ~ variable, value.var = c("value","lbl"))]
给出:
id value_value1 value_value2 value_value3 lbl_value1 lbl_value2 lbl_value3
1: 1 11 21 36 101 NA NA
2: 2 12 22 37 101 201 301
3: 3 13 23 38 101 201 301
4: 4 14 24 39 102 201 302
5: 5 15 25 40 103 202 302
这篇关于使用存储在变量中的列名联接数据表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!