Join two data tables and use only one column from second dt
Let's say I have two data tables (dt1 and dt2), and I want to get dt3 using data tables. A, B, C, E, F, G, H are column names. The key of dt1 is column A, and the key of dt2 is column E. The data tables have different numbers of rows. I want to keep all the columns from dt1 and add only one column (H) from dt2 to the joined data table. Eventually, I will store this as dt1 (though I showed it as dt3 below).
How can I achieve it with data tables? I have an ugly solution with merge + data frames.
dt1
A B C
1 4 7
2 5 8
3 6 9
2 20 21
dt2
E F G H
1 10 13 16
3 12 15 18
2 11 14 17
dt3
A B C H
1 4 7 16
2 5 8 17
3 6 9 18
2 20 21 17
To perform a left join to dt1 and add the H column from dt2, you can combine a binary join with the update by reference operator (:=):
setkey(setDT(dt1), A)
dt1[dt2, H := i.H]
See here and here for detailed explanation on how it works
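For readers who want to run the keyed join end to end, here is a minimal sketch that rebuilds the sample tables from the question. The constructor calls and the integer column types are assumptions made for illustration; the original post does not show how dt1 and dt2 were created. One thing worth noticing is that setkey sorts dt1 by A, so the rows come out in key order rather than in the order shown for dt3.

```r
library(data.table)

# Sample data copied from the question (construction is an assumption)
dt1 <- data.table(A = c(1L, 2L, 3L, 2L),
                  B = c(4L, 5L, 6L, 20L),
                  C = c(7L, 8L, 9L, 21L))
dt2 <- data.table(E = c(1L, 3L, 2L),
                  F = c(10L, 12L, 11L),
                  G = c(13L, 15L, 14L),
                  H = c(16L, 18L, 17L))

# Key dt1 on A; this also reorders dt1 by A (1, 2, 2, 3)
setkey(dt1, A)

# For each row of dt2, find the matching rows of dt1 by its key
# and write dt2's H into them by reference
dt1[dt2, H := i.H]

print(dt1)
# H is now 16, 17, 17, 18 for A = 1, 2, 2, 3
```

No copy of dt1 is made here: the new column is added in place, which is why the result does not need to be assigned back.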
With the devel version (v >= 1.9.5), we can make it even shorter by specifying the key within setDT (as pointed out by @Arun):
setDT(dt1, key = "A")[dt2, H := i.H]
Edit 24/7/2015
You can now run a binary join using the new on parameter without setting keys:
setDT(dt1)[dt2, H := i.H, on = c(A = "E")]
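The on variant can be checked against the dt3 table from the question with a short sketch like the one below. It assumes a data.table release that supports on, and the way the sample tables are built is again an assumption. Because no key is set, dt1 is not sorted, so the row order of the result should match dt3 exactly.

```r
library(data.table)

# Sample data copied from the question (construction is an assumption)
dt1 <- data.table(A = c(1L, 2L, 3L, 2L),
                  B = c(4L, 5L, 6L, 20L),
                  C = c(7L, 8L, 9L, 21L))
dt2 <- data.table(E = c(1L, 3L, 2L),
                  F = c(10L, 12L, 11L),
                  G = c(13L, 15L, 14L),
                  H = c(16L, 18L, 17L))

# Match dt1$A against dt2$E without setting a key on either table;
# in on = c(A = "E") the name is the column in dt1, the value the column in dt2
dt1[dt2, H := i.H, on = c(A = "E")]

print(dt1)
# H is now 16, 17, 18, 17, in the original row order of dt1
```

Rows of dt1 that have no matching E in dt2 would be left with NA in H, which is the left-join behaviour the question asks for.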