Join two data tables and use only one column from second dt


Problem description


Let's say I have two data tables (dt1 and dt2), and I want to get dt3 using data.table. A, B, C, E, F, G, H are column names. The key of dt1 is column A, and the key of dt2 is column E. The data tables have different numbers of rows. I want to keep all the columns from dt1 and add only one column (H) from dt2 to the joined data table. Eventually, I will store the result as dt1 (though I show it as dt3 below).

How can I achieve this with data.table? I have an ugly solution using merge and data frames.

dt1 
A   B   C   
1   4   7   
2   5   8   
3   6   9   
2   20  21

dt2
E   F   G   H
1   10  13  16
3   12  15  18    
2   11  14  17


dt3
A   B   C   H
1   4   7   16
2   5   8   17
3   6   9   18
2   20  21  17          

Solution

To perform a left join to dt1 and add the H column from dt2, you can combine a binary join with the update-by-reference operator (:=):

setkey(setDT(dt1), A) 
dt1[dt2, H := i.H]
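
As a minimal sketch of the keyed approach, the snippet below rebuilds dt1 and dt2 with the values from the question (the construction code itself is an assumption, since the question only shows the printed tables) and then runs the join. One thing it illustrates is that setkey sorts dt1 by A, so the result holds the same rows as dt3 but in key order.

```r
library(data.table)

# Sample data copied from the question; how the tables were built is assumed.
dt1 <- data.table(A = c(1, 2, 3, 2), B = c(4, 5, 6, 20), C = c(7, 8, 9, 21))
dt2 <- data.table(E = c(1, 3, 2), F = c(10, 12, 11),
                  G = c(13, 15, 14), H = c(16, 18, 17))

setkey(dt1, A)        # sorts dt1 by A and marks A as its key

# For each row of dt2, find the matching rows of dt1 (E is dt2's first column,
# matched against dt1's key A) and write dt2's H into them by reference.
# The i. prefix refers to columns of the table passed as i (here dt2).
dt1[dt2, H := i.H]

print(dt1)
#    A  B  C  H
# 1: 1  4  7 16
# 2: 2  5  8 17
# 3: 2 20 21 17
# 4: 3  6  9 18
```

Because := updates dt1 in place, there is no need to assign the result back to dt1.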

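See the data.table reference semantics vignette (https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-reference-semantics.html) for a detailed explanation of how it works.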


With the devel version (v >= 1.9.5), we can make it even shorter by specifying the key within setDT (as pointed out by @Arun):

setDT(dt1, key = "A")[dt2, H := i.H]
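
Since the question mentions working with data frames, here is a small sketch of the same one-liner starting from a plain data.frame. The name df1 is hypothetical and only used for illustration; the values are those of dt1 in the question. It assumes a data.table version in which setDT accepts the key argument.

```r
library(data.table)

# Hypothetical starting point: dt1's values held in a plain data.frame.
df1 <- data.frame(A = c(1, 2, 3, 2), B = c(4, 5, 6, 20), C = c(7, 8, 9, 21))
dt2 <- data.table(E = c(1, 3, 2), F = c(10, 12, 11),
                  G = c(13, 15, 14), H = c(16, 18, 17))

# setDT converts df1 to a data.table by reference and sets its key in the
# same call; the chained join then adds H to the converted object.
setDT(df1, key = "A")[dt2, H := i.H]

is.data.table(df1)   # TRUE: df1 itself was converted, no copy was made
key(df1)             # "A"
```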


Edit 24/7/2015

You can now run a binary join using the new on parameter, without setting keys first:

setDT(dt1)[dt2, H := i.H, on = c(A = "E")]
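
The following sketch shows two properties of the on-based form that may matter in practice: dt1 keeps its original row order (nothing is sorted because no key is set), and rows of dt1 without a match in dt2 get NA in H. The last row of dt1 (A = 4) is a hypothetical addition, not part of the question's data, included only to show the unmatched case.

```r
library(data.table)

# Question's data plus one hypothetical unmatched row (A = 4).
dt1 <- data.table(A = c(1, 2, 3, 2, 4), B = c(4, 5, 6, 20, 0),
                  C = c(7, 8, 9, 21, 0))
dt2 <- data.table(E = c(1, 3, 2), F = c(10, 12, 11),
                  G = c(13, 15, 14), H = c(16, 18, 17))

# In on = c(A = "E"), the name (A) is the column of dt1 and the value ("E")
# is the column of dt2 it is matched against.
dt1[dt2, H := i.H, on = c(A = "E")]

print(dt1)
#    A  B  C  H
# 1: 1  4  7 16
# 2: 2  5  8 17
# 3: 3  6  9 18
# 4: 2 20 21 17
# 5: 4  0  0 NA
```

Apart from the extra row, the first four rows match dt3 from the question exactly, including their order.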
