Merging data.tables based on column names


Problem description


I am trying to do some left-join merges with data.tables. The package documentation states:

In all joins the names of the columns are irrelevant; the columns of x's key are joined to in order

I understand that I can use [.data.table and data.table:::merge.data.table.

What I would like is to merge X and Y while specifying the key columns, as by.x and by.y do in base merge. Why was this taken away?

Let's suppose I have

library(data.table)
DT  = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9, key="x,y,v")
DT1 = data.table(x1=c("aa","bb","cc"), y1=c(1,3,6), v1=1:3, key="x1,y1,v1")

and I would like this output:

# data.table's merge method masks the base one, and I no longer know how to call the base version
R) {base::merge}(DT,DT1,by.x="y",by.y="y1")
  y x v x1 v1
1 1 a 1 aa  1
2 1 c 7 aa  1
3 1 b 4 aa  1
4 3 a 2 bb  2
5 3 b 5 bb  2
6 3 c 8 bb  2
7 6 b 6 cc  3
8 6 a 3 cc  3
9 6 c 9 cc  3

I am very happy to use [ or data.table:::merge, but I would like an option that does not modify DT or DT1 (that is, without renaming the columns, calling merge, and then changing the names back).

Solution

Update: Since data.table v1.9.6 (released September 19, 2015), merge.data.table() accepts and handles the arguments by.x= and by.y=. The feature request referenced below is now closed.
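
As an illustration of the update, here is a minimal sketch of the merge the question asks for. It assumes data.table 1.9.6 or later is installed and that both inputs are data.tables; the data is taken from the question (keys omitted, and the recycling of y written out with rep), and all.x = TRUE is my reading of the "left join" the asker wants.

```r
library(data.table)  # assumption: data.table >= 1.9.6

# Data from the question; the recycling of y is made explicit.
DT  <- data.table(x = rep(c("a", "b", "c"), each = 3),
                  y = rep(c(1, 3, 6), 3),
                  v = 1:9)
DT1 <- data.table(x1 = c("aa", "bb", "cc"),
                  y1 = c(1, 3, 6),
                  v1 = 1:3)

# Left join: keep every row of DT and match DT$y against DT1$y1.
# The join column keeps the name given in by.x ("y").
res <- merge(DT, DT1, by.x = "y", by.y = "y1", all.x = TRUE)
print(res)

# Neither input has been renamed or otherwise changed.
print(names(DT))
print(names(DT1))
```

Because merge() returns a new table, DT and DT1 keep their original column names, which is the property the question asks for.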


Yes, this is a feature request that is not yet implemented:

FR#2033 Add by.x and by.y to merge.data.table (https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2033&group_id=240&atid=978)

There isn't anything preventing it; it just hasn't been done. I very rarely need merge and was slow to realise its usefulness more generally. We've made good progress in bringing merge performance up to that of X[Y], and this feature request is at the highest priority. If you'd like it sooner, you are more than welcome to add those arguments to merge.data.table and commit the change yourself. We try to keep source code short and together in one function/file, so by looking at the merge.data.table source you can hopefully follow it and see what needs to be done.
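
For readers who prefer the X[Y] form mentioned above, the following is a hedged sketch of the same lookup without merge. It relies on the on= argument of [.data.table, which is not part of the original answer and is assumed here to be available (data.table 1.9.6 or later). The data is taken from the question.

```r
library(data.table)  # assumption: data.table >= 1.9.6 for on=

DT  <- data.table(x = rep(c("a", "b", "c"), each = 3),
                  y = rep(c(1, 3, 6), 3),
                  v = 1:9)
DT1 <- data.table(x1 = c("aa", "bb", "cc"),
                  y1 = c(1, 3, 6),
                  v1 = 1:3)

# X[Y] join: for each row of DT (the i argument), look up the row of DT1
# whose y1 equals DT's y. In on=, the name is X's column and the value is
# i's column. The join column carries DT1's name (y1) in the result.
res <- DT1[DT, on = c(y1 = "y")]
print(res)
```

The result has one row per row of DT, in DT's order, and neither table is keyed, renamed, or modified by the join.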
