Merging data.tables based on column names
I am trying to do some left-join merges with data.tables. The package description states that:
In all joins the names of the columns are irrelevant; the columns of x's key are joined to in order
I understand that I can use [.data.table and data.table:::merge.data.table.
What I would like is to merge X and Y while specifying the key columns on each side, like by.x and by.y in base merge (why was this taken away?).
Let's suppose I have:
library(data.table)
DT = data.table(x=rep(c("a","b","c"),each=3),y=c(1,3,6),v=1:9,key="x,y,v")
DT1 = data.table(x1=c("aa","bb","cc"),y1=c(1,3,6),v1=1:3,key="x1,y1,v1")
and I would like this output:
# data.table's merge method masks the base one, and I don't know how to call the base version of merge anymore
R) {base::merge}(DT,DT1,by.x="y",by.y="y1")
  y x v x1 v1
1 1 a 1 aa  1
2 1 c 7 aa  1
3 1 b 4 aa  1
4 3 a 2 bb  2
5 3 b 5 bb  2
6 3 c 8 bb  2
7 6 b 6 cc  3
8 6 a 3 cc  3
9 6 c 9 cc  3
I am very happy to use [ or data.table:::merge, but I would like an option that does not modify DT or DT1 (that is, without changing the column names, calling merge, and changing them back).
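On the side question of reaching the base method: one possible route is to call merge.data.frame directly on data.frame copies, which sidesteps S3 dispatch on the data.table class. This is only a minimal sketch, assuming the data.table package is installed; the names df, df1 and res_base are illustrative, and the row order within each y group may differ from the output shown above.

```r
library(data.table)

DT  <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9, key = "x,y,v")
DT1 <- data.table(x1 = c("aa", "bb", "cc"), y1 = c(1, 3, 6), v1 = 1:3, key = "x1,y1,v1")

# Work on data.frame copies so that DT and DT1 themselves are not modified
df  <- as.data.frame(DT)
df1 <- as.data.frame(DT1)

# Call the data.frame method explicitly instead of the generic merge()
res_base <- merge.data.frame(df, df1, by.x = "y", by.y = "y1")
print(res_base)
```

The result is a plain data.frame; wrapping it in as.data.table() would convert it back if needed.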
Update: Since data.table v1.9.6 (released September 19, 2015), merge.data.table() accepts and handles the arguments by.x= and by.y=. The feature request (FR) referenced below is now closed.
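With a version at or above v1.9.6, the merge from the question can be written roughly as follows. This is a sketch assuming the data.table package is installed and reusing the example tables from the question; the name res is illustrative.

```r
library(data.table)

DT  <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9, key = "x,y,v")
DT1 <- data.table(x1 = c("aa", "bb", "cc"), y1 = c(1, 3, 6), v1 = 1:3, key = "x1,y1,v1")

# Join DT$y against DT1$y1 without renaming any column in either table
res <- merge(DT, DT1, by.x = "y", by.y = "y1")
print(res)

# all.x = TRUE would additionally keep rows of DT that have no match in DT1
res_left <- merge(DT, DT1, by.x = "y", by.y = "y1", all.x = TRUE)
```

In this example every y value has a match, so res and res_left contain the same nine rows, and neither DT nor DT1 is changed by the call.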
Yes, this is a feature request that has not yet been implemented:
FR#2033 Add by.x and by.y to merge.data.table (https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2033&group_id=240&atid=978)
There isn't anything preventing it; it is just something that wasn't done. I very rarely need merge and was slow to realise its usefulness more generally. We've made good progress in making merge as fast as X[Y], and this feature request is at the highest priority. If you'd like it sooner, you are more than welcome to add those arguments to merge.data.table and commit the change yourself. We try to keep source code short and together in one function/file, so by looking at the merge.data.table source you can hopefully follow it and see what needs to be done.
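For the X[Y] form mentioned above, later versions of data.table (v1.9.6 onwards, to my knowledge) also accept an on= argument that names the join columns on each side, so neither table needs to be re-keyed or renamed. A minimal sketch under that assumption, reusing the tables from the question; res_join is an illustrative name.

```r
library(data.table)

DT  <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9, key = "x,y,v")
DT1 <- data.table(x1 = c("aa", "bb", "cc"), y1 = c(1, 3, 6), v1 = 1:3, key = "x1,y1,v1")

# For each row of DT1, look up the rows of DT whose y equals DT1's y1
res_join <- DT[DT1, on = c(y = "y1")]
print(res_join)
```

This form keeps every row of DT1; swapping the roles, as in DT1[DT, on = c(y1 = "y")], keeps every row of DT instead.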