一对多左连接/合并在Data.Table中 [英] One-to-Many Left Join / Merge in Data.Table in R
问题描述
如果你知道在别的地方回答这个问题,请让我知道,但我搜索并阅读了大量的data.table相关的连接主题以及包的介绍和常见问题,仍然没有找到回答。基本上我正在寻找一个像一个灵活的左连接,我会设想直接的键,A [B]会做的伎俩,但它不是。
If you know of an answer to this question elsewhere, please let me know, but I searched and read a large number of data.table-related join topics as well as the package's intro and FAQ and still haven't found an answer. Essentially I am looking for something like a flexible left join and I would have thought setting the keys directly that A[B] would do the trick, but it does not.
基本上我有两个数据集,A和B(下面的例子)。我想添加数据在A到B基于匹配的ID和年份,导致类似C.有一些东西,区别于我看到的例子。首先,它的一对多基于多个键(这里id和year)。第二,我不想要所有可能的组合,意味着如果A中的信息在B中不需要,那么不添加。第三,我想保存B中A中的数据不可用(因此不仅仅是交集)。
Essentially I have two data sets, A and B (example below). I would like to add the data in A to B based on matching their ids and years, resulting in something like C. There are a few things that distinguish this from the examples I've seen. First, its one to many based on multiple keys (here id and year). Second, I don't want all of the possible combinations, meaning if information in A is not needed in B then that is not added. Third, I'd like to preserve B where data in A is not available (so not just the intersection).
再次,我想到了类似在A中设置键和B是id和年,然后C = A [B]会给我答案,但我得到一个错误,说在结果中有许多(许多,更多)行,触发笛卡尔警告。但是,结果应该是B的确切行数,所以它不清楚我做错了什么。
Again, I thought something like setting the keys in A and B to be id and year, then C = A[B] would give me the answer, but I get an error saying that many (many, many) more rows are in the result, triggering the Cartesian warning. However, the result should have the exact number of rows as B, so its unclear to me what I am doing wrong.
任何建议或见解都非常感谢! p>
Any advice or insights are much appreciated!
A
# id year var1 var2
# 1 2010 10 20
# 1 2011 70 20
# 1 2012 90 30
# 1 2013 40 50
# 2 2010 30 30
# 2 2011 50 60
# 2 2012 80 10
# 2 2013 90 80
# 3 2010 50 50
# 3 2011 20 40
# 3 2012 90 30
# 3 2013 60 70
B
# id2 year id var3
# 1 2010 1 20
# 1 2011 1 20
# 1 2012 3 30
# 1 2013 4 50
# 2 2010 1 30
# 2 2011 1 60
# 2 2012 1 10
# 2 2013 2 80
# 3 2010 3 50
# 3 2011 2 40
# 3 2012 3 30
# 3 2013 2 70
C
# id2 year id var3 var1 var2
# 1 2010 1 20 10 20
# 1 2011 1 20 70 20
# 1 2012 3 30 90 30
# 1 2013 4 50 NA NA
# 2 2010 1 30 10 20
# 2 2011 1 60 70 20
# 2 2012 1 10 90 30
# 2 2013 2 80 90 80
# 3 2010 3 50 50 50
# 3 2011 2 40 50 60
# 3 2012 3 30 90 30
# 3 2013 2 70 90 80
推荐答案
您没有指出您希望看到的最终结果。
您有三个主要选项:
You did not indicate what you would like to see as the final result. You have three main options:
A[B]
B[A]
merge(A, B, all=TRUE)
注意 cartesian
错误是你会看到,如果在加入表中有任何重复的键。在这种情况下,您可以简单地将标记设置为 TRUE
或改为删除重复项
Note that the allow.cartesian
error is something you will see if there is any duplicated keys in the joining table. In which case you can simply set the flag to TRUE
or instead remove the duplicates
B[unique(A)]
# or
B[A, allow=TRUE]
setkey(A, id, year)
setkey(B, id, year)
A[B]
# id year var1 var2 id2 var3
# 1: 1 2010 10 20 1 20
# 2: 1 2010 10 20 2 30
# 3: 1 2011 70 20 1 20
# 4: 1 2011 70 20 2 60
# 5: 1 2012 90 30 2 10
# 6: 2 2011 50 60 3 40
# 7: 2 2013 90 80 2 80
# 8: 2 2013 90 80 3 70
# 9: 3 2010 50 50 3 50
# 10: 3 2012 90 30 1 30
# 11: 3 2012 90 30 3 30
# 12: 4 2013 NA NA 1 50
B[A, allow=TRUE]
# id year id2 var3 var1 var2
# 1: 1 2010 1 20 10 20
# 2: 1 2010 2 30 10 20
# 3: 1 2011 1 20 70 20
# 4: 1 2011 2 60 70 20
# 5: 1 2012 2 10 90 30
# 6: 1 2013 NA NA 40 50
# 7: 2 2010 NA NA 30 30
# 8: 2 2011 3 40 50 60
# 9: 2 2012 NA NA 80 10
# 10: 2 2013 2 80 90 80
# 11: 2 2013 3 70 90 80
# 12: 3 2010 3 50 50 50
# 13: 3 2011 NA NA 20 40
# 14: 3 2012 1 30 90 30
# 15: 3 2012 3 30 90 30
# 16: 3 2013 NA NA 60 70
merge(A, B, all=TRUE, allow=TRUE)
# id year var1 var2 id2 var3
# 1: 1 2010 10 20 1 20
# 2: 1 2010 10 20 2 30
# 3: 1 2011 70 20 1 20
# 4: 1 2011 70 20 2 60
# 5: 1 2012 90 30 2 10
# 6: 1 2013 40 50 NA NA
# 7: 2 2010 30 30 NA NA
# 8: 2 2011 50 60 3 40
# 9: 2 2012 80 10 NA NA
# 10: 2 2013 90 80 2 80
# 11: 2 2013 90 80 3 70
# 12: 3 2010 50 50 3 50
# 13: 3 2011 20 40 NA NA
# 14: 3 2012 90 30 1 30
# 15: 3 2012 90 30 3 30
# 16: 3 2013 60 70 NA NA
# 17: 4 2013 NA NA 1 50
这篇关于一对多左连接/合并在Data.Table中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!