一对多左连接/合并在Data.Table中 [英] One-to-Many Left Join / Merge in Data.Table in R

查看:204
本文介绍了一对多左连接/合并在Data.Table中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果你知道在别的地方回答这个问题,请让我知道,但我搜索并阅读了大量的data.table相关的连接主题以及包的介绍和常见问题,仍然没有找到回答。基本上我正在寻找一个像一个灵活的左连接,我会设想直接的键,A [B]会做的伎俩,但它不是。

If you know of an answer to this question elsewhere, please let me know, but I searched and read a large number of data.table-related join topics as well as the package's intro and FAQ and still haven't found an answer. Essentially I am looking for something like a flexible left join and I would have thought setting the keys directly that A[B] would do the trick, but it does not.

基本上我有两个数据集,A和B(下面的例子)。我想添加数据在A到B基于匹配的ID和年份,导致类似C.有一些东西,区别于我看到的例子。首先,它的一对多基于多个键(这里id和year)。第二,我不想要所有可能的组合,意味着如果A中的信息在B中不需要,那么不添加。第三,我想保存B中A中的数据不可用(因此不仅仅是交集)。

Essentially I have two data sets, A and B (example below). I would like to add the data in A to B based on matching their ids and years, resulting in something like C. There are a few things that distinguish this from the examples I've seen. First, its one to many based on multiple keys (here id and year). Second, I don't want all of the possible combinations, meaning if information in A is not needed in B then that is not added. Third, I'd like to preserve B where data in A is not available (so not just the intersection).

再次,我想到了类似在A中设置键和B是id和年,然后C = A [B]会给我答案,但我得到一个错误,说在结果中有许多(许多,更多)行,触发笛卡尔警告。但是,结果应该是B的确切行数,所以它不清楚我做错了什么。

Again, I thought something like setting the keys in A and B to be id and year, then C = A[B] would give me the answer, but I get an error saying that many (many, many) more rows are in the result, triggering the Cartesian warning. However, the result should have the exact number of rows as B, so its unclear to me what I am doing wrong.

任何建议或见解都非常感谢! p>

Any advice or insights are much appreciated!

A
#    id   year   var1    var2
#    1    2010    10     20
#    1    2011    70     20
#    1    2012    90     30
#    1    2013    40     50 
#    2    2010    30     30
#    2    2011    50     60
#    2    2012    80     10
#    2    2013    90     80 
#    3    2010    50     50
#    3    2011    20     40
#    3    2012    90     30
#    3    2013    60     70 


B
#    id2  year    id    var3
#    1    2010    1     20
#    1    2011    1     20
#    1    2012    3     30
#    1    2013    4     50 
#    2    2010    1     30
#    2    2011    1     60
#    2    2012    1     10
#    2    2013    2     80 
#    3    2010    3     50
#    3    2011    2     40
#    3    2012    3     30
#    3    2013    2     70 

C
#    id2  year    id    var3   var1   var2
#    1    2010    1     20      10     20
#    1    2011    1     20      70     20
#    1    2012    3     30      90     30
#    1    2013    4     50      NA     NA 
#    2    2010    1     30      10     20
#    2    2011    1     60      70     20
#    2    2012    1     10      90     30
#    2    2013    2     80      90     80 
#    3    2010    3     50      50     50
#    3    2011    2     40      50     60
#    3    2012    3     30      90     30
#    3    2013    2     70      90     80  


推荐答案

您没有指出您希望看到的最终结果。
您有三个主要选项:

You did not indicate what you would like to see as the final result. You have three main options:

 A[B]
 B[A]
 merge(A, B, all=TRUE)

注意 cartesian 错误是你会看到,如果在加入表中有任何重复的键。在这种情况下,您可以简单地将标记设置为 TRUE 或改为删除重复项

Note that the allow.cartesian error is something you will see if there is any duplicated keys in the joining table. In which case you can simply set the flag to TRUE or instead remove the duplicates

 B[unique(A)]
 # or
 B[A, allow=TRUE]







setkey(A, id, year)
setkey(B, id, year)


A[B]
#     id year var1 var2 id2 var3
#  1:  1 2010   10   20   1   20
#  2:  1 2010   10   20   2   30
#  3:  1 2011   70   20   1   20
#  4:  1 2011   70   20   2   60
#  5:  1 2012   90   30   2   10
#  6:  2 2011   50   60   3   40
#  7:  2 2013   90   80   2   80
#  8:  2 2013   90   80   3   70
#  9:  3 2010   50   50   3   50
# 10:  3 2012   90   30   1   30
# 11:  3 2012   90   30   3   30
# 12:  4 2013   NA   NA   1   50

B[A, allow=TRUE]
#     id year id2 var3 var1 var2
#  1:  1 2010   1   20   10   20
#  2:  1 2010   2   30   10   20
#  3:  1 2011   1   20   70   20
#  4:  1 2011   2   60   70   20
#  5:  1 2012   2   10   90   30
#  6:  1 2013  NA   NA   40   50
#  7:  2 2010  NA   NA   30   30
#  8:  2 2011   3   40   50   60
#  9:  2 2012  NA   NA   80   10
# 10:  2 2013   2   80   90   80
# 11:  2 2013   3   70   90   80
# 12:  3 2010   3   50   50   50
# 13:  3 2011  NA   NA   20   40
# 14:  3 2012   1   30   90   30
# 15:  3 2012   3   30   90   30
# 16:  3 2013  NA   NA   60   70

merge(A, B, all=TRUE, allow=TRUE)
#     id year var1 var2 id2 var3
#  1:  1 2010   10   20   1   20
#  2:  1 2010   10   20   2   30
#  3:  1 2011   70   20   1   20
#  4:  1 2011   70   20   2   60
#  5:  1 2012   90   30   2   10
#  6:  1 2013   40   50  NA   NA
#  7:  2 2010   30   30  NA   NA
#  8:  2 2011   50   60   3   40
#  9:  2 2012   80   10  NA   NA
# 10:  2 2013   90   80   2   80
# 11:  2 2013   90   80   3   70
# 12:  3 2010   50   50   3   50
# 13:  3 2011   20   40  NA   NA
# 14:  3 2012   90   30   1   30
# 15:  3 2012   90   30   3   30
# 16:  3 2013   60   70  NA   NA
# 17:  4 2013   NA   NA   1   50

这篇关于一对多左连接/合并在Data.Table中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆