Data.table - 多个表的左外连接 [英] Data.table - left outer join on multiple tables

查看:27
本文介绍了Data.table - 多个表的左外连接的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设你有类似的数据

fruits <- data.table(FruitID=c(1,2,3), Fruit=c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID=c(1,2,3,4,5), FruitID=c(1,1,1,2,3), Color=c("Red","Yellow","Green","Yellow","Red"))
tastes <- data.table(TasteID=c(1,2,3), FruitID=c(1,1,3), Taste=c("Sweeet", "Sour", "Sweet"))

setkey(fruits, "FruitID")
setkey(colors, "ColorID")
setkey(tastes, "TasteID")

fruits
   FruitID      Fruit
1:       1      Apple
2:       2     Banana
3:       3 Strawberry

colors
   ColorID FruitID  Color
1:       1       1    Red
2:       2       1 Yellow
3:       3       1  Green
4:       4       2 Yellow
5:       5       3    Red

tastes
   TasteID FruitID  Taste
1:       1       1 Sweeet
2:       2       1   Sour
3:       3       3  Sweet

我通常需要对这样的数据执行左外连接.例如,给我所有的水果和它们的颜色"需要我写(也许有更好的方法?)

I typically need to perform left-outer joins on data like this. For instance, "give me all fruits and their colors" requires me to write (and maybe there's a better way?)

setkey(colors, "FruitID")
result <- colors[fruits, allow.cartesian=TRUE]
setkey(colors, "ColorID")

这样一个简单而频繁的任务,三行代码似乎过多,所以我写了一个方法myLeftJoin

Three lines of code for such a simple and frequent task seemed excessive, so I wrote a method myLeftJoin

myLeftJoin <- function(tbl1, tbl2){
  # Performs a left join using the key in tbl1 (i.e. keeps all rows from tbl1 and only matching rows from tbl2)

  oldkey <- key(tbl2)
  setkeyv(tbl2, key(tbl1))
  result <- tbl2[tbl1, allow.cartesian=TRUE]
  setkeyv(tbl2, oldkey)
  return(result)
}

我可以像这样使用

myLeftJoin(fruits, colors)
   ColorID FruitID  Color      Fruit
1:       1       1    Red      Apple
2:       2       1 Yellow      Apple
3:       3       1  Green      Apple
4:       4       2 Yellow     Banana
5:       5       3    Red Strawberry

如何扩展此方法,以便将任意数量的表传递给它并获取所有表的链式左外连接?类似于 myLeftJoin(tbl1, ...)

How can I extend this method so that I can pass any number of tables to it and get the chained left outer join of all of them? Something like myLeftJoin(tbl1, ...)

例如,我希望 myleftJoin(fruits, colors, tastes) 的结果等同于

For instance, I'd like the result of myleftJoin(fruits, colors, tastes) to be equivalent to

setkey(colors, "FruitID")
setkey(tastes, "FruitID")
result <- tastes[colors[fruits, allow.cartesian=TRUE], allow.cartesian=TRUE]
setkey(tastes, "TasteID")
setkey(colors, "ColorID")

result
   TasteID FruitID  Taste ColorID  Color      Fruit
1:       1       1 Sweeet       1    Red      Apple
2:       2       1   Sour       1    Red      Apple
3:       1       1 Sweeet       2 Yellow      Apple
4:       2       1   Sour       2 Yellow      Apple
5:       1       1 Sweeet       3  Green      Apple
6:       2       1   Sour       3  Green      Apple
7:      NA       2     NA       4 Yellow     Banana
8:       3       3  Sweet       5    Red Strawberry

也许我错过了使用 data.table 包中的方法的优雅解决方案?谢谢

Perhaps there's an elegant solution using methods in the data.table package that I missed? Thanks

(修正了我的数据中的一个错误)

( Fixed a mistake in my data)

推荐答案

我刚刚在 data.table, v1.9.5,我们可以通过它加入而不需要设置键(即直接指定要加入的列,而不必使用setkey() 先):

有了它,这很简单:

require(data.table) # v1.9.5+
fruits[tastes, on="FruitID"][colors, on="FruitID"] # no setkey required
#    FruitID      Fruit TasteID  Taste ColorID  Color
# 1:       1      Apple       1 Sweeet       1    Red
# 2:       1      Apple       2   Sour       1    Red
# 3:       1      Apple       1 Sweeet       2 Yellow
# 4:       1      Apple       2   Sour       2 Yellow
# 5:       1      Apple       1 Sweeet       3  Green
# 6:       1      Apple       2   Sour       3  Green
# 7:       2         NA      NA     NA       4 Yellow
# 8:       3 Strawberry       3  Sweet       5    Red

这篇关于Data.table - 多个表的左外连接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆