Data.table - 多个表的左外连接 [英] Data.table - left outer join on multiple tables
问题描述
假设你有类似的数据
fruits <- data.table(FruitID=c(1,2,3), Fruit=c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID=c(1,2,3,4,5), FruitID=c(1,1,1,2,3), Color=c("Red","Yellow","Green","Yellow","Red"))
tastes <- data.table(TasteID=c(1,2,3), FruitID=c(1,1,3), Taste=c("Sweeet", "Sour", "Sweet"))
setkey(fruits, "FruitID")
setkey(colors, "ColorID")
setkey(tastes, "TasteID")
fruits
FruitID Fruit
1: 1 Apple
2: 2 Banana
3: 3 Strawberry
colors
ColorID FruitID Color
1: 1 1 Red
2: 2 1 Yellow
3: 3 1 Green
4: 4 2 Yellow
5: 5 3 Red
tastes
TasteID FruitID Taste
1: 1 1 Sweeet
2: 2 1 Sour
3: 3 3 Sweet
我通常需要对这样的数据执行左外连接.例如,给我所有的水果和它们的颜色"需要我写(也许有更好的方法?)
I typically need to perform left-outer joins on data like this. For instance, "give me all fruits and their colors" requires me to write (and maybe there's a better way?)
setkey(colors, "FruitID")
result <- colors[fruits, allow.cartesian=TRUE]
setkey(colors, "ColorID")
这样一个简单而频繁的任务,三行代码似乎过多,所以我写了一个方法myLeftJoin
Three lines of code for such a simple and frequent task seemed excessive, so I wrote a method myLeftJoin
myLeftJoin <- function(tbl1, tbl2){
# Performs a left join using the key in tbl1 (i.e. keeps all rows from tbl1 and only matching rows from tbl2)
oldkey <- key(tbl2)
setkeyv(tbl2, key(tbl1))
result <- tbl2[tbl1, allow.cartesian=TRUE]
setkeyv(tbl2, oldkey)
return(result)
}
我可以像这样使用
myLeftJoin(fruits, colors)
ColorID FruitID Color Fruit
1: 1 1 Red Apple
2: 2 1 Yellow Apple
3: 3 1 Green Apple
4: 4 2 Yellow Banana
5: 5 3 Red Strawberry
如何扩展此方法,以便将任意数量的表传递给它并获取所有表的链式左外连接?类似于 myLeftJoin(tbl1, ...)
How can I extend this method so that I can pass any number of tables to it and get the chained left outer join of all of them? Something like myLeftJoin(tbl1, ...)
例如,我希望 myleftJoin(fruits, colors, tastes)
的结果等同于
For instance, I'd like the result of myleftJoin(fruits, colors, tastes)
to be equivalent to
setkey(colors, "FruitID")
setkey(tastes, "FruitID")
result <- tastes[colors[fruits, allow.cartesian=TRUE], allow.cartesian=TRUE]
setkey(tastes, "TasteID")
setkey(colors, "ColorID")
result
TasteID FruitID Taste ColorID Color Fruit
1: 1 1 Sweeet 1 Red Apple
2: 2 1 Sour 1 Red Apple
3: 1 1 Sweeet 2 Yellow Apple
4: 2 1 Sour 2 Yellow Apple
5: 1 1 Sweeet 3 Green Apple
6: 2 1 Sour 3 Green Apple
7: NA 2 NA 4 Yellow Banana
8: 3 3 Sweet 5 Red Strawberry
也许我错过了使用 data.table 包中的方法的优雅解决方案?谢谢
Perhaps there's an elegant solution using methods in the data.table package that I missed? Thanks
(修正了我的数据中的一个错误)
( Fixed a mistake in my data)
推荐答案
我刚刚在 data.table, v1.9.5
,我们可以通过它加入而不需要设置键(即直接指定要加入的列,而不必使用setkey()
先):
有了它,这很简单:
require(data.table) # v1.9.5+
fruits[tastes, on="FruitID"][colors, on="FruitID"] # no setkey required
# FruitID Fruit TasteID Taste ColorID Color
# 1: 1 Apple 1 Sweeet 1 Red
# 2: 1 Apple 2 Sour 1 Red
# 3: 1 Apple 1 Sweeet 2 Yellow
# 4: 1 Apple 2 Sour 2 Yellow
# 5: 1 Apple 1 Sweeet 3 Green
# 6: 1 Apple 2 Sour 3 Green
# 7: 2 NA NA NA 4 Yellow
# 8: 3 Strawberry 3 Sweet 5 Red
这篇关于Data.table - 多个表的左外连接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!