左外连接具有不同名称的关键变量的data.table [英] left outer join with data.table with different names for key variables

查看:101
本文介绍了左外连接具有不同名称的关键变量的data.table的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

让我先说说这是我在堆栈溢出发布的第一个问题。请让我知道如果我需要在我的问题中改变样式,格式等。

Let me first start by saying that this is the first question that I am posting on stack overflow. Please let me know if I need to change the style,format etc in my questions.

我想对两个数据表执行一个左外连接操作条件,我允许在两个数据表中的关键变量有不同的名称。示例:

I would like to do a left outer join operation on two data tables with an exra condition that I be allowed to have different names for the key variables in the two data tables. Example:

DT1 = data.table(x1=c("b","c", "a", "b", "a", "b"),   x2a=1:6,m1=seq(10,60,by=10))
setkey(DT1,x1,x2a)
> DT1
   x1 x2a m1
1:  a   3 30
2:  a   5 50
3:  b   1 10
4:  b   4 40
5:  b   6 60
6:  c   2 20
DT2 = data.table(x1=c("b","d", "c", "b","a","a"),x2b=c(1,4,7,6," "," "),m2=5:10)
setkey(DT2,x1,x2b)
> DT2
   x1 x2b m2
1:  a      9
2:  a     10
3:  b   1  5
4:  b   6  8
5:  c   7  7
6:  d   4  6
############# first, I use the merge operation on the data frames to do a left outer join
dfL<-merge.data.frame(DT1,DT2,by.x=c('x1','x2a'),by.y=c('x1','x2b'),all.x=TRUE)
> dfL
  x1 x2a m1 m2
1  a   3 30 NA
2  a   5 50 NA
3  b   1 10  5
4  b   4 40 NA
5  b   6 60  8
6  c   2 20 NA
################# attempt with data table left outer join 
> dtL<-DT2[DT1,on=c("x1","x2a")]
Error in forderv(x, by = rightcols) : 
  'by' value -2147483648 out of range [1,3]

#################### code that works with data table
DT1 = data.table(x1=c("b","c", "a", "b", "a", "b"), x2=as.character(1:6),m1=seq(10,60,by=10))
setkey(DT1,x1,x2)
DT1
DT2 = data.table(x1=c("b","d", "c", "b","a","a"),x2=c(1,4,7,6," "," ") ,m2=5:10)
setkey(DT2,x1,x2)
DT2
dtL<-DT2[DT1]
######################## this required identical naming of the key variables in the two data tables
################### Also does not allow a ad-hoc selection of the key variables with the "on" argument

我想知道是否可以保留merge命令与数据帧的灵活性。使用data.table。

I would like to know if it is possible to retain the flexibility of the merge command with data frames. With data.table.

推荐答案

?data.table :: merge


data.table的合并方法与data.frames的操作非常相似,但有一个主要例外:默认情况下,合并data.tables是共享键列,而不是具有相同名称的共享列。

This merge method for data.table behaves very similarly to that of data.frames with one major exception: By default, the columns used to merge the data.tables are the shared key columns rather than the shared columns with the same names. Set the by, or by.x, by.y arguments explicitly to override this default.

因此,我们可以使用通过参数覆盖键。

So we can use the by arguments to override the keys.

library(data.table)

DT1 = data.table(x1=c("b","c", "a", "b", "a", "b"),   x2a=1:6,m1=seq(10,60,by=10))
DT2 = data.table(x1=c("b","d", "c", "b","a","a"),x2b=c(1,4,7,6," "," "),m2=5:10)

## you will get an error when joining a character to a integer:
DT2$x2b <- as.integer(DT2$x2b)
## Alternative:
## DT2 = data.table(x1=c("b","d", "c", "b","a","a"),x2b=c(1,4,7,6,NA,NA),m2=5:10)

merge(DT1, DT2, by.x=c('x1','x2a'), by.y=c('x1','x2b'), all.x=TRUE)

   x1 x2a m1 m2
1:  a   3 30 NA
2:  a   5 50 NA
3:  b   1 10  5
4:  b   4 40 NA
5:  b   6 60  8
6:  c   2 20 NA

这篇关于左外连接具有不同名称的关键变量的data.table的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆