R: Standard evaluation for *_join (dplyr)


Problem description

How can I join two tables with dplyr's *_join() functions when the join columns have different names and those names are stored in other variables?

For example:

library(dplyr)

# tibble() replaces the deprecated data_frame()
df1 = tibble(x1 = 1:10, y1 = 21:30)
df2 = tibble(x2 = 6:15, y2 = 26:35)
df3 = tibble(x1 = 6:15, y2 = 26:35)

var1 = "x1"
var2 = "x2"

df1 %>% left_join(df3,by=c(var1)) # #1 works

But this results in an error:

df1 %>% left_join(df2,by=c(var1 = var2)) # #2 doesn't work
Error: cannot join on columns 'x2' x 'var1': index out of bounds

Surprisingly, this works:

df1 %>% left_join(df2,by=c("x1" = var2)) # #3 works


Recommended answer

The problem is that when the common column has different names in the two data frames, you have to supply a named vector to by: the name is the column in the left table and the value is the column in the right table. Here is what happens in your example.

It works when you supply the names directly:

df1 %>% left_join(df2, by = c("x1" = "x2"))
#Source: local data frame [10 x 3]
#
#   x1 y1 y2
#1   1 21 NA
#2   2 22 NA
#3   3 23 NA
#4   4 24 NA
#5   5 25 NA
#6   6 26 26
#7   7 27 27
#8   8 28 28
#9   9 29 29
#10 10 30 30

The named vector you supplied is:

c("x1" = "x2")
#  x1 
#"x2" 

If you use variables that hold the column names instead, the named vector changes to:

var1 = "x1"
var2 = "x2"

c(var1 = var2)
#var1             # <~~ this is why it doesn't work
#"x2"
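
This also explains why attempt #3 in the question works. A minimal base R sketch (the object names by_literal_name and by_quoted_name are only illustrative): inside c(), whatever is written to the left of = is used literally as the element name, while the right-hand side is evaluated.

```r
var1 <- "x1"
var2 <- "x2"

# Left of `=`: taken literally as the name, never evaluated.
# Right of `=`: evaluated, so var2 is replaced by its value "x2".
by_literal_name <- c(var1 = var2)   # name is "var1", value is "x2"
by_quoted_name  <- c("x1" = var2)   # name is "x1",   value is "x2"

names(by_literal_name)
# [1] "var1"
names(by_quoted_name)
# [1] "x1"
```

So left_join() in attempt #2 is asked to match a column literally called var1, which df1 does not have.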

I don't know if there is currently a "clean" way to solve this in dplyr. A workaround is to construct the required named vector with setNames():

df1 %>% left_join(df2, by = setNames(var2, var1))
#Source: local data frame [10 x 3]
#
#   x1 y1 y2
#1   1 21 NA
#2   2 22 NA
#3   3 23 NA
#4   4 24 NA
#5   5 25 NA
#6   6 26 26
#7   7 27 27
#8   8 28 28
#9   9 29 29
#10 10 30 30

This works because:

setNames(var2, var1)
#  x1 
#"x2" 
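
The same idea should extend to several join columns, because setNames() is vectorised. The sketch below is an illustration rather than part of the original answer: left_cols, right_cols, the extra column z2 and the plain data.frame inputs are assumptions made for the example, and the join only runs if dplyr is installed.

```r
# Hypothetical column pairs, matched by position:
# left_cols are columns of the left table, right_cols of the right table.
left_cols  <- c("x1", "y1")
right_cols <- c("x2", "y2")

# Names come from the left table, values from the right table.
by_vec <- setNames(right_cols, left_cols)
by_vec
#   x1   y1 
# "x2" "y2" 

# Only attempt the join when dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  df1 <- data.frame(x1 = 1:10, y1 = 21:30)
  df2 <- data.frame(x2 = 6:15, y2 = 26:35, z2 = letters[1:10])
  joined <- dplyr::left_join(df1, df2, by = by_vec)
  print(joined)
}
```

Keeping the two sets of names in separate vectors makes it easy to pass them in as function arguments.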

Hope it helps.

Note: you could do the same with `names<-`, like so:

df1 %>% left_join(df2, by = `names<-`(var2, var1))

but Hadley recommends using the setNames approach instead.
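
As a quick check in base R, the three ways of attaching the name produce the same vector. This is only a sketch; by_a, by_b and by_c are illustrative names.

```r
var1 <- "x1"
var2 <- "x2"

# 1. setNames(): returns a named copy in one call.
by_a <- setNames(var2, var1)

# 2. Two-step assignment on a copy.
by_c <- var2
names(by_c) <- var1

# 3. Calling the replacement function `names<-` directly.
by_b <- `names<-`(var2, var1)

all_same <- identical(by_a, by_b) && identical(by_a, by_c)
all_same
# [1] TRUE
```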
