R: merge two data frames when either of two criteria matches
Question
Say I have two dataframes like the following:
n = c(2, 3, 5, 5, 6, 7)
s = c("aa", "bb", "cc", "dd", "ee", "ff")
b = c(2, 4, 5, 4, 3, 2)
df = data.frame(n, s, b)
# n s b
#1 2 aa 2
#2 3 bb 4
#3 5 cc 5
#4 5 dd 4
#5 6 ee 3
#6 7 ff 2
n2 = c(5, 6, 7, 6)
s2 = c("aa", "bb", "cc", "ll")
b2 = c("hh", "nn", "ff", "dd")
df2 = data.frame(n2, s2, b2)
# n2 s2 b2
#1 5 aa hh
#2 6 bb nn
#3 7 cc ff
#4 6 ll dd
I want to merge them to achieve the following result:
#n s b n2 s2 b2
#2 aa 2 5 aa hh
#3 bb 4 6 bb nn
#5 cc 5 7 cc ff
#5 dd 4 6 ll dd
Basically, what I want to achieve is to merge the two data frames whenever a value in column s of the first data frame is found in either the s2 or the b2 column of df2.
I know that merge can work when I specify the two columns from each data frame, but I am not sure how to add the OR condition in the merge function, or how to achieve this using other commands from packages such as dplyr.
Also, to clarify, there will be a situation where s2 and b2 have matches with s column in the same row. If this is the case, then just merge them once.
Recommended answer
A couple of problems: 1) you have built a couple of data frames with factors, which have a tendency to screw up matching and indexing, so I used stringsAsFactors = FALSE in the data.frame calls. 2) you have an ambiguous situation with no stated resolution when both s2 and b2 have matches in the s column (as does occur in your example):
> df2[c("s")] <- list( c( df$s[pmax( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE)]))
> df2
n2 s2 b2 s
1 5 aa hh aa
2 6 bb nn bb
3 7 cc ff ff
4 6 ll dd dd
> df2[c("s")] <- list( c( df$s[pmin( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE)]))
> df2
n2 s2 b2 s
1 5 aa hh aa
2 6 bb nn bb
3 7 cc ff cc
4 6 ll dd dd
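To see where the ambiguity comes from, it can help to look at the two match() results separately before combining them. The following is only a minimal sketch: it rebuilds the question's data with stringsAsFactors = FALSE (already the default in R 4.0.0 and later), and the names i_s2, i_b2, lo and hi are illustrative, not part of the original answer.

```r
# Rebuild the question's data, keeping the string columns as character.
df <- data.frame(n = c(2, 3, 5, 5, 6, 7),
                 s = c("aa", "bb", "cc", "dd", "ee", "ff"),
                 b = c(2, 4, 5, 4, 3, 2),
                 stringsAsFactors = FALSE)
df2 <- data.frame(n2 = c(5, 6, 7, 6),
                  s2 = c("aa", "bb", "cc", "ll"),
                  b2 = c("hh", "nn", "ff", "dd"),
                  stringsAsFactors = FALSE)

# Row index in df of the first match for each value (NA when there is none).
i_s2 <- match(df2$s2, df$s)  # 1  2  3 NA
i_b2 <- match(df2$b2, df$s)  # NA NA  6  4

# Row 3 of df2 matches twice (cc -> 3, ff -> 6); pmin and pmax pick differently.
lo <- pmin(i_s2, i_b2, na.rm = TRUE)  # 1 2 3 4
hi <- pmax(i_s2, i_b2, na.rm = TRUE)  # 1 2 6 4
```

With na.rm = TRUE, a row of df2 that matches on only one column keeps that single index, and a row that matches on neither column stays NA.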
Once you resolve the ambiguity to your satisfaction, just use the same method to extract and match the "b" values:
> df2[c("b")] <- list( c( df$b[pmin( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE)]))
> df2
n2 s2 b2 s b
1 5 aa hh aa 2
2 6 bb nn bb 4
3 7 cc ff cc 5
4 6 ll dd dd 4
The modified data frames:
> dput(df)
structure(list(n = c(2, 3, 5, 5, 6, 7), s = c("aa", "bb", "cc",
"dd", "ee", "ff"), b = c(2, 4, 5, 4, 3, 2)), .Names = c("n",
"s", "b"), row.names = c(NA, -6L), class = "data.frame")
> dput(df2)
structure(list(n2 = c(5, 6, 7, 6), s2 = c("aa", "bb", "cc", "ll"
), b2 = c("hh", "nn", "ff", "dd"), s = c("aa", "bb", "cc", "dd"
), b = c(2, 4, 5, 4)), row.names = c(NA, -4L), .Names = c("n2",
"s2", "b2", "s", "b"), class = "data.frame")
One-step solution:
> df2[c("s", "c")] <- df[pmin( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE), c("s", "b")]
> df2
n2 s2 b2 s c
1 5 aa hh aa 2
2 6 bb nn bb 4
3 7 cc ff cc 5
4 6 ll dd dd 4
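If the column layout from the question (n, s, b, n2, s2, b2) is wanted instead, the same index can be used to bind the matched rows of df next to df2. This is a hedged sketch rather than part of the original answer: it starts from fresh copies of the two data frames, resolves double matches with pmin, and uses the illustrative names idx, keep and merged.

```r
df <- data.frame(n = c(2, 3, 5, 5, 6, 7),
                 s = c("aa", "bb", "cc", "dd", "ee", "ff"),
                 b = c(2, 4, 5, 4, 3, 2),
                 stringsAsFactors = FALSE)
df2 <- data.frame(n2 = c(5, 6, 7, 6),
                  s2 = c("aa", "bb", "cc", "ll"),
                  b2 = c("hh", "nn", "ff", "dd"),
                  stringsAsFactors = FALSE)

# One df row index per df2 row: match on s2 OR b2, taking the smaller
# index when both columns match.
idx <- pmin(match(df2$s2, df$s), match(df2$b2, df$s), na.rm = TRUE)

# Drop df2 rows that matched on neither column, then bind side by side.
keep <- !is.na(idx)
merged <- cbind(df[idx[keep], ], df2[keep, ])
rownames(merged) <- NULL
merged
```

Because match() returns only the first matching position, each row of df2 is paired with at most one row of df, so a row that matches on both s2 and b2 is merged once. If df$s contained repeated values, only the first occurrence would be used.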