R:当两个条件之一匹配时,合并两个数据帧 [英] R: merge two data frames when either of two criteria matches

查看:172
本文介绍了R:当两个条件之一匹配时,合并两个数据帧的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

说我有两个数据帧,如下所示:

Say I have two dataframes like the following:

n = c(2, 3, 5, 5, 6, 7) 
s = c("aa", "bb", "cc", "dd", "ee", "ff") 
b = c(2, 4, 5, 4, 3, 2) 
df = data.frame(n, s, b)
#  n  s b
#1 2 aa 2
#2 3 bb 4
#3 5 cc 5  
#4 5 dd 4
#5 6 ee 3
#6 7 ff 2

n2 = c(5, 6, 7, 6) 
s2 = c("aa", "bb", "cc", "ll") 
b2 = c("hh", "nn", "ff", "dd")  
df2 = data.frame(n2, s2, b2)

 #   n2 s2 b2
 #1  5 aa hh
 #2  6 bb nn
 #3  7 cc ff
 #4  6 ll dd

我想将它们合并以获得以下结果:

I want to merge them to achieve the following result:

 #n s  b n2 s2 b2
 #2 aa 2 5  aa hh
 #3 bb 4 6  bb nn
 #5 cc 5 7  cc ff
 #5 dd 4 6  ll dd

基本上,我想要实现的是只要在data2的s2或b2列中找到第一个数据s中的值,就将两个数据帧合并.

Basically, what I want to achieve is to merge the two dataframes whenever the values in s of the first data is found in either the s2 or the b2 columns of data2.

我知道当我从每个数据帧中指定两列时合并可以工作,但是我不确定如何在合并函数中添加OR条件.或如何使用dpylr等软件包中的其他命令来实现此目标.

I know that merge can work when I specify the two columns from each dataframe but I am not sure how to ADD the OR condition in the merge function. Or how to achieve this goal using other commands from packages such as dpylr.

另外,为了澄清起见,还会出现s2和b2与s列在同一行中匹配的情况.如果是这种情况,则只需将它们合并一次.

Also, to clarify, there will be a situation where s2 and b2 have matches with s column in the same row. If this is the case, then just merge them once.

推荐答案

问题的结合:1)您已经建立了几个数据帧,这些数据帧的因素有可能使匹配和索引变糟,因此我使用stringsAsFactors = FALSE在数据帧调用中. 2)当s2和b2在s列中都具有匹配项时(如您的示例中所示),您处于模棱两可的情况,没有明确的解决方案:

A coupld of problems: 1) you have built a couple of dataframes with factors which has a tendency to screw up matching and indexing, so I used stringsAsFactors =FALSE in hte dataframe calls. 2) you have an ambiguous situation with no stated resolution when both s2 and b2 have matches in the s column (as does occur in your example):

> df2[c("s")] <- list( c( df$s[pmax( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE)]))
> df2
  n2 s2 b2  s
1  5 aa hh aa
2  6 bb nn bb
3  7 cc ff ff
4  6 ll dd dd
> df2[c("s")] <- list( c( df$s[pmin( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE)]))
> df2
  n2 s2 b2  s
1  5 aa hh aa
2  6 bb nn bb
3  7 cc ff cc
4  6 ll dd dd

一旦解决了对满意度的歧义,只需使用相同的方法提取并匹配"b"即可:

Once you resolve the ambiguity to your satiusfaction just use the same method to extract and match the "b"s:

> df2[c("b")] <- list( c( df$b[pmin( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE)]))
> df2
  n2 s2 b2  s b
1  5 aa hh aa 2
2  6 bb nn bb 4
3  7 cc ff cc 5
4  6 ll dd dd 4

修改后的df:

> dput(df)
structure(list(n = c(2, 3, 5, 5, 6, 7), s = c("aa", "bb", "cc", 
"dd", "ee", "ff"), b = c(2, 4, 5, 4, 3, 2)), .Names = c("n", 
"s", "b"), row.names = c(NA, -6L), class = "data.frame")
> dput(df2)
structure(list(n2 = c(5, 6, 7, 6), s2 = c("aa", "bb", "cc", "ll"
), b2 = c("hh", "nn", "ff", "dd"), s = c("aa", "bb", "cc", "dd"
), b = c(2, 4, 5, 4)), row.names = c(NA, -4L), .Names = c("n2", 
"s2", "b2", "s", "b"), class = "data.frame")

一步解决方案:

> df2[c("s", "c")] <-  df[pmin( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE), c("s", "b")]
> df2
  n2 s2 b2  s c
1  5 aa hh aa 2
2  6 bb nn bb 4
3  7 cc ff cc 5
4  6 ll dd dd 4

这篇关于R:当两个条件之一匹配时,合并两个数据帧的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆