Merge two data frames based on matching two exchangeable columns in each data frame


Problem description

I have two data frames in R.

dataframe 1

A B C D E F G
1 2 a a a a a
2 3 b b b c c
4 1 e e f f e

dataframe 2

X Y Z
1 2 g
2 1 h
3 4 i
1 4 j

I want to match columns A and B of data frame 1 against columns X and Y of data frame 2. The comparison is NOT order-sensitive: row 1 of data frame 1 (A=1, B=2) is considered the same as both row 1 (X=1, Y=2) and row 2 (X=2, Y=1) of data frame 2.

When a match is found, I would like to add columns C, D, E, F and G of data frame 1 to the matched row of data frame 2, as shown below. Rows with no match get NA.

Final data frame

X Y Z C  D  E  F  G
1 2 g a  a  a  a  a
2 1 h a  a  a  a  a
3 4 i NA NA NA NA NA
1 4 j e  e  f  f  e

I only know how to match on a single column. Matching on two exchangeable columns and then merging the two data frames based on the result is difficult for me. Please suggest a smart way of doing this.

For ease of discussion (thanks to Vincent and DWin, who commented on my previous question that I should test the code), here is the code for loading data frames 1 and 2 into R.

df1 <- data.frame(A = c(1,2,4), B = c(2,3,1), C = c('a','b','e'),
                  D = c('a','b','e'), E = c('a','b','f'),
                  F = c('a','c','f'), G = c('a','c','e'))

df2 <- data.frame(X = c(1,2,3,1), Y = c(2,1,4,4), Z = letters[7:10])

Solution

The following works, but no doubt can be improved.

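I first create a small helper function that sorts the first two columns within each row (A and B in df1, X and Y in df2) and renames them to V1 and V2.

replace_index <- function(dat){
  # Sort the first two columns within each row, so (2, 1) becomes (1, 2)
  x <- as.data.frame(t(sapply(seq_len(nrow(dat)),
    function(i) sort(unlist(dat[i, 1:2])))))
  # Name the sorted key columns V1 and V2
  names(x) <- paste("V", seq_len(ncol(x)), sep="")
  # Re-attach the remaining columns unchanged
  data.frame(x, dat[, -(1:2), drop=FALSE])
}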

replace_index(df1)

  V1 V2 C D E F G
1  1  2 a a a a a
2  2  3 b b b c c
3  1  4 e e f f e
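
The row-wise sort can also be written without looping over rows. The following is only a sketch of one possible alternative, not part of the original answer: `replace_index_vec` is a hypothetical name, and it assumes the two key columns come first and are numeric, so that `pmin()` and `pmax()` give the smaller and larger value of each pair.

```r
# Hypothetical vectorized variant of the helper (name is an assumption).
# Assumes the key columns are the first two columns and are numeric.
replace_index_vec <- function(dat) {
  keys <- data.frame(V1 = pmin(dat[[1]], dat[[2]]),   # smaller value of each pair
                     V2 = pmax(dat[[1]], dat[[2]]))   # larger value of each pair
  # Re-attach the remaining columns unchanged
  data.frame(keys, dat[, -(1:2), drop = FALSE])
}

df1 <- data.frame(A = c(1, 2, 4), B = c(2, 3, 1),
                  C = c("a", "b", "e"), D = c("a", "b", "e"),
                  E = c("a", "b", "f"), F = c("a", "c", "f"),
                  G = c("a", "c", "e"))

res1 <- replace_index_vec(df1)
res1
```

For the sample data this should give the same V1 and V2 columns as the helper above, while avoiding one `sort()` call per row.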

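This means you can use a straightforward merge to combine the data. merge() joins on the shared columns V1 and V2, and all.y=TRUE keeps the rows of df2 that have no match in df1.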

merge(replace_index(df1), replace_index(df2), all.y=TRUE)

  V1 V2    C    D    E    F    G Z
1  1  2    a    a    a    a    a g
2  1  2    a    a    a    a    a h
3  1  4    e    e    f    f    e j
4  3  4 <NA> <NA> <NA> <NA> <NA> i
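
The merged result replaces X and Y with the sorted keys and reorders the rows. If the original X, Y values and the row order of df2 need to be kept, as in the final data frame shown in the question, one possible approach is to build an order-independent key and look it up with `match()`. This is a sketch rather than the answerer's method; the names `k1`, `k2`, `idx` and `final` are made up for illustration, and it assumes each unordered pair appears at most once in df1, because `match()` only returns the first hit.

```r
df1 <- data.frame(A = c(1, 2, 4), B = c(2, 3, 1),
                  C = c("a", "b", "e"), D = c("a", "b", "e"),
                  E = c("a", "b", "f"), F = c("a", "c", "f"),
                  G = c("a", "c", "e"))
df2 <- data.frame(X = c(1, 2, 3, 1), Y = c(2, 1, 4, 4), Z = letters[7:10])

# Order-independent keys: smaller value first, then the larger one
k1 <- paste(pmin(df1$A, df1$B), pmax(df1$A, df1$B))
k2 <- paste(pmin(df2$X, df2$Y), pmax(df2$X, df2$Y))

# For each row of df2, the index of the first matching row of df1, or NA
idx <- match(k2, k1)

# Indexing rows with NA yields a row of NAs, which marks "no match"
final <- cbind(df2, df1[idx, c("C", "D", "E", "F", "G")])
rownames(final) <- NULL
final
```

If df1 can contain the same pair more than once, `merge()` on the sorted keys, as in the answer above, returns every combination instead of only the first.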
