Match two data frames in R on two variables without changing the unmatched rows

Question

Hi everybody, I have a small problem matching two data frames in R when they share two common variables to match on. The first data frame looks like this:

Class  Count  V1  V2 V3
E       124   1   2   2
E       123   2   0   0
L       100   5   5   5
L       111   1   1   1
E       120   3   3   3

The second data frame has this form:

Class  Count Code
E       124  1241
L       111  1234 

I would like a new data frame built by matching on the Class and Count variables. The resulting data frame would look like this:

    Class  Count   V1    V2 V3
    E       124   1241   2   2
    E       123   2      0   0
    L       100   5      5   5
    L       111   1234   1   1
    E       120   3      3   3

Only the matched elements of V1 are replaced with the Code variable. The remaining elements stay the same, with no NA values or other changes introduced into my first data frame. I hope this is possible in R. Thanks in advance.

Recommended answer

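# Note: this compares the two data frames position by position (df2 is recycled),
# so it reproduces the desired result for this sample but is not a general key match.
df1$V1 <- ifelse(df1$Class == df2$Class & df1$Count == df2$Count, df2$Code, df1$V1)
df1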
  Class Count   V1 V2 V3
1     E   124 1241  2  2
2     E   123    2  0  0
3     L   100    5  5  5
4     L   111 1234  1  1
5     E   120    3  3  3
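
The one-liner above depends on how the rows of the two data frames happen to line up. As a hedged alternative, here is a minimal sketch that looks up each row of df1 in df2 by a combined Class/Count key using match(). The data frames are rebuilt from the question's sample; the helper names key1, key2, idx, and hit are illustrative and not part of the original answer, and the sketch assumes each Class/Count pair appears at most once in df2 (match() returns only the first hit).

```r
# Sample data from the question
df1 <- data.frame(
  Class = c("E", "E", "L", "L", "E"),
  Count = c(124, 123, 100, 111, 120),
  V1 = c(1, 2, 5, 1, 3),
  V2 = c(2, 0, 5, 1, 3),
  V3 = c(2, 0, 5, 1, 3),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Class = c("E", "L"),
  Count = c(124, 111),
  Code = c(1241, 1234),
  stringsAsFactors = FALSE
)

# Build one key per row from both matching variables (illustrative names)
key1 <- paste(df1$Class, df1$Count, sep = ".")
key2 <- paste(df2$Class, df2$Count, sep = ".")

# For each row of df1, the position of its key in df2 (NA when there is no match)
idx <- match(key1, key2)
hit <- !is.na(idx)

# Overwrite V1 only where a match was found; all other rows stay untouched
df1$V1[hit] <- df2$Code[idx[hit]]
df1
```

Because the lookup goes through the key rather than the row position, the result does not depend on the order of the rows in df2.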

Updated based on the data provided in the comments:

You can create an interaction variable (int) from c9 and CC4 using interaction() in both data frames and then use %in%. (It seems that you are not looking for a row-to-row match, so you should avoid ifelse.) I suggest dealing with NA values in c9 and CC4 before calling interaction(): if either one is NA, the value of int will be NA, which you may not want to take part in the matching. (In the following example, I have not dealt with the NAs.)

df1$int <- interaction(df1$c9, df1$CC4)  # "z" from the comments is df1, "z1" is df2
df2$int <- interaction(df2$c9, df2$CC4)
# Replace column 5 of df1 with column 13 of df2 where the keys match; unmatched rows keep their value.
# Note: the assignment is positional, so it assumes the matching rows appear in the same order in both data frames.
df1[df1$int %in% df2$int, 5] <- df2[df2$int %in% df1$int, 13]

Output:

> df1
         c1  c2 c9 CC4  A.la.vista Montoxv_a120d Montoxv_a15d Montoxv_a186d Montoxv_a30d Montoxv_a60d Montoxv_a7d Montoxv_a90d   int
1  20130830 192  E 111 39324363.19             0          0.0           0.0            0            0     1550000            0 E.111
2  20130830 192  E 124 71061061.04             0          0.0    69608583.8      1452477            0           0            0 E.124
3  20130830 192  E 131        0.00             0     182694.0           0.0      1027283      3308932     2010328      3809021 E.131
4  20130830 192  E 201 66310498.77             0          0.0           0.0            0            0           0            0 E.201
5  20130830 192  E 202        0.00      34403130   10275256.6    40375044.8     17999369     37156810     8953196     32639408 E.202
6  20130830 192  E 203 51885967.69             0          0.0           0.0            0            0           0            0 E.203
7  20130830 192  E 211  3537648.29             0          0.0           0.0            0            0           0            0 E.211
8  20130830 192  E  NA          NA       8181927     314120.5    10816365.6      3295626     11992733     3025800      4673335  <NA>
9  20130830 192  L 101    64013.84             0          0.0           0.0            0            0           0            0 L.101
10 20130830 192  L 111  5429375.87       5000000          0.0           0.0     11000000      8500000     7500000      9900000 L.111
11 20130830 192  L 121  8869286.40             0          0.0     7874386.4            0       994900           0            0 L.121
12 20130830 192  L 123  8805450.00       2200000          0.0     2005700.0      1299000      1300750           0      2000000 L.123
13 20130830 192  L 124  5408668.05             0          0.0     5408668.0            0            0           0            0 L.124
14 20130830 192  L 131        0.00             0    2539885.0           0.0            0      8498099      694912      3793809 L.131
15 20130830 192  L 141 18150400.00             0          0.0    15510400.0      1000000       150000           0      1490000 L.141
16 20130830 192  L 201  4545930.38             0          0.0           0.0            0            0           0            0 L.201
17 20130830 192  L 202        0.00             0          0.0      510609.7            0      1187226           0        95000 L.202
18 20130830 192  L 203   708863.95             0          0.0           0.0            0            0           0            0 L.203

To see which rows of df1 matched, use:

> which(df1$int %in% df2$int)
[1]  2  6 11 12 13 15 18
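
The answer recommends handling NA values in c9 and CC4 but leaves that step out. The following is a minimal sketch of one way to do it, on a small made-up data set: the values and the column names value and new_value are hypothetical, not taken from the original data. It excludes rows whose key is NA and uses match() instead of %in% for the assignment, so the rows of the two data frames need not be in the same order.

```r
# Hypothetical miniature of the data in the answer (values are made up)
df1 <- data.frame(
  c9 = c("E", "E", "L", "L"),
  CC4 = c(124, NA, 111, 123),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  c9 = c("L", "E", "E"),
  CC4 = c(123, NA, 124),
  new_value = c(400, 999, 100),
  stringsAsFactors = FALSE
)

# Combined key; it is NA whenever c9 or CC4 is NA
df1$int <- interaction(df1$c9, df1$CC4)
df2$int <- interaction(df2$c9, df2$CC4)

# match() (and therefore %in%) treats NA as equal to NA,
# so rows with an NA key are excluded explicitly
idx <- match(as.character(df1$int), as.character(df2$int))
hit <- !is.na(df1$int) & !is.na(idx)

# Replace only the matched rows, looking each one up by its key
df1$value[hit] <- df2$new_value[idx[hit]]
df1
```

In this sketch the second row of df1 keeps its value even though df2 also contains an NA key, which is the behaviour the answer warns you may want.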
