Match two data frames in R considering two variables and not change the rows not matched
Question
Hi everybody, I have a small problem matching two data frames in R when they share two common variables to match on. The first data frame looks like this:
Class Count V1 V2 V3
E 124 1 2 2
E 123 2 0 0
L 100 5 5 5
L 111 1 1 1
E 120 3 3 3
The second data frame has this form:
Class Count Code
E 124 1241
L 111 1234
I would like a new data frame in which the Class and Count variables are used for the match. The resulting data frame would look like this:
Class Count V1 V2 V3
E 124 1241 2 2
E 123 2 0 0
L 100 5 5 5
L 111 1234 1 1
E 120 3 3 3
Only the matched rows have their V1 value replaced with the Code value. The remaining elements stay the same, with no NA values or other changes introduced into my first data frame. I hope this is possible in R. Thanks in advance.
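Recommended answer
# Element-wise comparison: df2 is recycled along df1, so this only works when matching rows line up by position
df1$V1 <- ifelse(df1$Class == df2$Class & df1$Count == df2$Count, df2$Code, df1$V1)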
df1
Class Count V1 V2 V3
1 E 124 1241 2 2
2 E 123 2 0 0
3 L 100 5 5 5
4 L 111 1234 1 1
5 E 120 3 3 3
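The ifelse call above compares the two data frames position by position, so it depends on how the shorter df2 is recycled along df1. As a hedged alternative, here is a minimal sketch that looks up each row of df1 in df2 by a combined key. The names key1, key2, idx and hit are introduced only for illustration, and the data are the example frames from the question.

```r
# Example data from the question
df1 <- data.frame(
  Class = c("E", "E", "L", "L", "E"),
  Count = c(124, 123, 100, 111, 120),
  V1 = c(1, 2, 5, 1, 3),
  V2 = c(2, 0, 5, 1, 3),
  V3 = c(2, 0, 5, 1, 3),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Class = c("E", "L"),
  Count = c(124, 111),
  Code = c(1241, 1234),
  stringsAsFactors = FALSE
)

# Build one key per row from the two matching variables
key1 <- paste(df1$Class, df1$Count, sep = ".")
key2 <- paste(df2$Class, df2$Count, sep = ".")

# For each row of df1, the position of the matching row in df2 (NA if none)
idx <- match(key1, key2)
hit <- !is.na(idx)

# Replace V1 only where a match was found; all other rows stay unchanged
df1$V1[hit] <- df2$Code[idx[hit]]
df1
```

Because match returns a position in df2 for every row of df1, this sketch does not depend on the order of rows in either data frame. It does assume that each Class/Count pair occurs at most once in df2.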
Updated based on the data provided in the comments:
You can create an interaction variable (int) from c9 and CC4 using interaction in both data frames and then use %in% (it seems that you are not looking for a row-to-row match, so you should avoid using ifelse). I suggest dealing with NA in c9 and CC4 before using interaction, because if either of them is NA then the value of int will be NA, which you may not want to match on (in the following example, I haven't dealt with NAs).
df1$int <- interaction(df1$c9, df1$CC4)  # the z data is df1 and the z1 data is df2
df2$int <- interaction(df2$c9, df2$CC4)
# Replace column 5 of df1 with column 13 of df2 where int matches; other rows keep their previous value.
# Note: this assumes the matched rows appear in the same order in both data frames.
df1[df1$int %in% df2$int, 5] <- df2[df2$int %in% df1$int, 13]
Output:
> df1
c1 c2 c9 CC4 A.la.vista Montoxv_a120d Montoxv_a15d Montoxv_a186d Montoxv_a30d Montoxv_a60d Montoxv_a7d Montoxv_a90d int
1 20130830 192 E 111 39324363.19 0 0.0 0.0 0 0 1550000 0 E.111
2 20130830 192 E 124 71061061.04 0 0.0 69608583.8 1452477 0 0 0 E.124
3 20130830 192 E 131 0.00 0 182694.0 0.0 1027283 3308932 2010328 3809021 E.131
4 20130830 192 E 201 66310498.77 0 0.0 0.0 0 0 0 0 E.201
5 20130830 192 E 202 0.00 34403130 10275256.6 40375044.8 17999369 37156810 8953196 32639408 E.202
6 20130830 192 E 203 51885967.69 0 0.0 0.0 0 0 0 0 E.203
7 20130830 192 E 211 3537648.29 0 0.0 0.0 0 0 0 0 E.211
8 20130830 192 E NA NA 8181927 314120.5 10816365.6 3295626 11992733 3025800 4673335 <NA>
9 20130830 192 L 101 64013.84 0 0.0 0.0 0 0 0 0 L.101
10 20130830 192 L 111 5429375.87 5000000 0.0 0.0 11000000 8500000 7500000 9900000 L.111
11 20130830 192 L 121 8869286.40 0 0.0 7874386.4 0 994900 0 0 L.121
12 20130830 192 L 123 8805450.00 2200000 0.0 2005700.0 1299000 1300750 0 2000000 L.123
13 20130830 192 L 124 5408668.05 0 0.0 5408668.0 0 0 0 0 L.124
14 20130830 192 L 131 0.00 0 2539885.0 0.0 0 8498099 694912 3793809 L.131
15 20130830 192 L 141 18150400.00 0 0.0 15510400.0 1000000 150000 0 1490000 L.141
16 20130830 192 L 201 4545930.38 0 0.0 0.0 0 0 0 0 L.201
17 20130830 192 L 202 0.00 0 0.0 510609.7 0 1187226 0 95000 L.202
18 20130830 192 L 203 708863.95 0 0.0 0.0 0 0 0 0 L.203
To see which rows of df1 match, use:
> which(df1$int %in% df2$int)
[1] 2 6 11 12 13 15 18
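The earlier note about NA can be illustrated with a small sketch. It uses toy data (the frames a and b and all their values are invented here, they are not the data from the comments) to show that interaction returns NA when either component is NA, and that two NA keys would be paired by match unless such rows are excluded explicitly.

```r
# Toy data, invented for illustration only
a <- data.frame(c9 = c("E", "E", "L"), CC4 = c(124, NA, 111),
                val = c(10, 20, 30), stringsAsFactors = FALSE)
b <- data.frame(c9 = c("E", "L"), CC4 = c(NA, 111),
                new = c(99, 77), stringsAsFactors = FALSE)

a$int <- interaction(a$c9, a$CC4)
b$int <- interaction(b$c9, b$CC4)

# Rows with an NA component get an NA key
is.na(a$int)

# match() pairs an NA key with an NA key, so exclude NA keys explicitly
idx <- match(a$int, b$int)
hit <- !is.na(idx) & !is.na(a$int)

# Only rows with a real (non-NA) matching key are replaced
a$val[hit] <- b$new[idx[hit]]
a
```

Whether rows with NA keys should be skipped, as here, or cleaned up beforehand is a choice that depends on the data; the sketch only shows one way to keep them from matching each other.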