如何使用多列作为键合并两个数据框? [英] How to merge two dataframes using multiple columns as key?
问题描述
说我有以下数据帧:
DF1 <- data.frame("A" = rep(c("A","B"), 18),
"B" = rep(c("C","D","E"), 12),
"NUM"= rep(rnorm(36,10,1)),
"TEST" = rep(NA,36))
DF2 <- data.frame("A" = rep("A",6),
"B" = rep(c("C","D"),6),
"VAL" = rep(c(1,3),3))
*注:变量 A
和 DF2
中的 B
应该具有唯一的 VAL
。
*Note: Each unique combination of variables A
and B
in DF2
should have a unique VAL
.
对于每一行,我想在 TEST $中替换
NA
c $ c>,如果 A列中的值为
和 DF1
中的 VAL
A
匹配,并且列 B
和 B中的值匹配
匹配该行。否则,我会将 TEST
保留为 NA
。
For each row, I would like to replace the NA
in TEST
with the corresponding value of VAL
in DF1
if the values in columns A
and A
match and the values in columns B
and B
match for that row. Otherwise, I'd leave TEST
as NA
. How would I do this without looping through each combination using match?
理想情况下,答案将缩放到两个数据帧,其中有许多列要匹配。
Ideally, an answer would scale to two data frames with many columns to match upon.
推荐答案
正如Akrun在评论中提到的那样,您的查找表(DF2)需要简化为唯一的A / B组合。对于您当前的数据框,这不是问题,但是如果同一组合有多个可能的值,则需要其他规则。从那里开始,解决方案很容易:
As Akrun mentioned in comments, your lookup table (DF2) needs to be reduced to just its unique A/B combinations. For your current dataframe, this isn't a problem, but you will need additional rules if there are multiple possible values for the same combination. From there, the solution is easy:
DF2.u <- unique(DF2)
DF3 <- merge(DF1, DF2.u, all = T)
请注意,这将产生一个新的数据帧空的TEST列(所有值 NA
),以及从DF2分配的VAL列。要完全执行您想要的操作(在可能的情况下用VAL代替TEST),下面是一些较笨拙的代码:
Note that this will produce a new dataframe with an empty TEST column (all values NA
), and a VAL column assigned from DF2. To do exactly what you wanted (replace TEST with VAL where possible), here is some slightly clunkier code:
DF1$TEST <- merge(DF1, DF2.u, all = T)$VAL
编辑:响应您的问题,可以根据需要将DF2简化:
in response to your question, you can boil down DF2 if necessary quite simple:
DF2$C <- c(1:12) #now unique() won't work
DF2.u <- unique(DF2[1:3])
A B VAL
1 A C 1
2 A D 3
这篇关于如何使用多列作为键合并两个数据框?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!