R: merge based on multiple conditions (with non-equal criteria)
Question
I would like to merge 2 data frames based on multiple conditions.
DF1 <- data.frame("col1" = rep(c("A","B"), 18),
"col2" = rep(c("C","D","E"), 12),
"value"= (sample(1:100,36)),
"col4" = rep(NA,36))
DF2 <- data.frame("col1" = rep("A",6),
"col2" = rep(c("C","D"),3),
"data" = rep(c(1,3),3),
"min" = seq(0,59,by=10),
"max" = seq(10,69,by=10))
> DF1
col1 col2 value col4
1 A C 22 NA
2 B D 58 NA
3 A E 35 NA
4 B C 86 NA
5 A D 37 NA
6 B E 16 NA
7 A C 46 NA
8 B D 23 NA
9 A E 88 NA
10 B C 3 NA
11 A D 33 NA
12 B E 25 NA
13 A C 19 NA
14 B D 24 NA
15 A E 9 NA
16 B C 76 NA
17 A D 62 NA
18 B E 68 NA
19 A C 97 NA
20 B D 43 NA
21 A E 8 NA
22 B C 84 NA
23 A D 36 NA
24 B E 20 NA
25 A C 57 NA
26 B D 99 NA
27 A E 42 NA
28 B C 64 NA
29 A D 87 NA
30 B E 1 NA
31 A C 78 NA
32 B D 34 NA
33 A E 41 NA
34 B C 32 NA
35 A D 10 NA
36 B E 72 NA
> DF2
col1 col2 data min max
1 A C 1 0 10
2 A D 3 10 20
3 A C 1 20 30
4 A D 3 30 40
5 A C 1 40 50
6 A D 3 50 60
DF1 is the main table, and DF2 is treated as a lookup table.
If col1 and col2 of DF1 match those of DF2, and 'value' of DF1 lies between min and max of DF2, then the column 'data' from DF2 should be added to DF1. If the conditions are not met, 'data' in DF1 should be NA.
Expected output (first 6 rows):
col1 col2 value col4 data
1 A C 22 NA 1
2 B D 58 NA NA
3 A E 35 NA NA
4 B C 86 NA NA
5 A D 37 NA 3
6 B E 16 NA NA
I've tried using merge (to match col1 and col2) and then subset (to keep only the rows whose value is between min and max), but subset drops the rows that fail the range check, and my goal is to keep all the rows of DF1.
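Does anyone have an idea for this?
Recommended answer
Your data, with stringsAsFactors=FALSE added so that col1 and col2 are character columns rather than factors: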
DF1 <- data.frame("col1" = rep(c("A","B"), 18),
"col2" = rep(c("C","D","E"), 12),
"value"= (sample(1:100,36)),
"col4" = rep(NA,36),
stringsAsFactors=FALSE)
DF2 <- data.frame("col1" = rep("A",6),
"col2" = rep(c("C","D"),3),
"data" = rep(c(1,3),3),
"min" = seq(0,59,by=10),
"max" = seq(10,69,by=10),
stringsAsFactors=FALSE)
Using dplyr: 1) merge the two data frames with left_join, 2) check row by row (rowwise) with ifelse whether value is between min and max, then 3) drop the min and max columns.
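library(dplyr)
# left_join keeps every DF1 row; a DF1 row is repeated once per matching DF2 row
left_join(DF1, DF2, by=c("col1","col2")) %>%
# evaluate between() one row at a time
rowwise() %>%
# keep data only when value lies in [min, max] (both ends inclusive)
mutate(data = ifelse(between(value,min,max), data, NA)) %>%
# drop the lookup bounds
select(-min, -max)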
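Not sure whether you were expecting to perform some kind of aggregation, but here is the output of the above code (first 6 rows). The values differ from the question because sample() is random, and each DF1 row with col1 "A" and col2 "C" or "D" appears three times, once per matching DF2 row: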
col1 col2 value col4 data
1 A C 23 NA NA
2 A C 23 NA 1
3 A C 23 NA NA
4 B D 59 NA NA
5 A E 57 NA NA
6 B C 8 NA NA
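The output above repeats a DF1 row for every candidate range in DF2. If the goal is exactly one row per DF1 row, as in the expected output of the question, one possible approach is sketched below in base R. This is not the answer's method: the `row_id` helper column and the fixed seed are assumptions introduced only for illustration, and the sketch assumes that the ranges for a given col1/col2 pair do not overlap (which holds for the DF2 shown here).

```r
# Sketch: keep exactly one row per DF1 row, using base R only.
set.seed(42)  # assumption: fixed seed so the example is reproducible

DF1 <- data.frame(col1 = rep(c("A", "B"), 18),
                  col2 = rep(c("C", "D", "E"), 12),
                  value = sample(1:100, 36),
                  col4 = rep(NA, 36),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(col1 = rep("A", 6),
                  col2 = rep(c("C", "D"), 3),
                  data = rep(c(1, 3), 3),
                  min = seq(0, 59, by = 10),
                  max = seq(10, 69, by = 10),
                  stringsAsFactors = FALSE)

# Hypothetical helper column: remembers each DF1 row's original position
DF1$row_id <- seq_len(nrow(DF1))

# Equality part of the condition: pair rows that agree on col1 and col2
candidates <- merge(DF1, DF2, by = c("col1", "col2"))

# Range part: keep only the pairs whose value lies in [min, max]
hits <- candidates[candidates$value >= candidates$min &
                   candidates$value <= candidates$max, ]

# Look up the surviving data values by row_id; rows without a hit get NA
result <- DF1
result$data <- hits$data[match(result$row_id, hits$row_id)]
result$row_id <- NULL

head(result)
```

Here `result` has the same 36 rows, in the same order, as DF1. If ranges for one col1/col2 pair could overlap, `match()` would silently keep only the first hit, so that case would need an explicit rule for choosing among several matches.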