R: merge based on multiple conditions (with non-equal criteria)

Question

I would like to merge 2 data frames based on multiple conditions.

DF1 <- data.frame("col1" = rep(c("A","B"), 18),
                  "col2" = rep(c("C","D","E"), 12),
                  "value"= (sample(1:100,36)),
                  "col4" = rep(NA,36))

DF2 <- data.frame("col1" = rep("A",6),
                  "col2" = rep(c("C","D"),3),
                  "data" = rep(c(1,3),3),
                  "min" = seq(0,59,by=10),
                  "max" = seq(10,69,by=10))


> DF1
   col1 col2 value col4
1     A    C    22   NA
2     B    D    58   NA
3     A    E    35   NA
4     B    C    86   NA
5     A    D    37   NA
6     B    E    16   NA
7     A    C    46   NA
8     B    D    23   NA
9     A    E    88   NA
10    B    C     3   NA
11    A    D    33   NA
12    B    E    25   NA
13    A    C    19   NA
14    B    D    24   NA
15    A    E     9   NA
16    B    C    76   NA
17    A    D    62   NA
18    B    E    68   NA
19    A    C    97   NA
20    B    D    43   NA
21    A    E     8   NA
22    B    C    84   NA
23    A    D    36   NA
24    B    E    20   NA
25    A    C    57   NA
26    B    D    99   NA
27    A    E    42   NA
28    B    C    64   NA
29    A    D    87   NA
30    B    E     1   NA
31    A    C    78   NA
32    B    D    34   NA
33    A    E    41   NA
34    B    C    32   NA
35    A    D    10   NA
36    B    E    72   NA

> DF2
  col1 col2 data min max
1    A    C    1   0  10
2    A    D    3  10  20
3    A    C    1  20  30
4    A    D    3  30  40
5    A    C    1  40  50
6    A    D    3  50  60

DF1 is the main table and DF2 is treated as a lookup table

If col1 and col2 of DF1 match that of DF2, and 'value' of DF1 is in between min and max of DF2, then column 'data' from DF2 will be added to DF1. If the conditions are not met, 'data' of DF1 will have value of NA.

Expected output (first 6 rows):

  col1 col2 value col4 data
1    A    C    22   NA    1
2    B    D    58   NA   NA
3    A    E    35   NA   NA
4    B    C    86   NA   NA
5    A    D    37   NA    3
6    B    E    16   NA   NA

I've tried using merge (to match col1 and col2) and then subset (to keep only the rows whose value lies between min and max), but my goal is to keep all the rows of DF1.
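
For illustration, the merge-then-subset attempt described above might look like the following sketch. This is a hypothetical reconstruction, not the asker's exact code; it uses a fixed six-row DF1 (the first six rows printed above) so the result is reproducible, and the names merged and kept are made up for the example.

```r
# Fixed six-row version of DF1 (values taken from the printout above)
DF1 <- data.frame(col1 = c("A", "B", "A", "B", "A", "B"),
                  col2 = c("C", "D", "E", "C", "D", "E"),
                  value = c(22, 58, 35, 86, 37, 16),
                  col4 = NA,
                  stringsAsFactors = FALSE)

DF2 <- data.frame(col1 = rep("A", 6),
                  col2 = rep(c("C", "D"), 3),
                  data = rep(c(1, 3), 3),
                  min = seq(0, 59, by = 10),
                  max = seq(10, 69, by = 10),
                  stringsAsFactors = FALSE)

# merge() defaults to an inner join, so DF1 rows whose col1/col2
# have no partner in DF2 are already gone at this point
merged <- merge(DF1, DF2, by = c("col1", "col2"))

# Keep only the candidates whose range contains value
kept <- subset(merged, value >= min & value <= max)

nrow(DF1)   # 6
nrow(kept)  # 2
```

Only the two rows that satisfy every condition survive, which is why this approach cannot keep all of DF1 on its own.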

Does anyone have an idea how to do this?

Recommended answer

Your data, with stringsAsFactors = FALSE added so the key columns stay character:

DF1 <- data.frame("col1" = rep(c("A","B"), 18),
                  "col2" = rep(c("C","D","E"), 12),
                  "value"= (sample(1:100,36)),
                  "col4" = rep(NA,36),
                  stringsAsFactors = FALSE)

DF2 <- data.frame("col1" = rep("A",6),
                  "col2" = rep(c("C","D"),3),
                  "data" = rep(c(1,3),3),
                  "min" = seq(0,59,by=10),
                  "max" = seq(10,69,by=10),
                  stringsAsFactors = FALSE)

Using dplyr: 1) merge the two data frames with left_join, 2) check row-wise with ifelse whether value is between min and max, then 3) drop the min and max columns.

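library(dplyr)

# Join on the two key columns; each DF1 row is repeated once per matching DF2 row
left_join(DF1, DF2, by = c("col1", "col2")) %>%
  rowwise() %>%
  # Keep data only where value lies in [min, max] (between() is inclusive); otherwise NA
  mutate(data = ifelse(between(value, min, max), data, NA)) %>%
  select(-min, -max)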

Not sure if you were expecting to perform some kind of aggregation, but here's the output of the above code

    col1  col2 value  col4  data
 1     A     C    23    NA    NA
 2     A     C    23    NA     1
 3     A     C    23    NA    NA
 4     B     D    59    NA    NA
 5     A     E    57    NA    NA
 6     B     C     8    NA    NA
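
The output above repeats a DF1 row once for every DF2 row that shares its col1 and col2. If the goal is exactly one row per DF1 row, as in the expected output in the question, one possible way is sketched below in base R. It is not part of the original answer. It assumes that at most one DF2 range matches any given DF1 row, and the helper column row_id and the names cand, hits and result are made up for the example. The data is a fixed six-row DF1 so the result is reproducible.

```r
# Fixed six-row version of DF1 (the first six rows shown in the question)
DF1 <- data.frame(col1 = c("A", "B", "A", "B", "A", "B"),
                  col2 = c("C", "D", "E", "C", "D", "E"),
                  value = c(22, 58, 35, 86, 37, 16),
                  col4 = NA,
                  stringsAsFactors = FALSE)

DF2 <- data.frame(col1 = rep("A", 6),
                  col2 = rep(c("C", "D"), 3),
                  data = rep(c(1, 3), 3),
                  min = seq(0, 59, by = 10),
                  max = seq(10, 69, by = 10),
                  stringsAsFactors = FALSE)

# Tag each DF1 row so that matches can be mapped back to it
DF1$row_id <- seq_len(nrow(DF1))

# Equi-join on col1/col2, then keep the candidates whose range contains value
cand <- merge(DF1, DF2, by = c("col1", "col2"))
hits <- cand[cand$value >= cand$min & cand$value <= cand$max, ]

# Look up the matched data for every DF1 row; rows without a hit get NA
result <- DF1
result$data <- hits$data[match(result$row_id, hits$row_id)]
result$row_id <- NULL

result$data  # 1 NA NA NA 3 NA
```

Because match() returns only the first hit, a DF1 row that fell into several overlapping ranges would silently take the first one; with the DF2 shown here the ranges within each col1/col2 pair do not overlap. Recent dplyr versions also offer non-equi joins through join_by(), which may be worth checking as an alternative.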
