R - Compare and test primary data frame on multiple conditions with secondary data frame and if true assign a given value from secondary data frame


Problem description

I have two data frames. The following code creates an example of the data I have:

```r
df <- data.frame("ID" = c(325, 464, 464, 464, 464, 464, 464, 464, 464, 464, 512, 512, 687, 701, 869), 
             "DATE" = c("2012-11-05", "2014-04-04", "2014-04-05", "2014-04-06", "2015-01-25", 
                        "2015-01-25", "2015-01-26", "2015-01-26", "2015-01-26", "2015-01-27", 
                        "2014-10-13", "2014-10-14", "2014-12-12", "2015-02-17", "2015-06-25"))

df2 <- data.frame("ID" = c(325, 464, 464, 512, 701, 869, 922, 954, 989),
              "DATE_1" = c("2012-11-03", "2014-04-01", "2015-01-20", "2014-10-10", "2015-02-14",
                           "2015-06-20", "2015-07-07", "2015-09-11", "2015-11-23"),

              "DATE_2" = c("2012-11-08", "2014-04-10", "2015-01-29", "2014-10-19", "2015-02-22",
                           "2015-06-29", "2015-07-13", "2015-09-25", "2015-11-29"),

              "INTERVAL" = c("2012-11-03--2012-08-11 UTC", "2014-04-01--2014-04-10 UTC",
                             "2015-01-20--2015-01-29 UTC", "2014-10-10--2014-10-19 UTC", 
                             "2015-02-14--2014-02-22 UTC", "2015-06-20--2015-06-29 UTC", 
                             "2015-07-07--2015-07-13 UTC", "2015-09-11--2015-09-25 UTC", 
                             "2015-11-23--2015-11-29 UTC"),
              "KEY" = c(6, 8, 92, 233, 642, 1233, 2464, 3436, 4366))
```

Which should yield:

```
df
    ID       DATE
1  325 2012-11-05
2  464 2014-04-04
3  464 2014-04-05
4  464 2014-04-06
5  464 2015-01-25
6  464 2015-01-25
7  464 2015-01-26
8  464 2015-01-26
9  464 2015-01-26
10 464 2015-01-27
11 512 2014-10-13
12 512 2014-10-14
13 687 2014-12-12
14 701 2015-02-17
15 869 2015-06-25

df2
   ID     DATE_1     DATE_2                   INTERVAL  KEY
1 325 2012-11-03 2012-11-08 2012-11-03--2012-08-11 UTC    6
2 464 2014-04-01 2014-04-10 2014-04-01--2014-04-10 UTC    8
3 464 2015-01-20 2015-01-29 2015-01-20--2015-01-29 UTC   92
4 512 2014-10-10 2014-10-19 2014-10-10--2014-10-19 UTC  233
5 701 2015-02-14 2015-02-22 2015-02-14--2014-02-22 UTC  642
6 869 2015-06-20 2015-06-29 2015-06-20--2015-06-29 UTC 1233
7 922 2015-07-07 2015-07-13 2015-07-07--2015-07-13 UTC 2464
8 954 2015-09-11 2015-09-25 2015-09-11--2015-09-25 UTC 3436
9 989 2015-11-23 2015-11-29 2015-11-23--2015-11-29 UTC 4366
```

The INTERVAL column was created with lubridate's `interval()` function, called inside dplyr's `mutate()` with DATE_1 and DATE_2 as its arguments.
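Here is a minimal sketch of how such a column can be built. It assumes the dates start out as character strings, as in the example above, and uses only the first three df2 rows. Building INTERVAL from DATE_1 and DATE_2 also avoids hand-typed mismatches. In the sample data, the INTERVAL strings for IDs 325 and 701 end on 2012-08-11 and 2014-02-22, which do not agree with their DATE_2 values.

```r
library(dplyr)
library(lubridate)

df2 <- data.frame(ID = c(325, 464, 464),
                  DATE_1 = c("2012-11-03", "2014-04-01", "2015-01-20"),
                  DATE_2 = c("2012-11-08", "2014-04-10", "2015-01-29"),
                  KEY = c(6, 8, 92))

# Parse the strings to Date, then build one lubridate Interval per row.
# Inside mutate(), INTERVAL sees the already-converted DATE_1 and DATE_2.
df2 <- df2 %>%
  mutate(DATE_1 = as.Date(DATE_1),
         DATE_2 = as.Date(DATE_2),
         INTERVAL = interval(DATE_1, DATE_2))
```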

My problem is that I need to test each row of my primary data.frame (df) on two conditions: whether its ID matches an ID in my secondary data.frame (df2), and whether its DATE falls within the corresponding interval in df2.

I would like R to create a new column in df called "NEW_KEY". If a row in df tests true under both conditions, R should take the KEY value from the df2 row that matches on ID and interval and insert it into the NEW_KEY column of df. If a row fails either condition, R should insert NA in the NEW_KEY column instead.

Note that df contains several duplicate rows. These should all test true under both conditions and get the same KEY value. df also contains an ID (687) that has no corresponding row in df2, so its NEW_KEY should be NA.
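The rule described above is a per-row lookup. One way to express it directly is the base R sketch below. It is not the approach used in the answer below. It runs on a subset of the example data with the dates already converted to Date. For Date values, the test `DATE_1 <= DATE & DATE <= DATE_2` should give the same result as lubridate's `%within%` on a closed interval.

```r
df <- data.frame(ID = c(325, 464, 464, 512, 687),
                 DATE = as.Date(c("2012-11-05", "2014-04-04", "2015-01-25",
                                  "2014-10-13", "2014-12-12")))
df2 <- data.frame(ID = c(325, 464, 464, 512),
                  DATE_1 = as.Date(c("2012-11-03", "2014-04-01", "2015-01-20", "2014-10-10")),
                  DATE_2 = as.Date(c("2012-11-08", "2014-04-10", "2015-01-29", "2014-10-19")),
                  KEY = c(6, 8, 92, 233))

# For row i of df, find the df2 rows with the same ID whose
# [DATE_1, DATE_2] range contains df$DATE[i]; use the first KEY found, else NA.
df$NEW_KEY <- vapply(seq_len(nrow(df)), function(i) {
  hit <- which(df2$ID == df$ID[i] &
               df2$DATE_1 <= df$DATE[i] & df$DATE[i] <= df2$DATE_2)
  if (length(hit) == 0) NA_real_ else df2$KEY[hit[1]]
}, numeric(1))

df
```

Each row is matched only against the df2 rows with its own ID, so the two ID 464 intervals cannot be confused. The cost is one scan of df2 per row of df, which is fine at this size.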

I have attempted the following two simple approaches:

```r
df$NEW_KEY <- ifelse(df$ID %in% df2$ID & df$DATE %within% df2$INTERVAL, df2$KEY, NA)
```

and

```r
df <- mutate(df, NEW_VALUE = ifelse(df$ID %in% df2$ID & df$DATE %within% df2$INTERVAL, df2$KEY, NA))
```

Both yield the same warning messages:

```
1: In as.numeric(a) - as.numeric(int@start) : longer object length is not a multiple of shorter object length
2: In as.numeric(a) - as.numeric(int@start) <= int@.Data : longer object length is not a multiple of shorter object length
3: In as.numeric(a) - as.numeric(int@start) : longer object length is not a multiple of shorter object length
```

The code also only seems to fill in some rows of df and sets the rest to NA.
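The warnings come from R's recycling rule, not from the ID test. `df$DATE %within% df2$INTERVAL` compares the 15 dates with the 9 intervals element by element: date 1 with interval 1, date 2 with interval 2, and so on. Once the intervals run out, R starts again from the first one. It never looks up the interval that belongs to each row's ID. Inside `ifelse()`, `df2$KEY` is recycled the same way, which is why only some rows get a value. The sketch below reproduces the same warning in base R. It uses the subtraction shown in the messages, applied to three dates and two interval starts from the example.

```r
dates  <- as.Date(c("2012-11-05", "2014-04-04", "2014-04-05"))
starts <- as.Date(c("2012-11-03", "2014-04-01"))

# Lengths 3 and 2: dates[3] is paired with starts[1] again,
# and R warns because 3 is not a multiple of 2.
msg <- tryCatch(as.numeric(dates) - as.numeric(starts),
                warning = function(w) conditionMessage(w))
print(msg)
```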

So I'm currently at a halt on what to do. Any help would be greatly appreciated.

EDIT: ID 464 has two different intervals in df2 (one in 2014 and one in 2015), and its dates in df fall within either one or the other. When I do a merge, R does not seem to distinguish between them, so some rows end up mixing the two. For example, a row in df can have a 2014 DATE while the INTERVAL column merged from df2 holds the 2015 interval instead of the 2014 one.
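This is how `merge()` treats a repeated key: every df row with ID 464 is paired with every df2 row with ID 464. Each date therefore gets one row for the 2014 interval and one for the 2015 interval. The small base R sketch below uses two of the 464 dates to show the doubling. It also shows that keeping only the rows whose DATE lies between DATE_1 and DATE_2 leaves the correct pairing.

```r
df <- data.frame(ID = c(464, 464),
                 DATE = as.Date(c("2014-04-04", "2015-01-25")))
df2 <- data.frame(ID = c(464, 464),
                  DATE_1 = as.Date(c("2014-04-01", "2015-01-20")),
                  DATE_2 = as.Date(c("2014-04-10", "2015-01-29")),
                  KEY = c(8, 92))

m <- merge(df, df2, by = "ID")
nrow(m)  # 4: both dates paired with both intervals

# Keep only the pairs where the date actually lies inside the interval
kept <- m[m$DATE_1 <= m$DATE & m$DATE <= m$DATE_2, ]
kept[order(kept$DATE), c("ID", "DATE", "KEY")]
```

Filtering like this drops any df row that matches no interval. On its own, it therefore does not produce the NA rows the question asks for.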

Solution

Does this help?

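```r
library(magrittr)
library(dplyr)
library(lubridate)

# Attach the df2 columns to each df row with the same ID (all.x keeps unmatched IDs)
df <- merge(df, df2, by = "ID", all.x = TRUE)

# Convert the character dates to Date objects
df <- df %>% mutate(DATE = as.Date(DATE), DATE_1 = as.Date(DATE_1), DATE_2 = as.Date(DATE_2))

# Take KEY where DATE lies in the DATE_1--DATE_2 interval, otherwise NA
df %>% mutate(NEW_VALUE = ifelse(DATE %within% (DATE_1 %--% DATE_2), KEY, NA))
```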
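One caveat with this answer: because of the merge, the result still has one row per (df row, df2 interval) pair. Each ID 464 date therefore appears twice, once with its KEY and once with NA, which is the mixing described in the EDIT. Below is a sketch of a possible follow-up. It assumes the same packages and uses a subset of the example data. `ROW` and `HIT` are hypothetical helper columns introduced here. The idea is to tag each df row before merging, then collapse back to one row per original row, keeping the KEY of the matching interval or NA.

```r
library(dplyr)
library(lubridate)

df <- data.frame(ID = c(325, 464, 464, 687),
                 DATE = c("2012-11-05", "2014-04-04", "2015-01-25", "2014-12-12"))
df2 <- data.frame(ID = c(325, 464, 464),
                  DATE_1 = c("2012-11-03", "2014-04-01", "2015-01-20"),
                  DATE_2 = c("2012-11-08", "2014-04-10", "2015-01-29"),
                  KEY = c(6, 8, 92))

result <- df %>%
  mutate(ROW = row_number()) %>%              # hypothetical id for each original row
  merge(df2, by = "ID", all.x = TRUE) %>%     # one row per (df row, df2 interval) pair
  mutate(DATE = as.Date(DATE), DATE_1 = as.Date(DATE_1), DATE_2 = as.Date(DATE_2),
         HIT = DATE %within% (DATE_1 %--% DATE_2)) %>%  # NA when ID has no df2 row
  group_by(ROW, ID, DATE) %>%
  # HIT %in% TRUE treats NA as FALSE; keep the first matching KEY, else NA
  summarise(NEW_KEY = if (any(HIT %in% TRUE)) KEY[HIT %in% TRUE][1] else NA_real_,
            .groups = "drop") %>%
  arrange(ROW)

result
```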
