Join dataframes by id and overlapping date range
Problem description
I have two dataframes x and y that contain columns for ids and for dates.
id.x <- c(1, 2, 4, 5, 7, 8, 10)
date.x <- as.Date(c("2015-01-01", "2015-01-02", "2015-01-21", "2015-01-13", "2015-01-29", "2015-01-01", "2015-01-03"),format = "%Y-%m-%d")
x <- data.frame(id.x, date.x)
id.y <- c(1, 2, 3, 6, 7, 8, 9)
date.y <- as.Date(c("2015-01-03", "2015-01-29", "2015-01-22", "2015-01-13", "2015-01-29", "2014-12-31", "2015-01-03"), format = "%Y-%m-%d")
y <- data.frame(id.y, date.y)
I would like to join them into a new dataframe z by matching on id and on whether date.y occurs within 3 days after date.x. For example, individual "1" had event y on date.y = "2015-01-03", which is within 3 days of event x on date.x = "2015-01-01".
You can use an ifelse statement to build a vector that equals date.x where date.y >= date.x and date.y <= date.x + 3, and equals date.y otherwise. Storing that vector in y under the name date.x gives both dataframes the same key columns, so you can then join them on id and date:
id.x <- c(1, 2, 4, 5, 7, 8, 10)
date.x <- as.Date(c("2015-01-01", "2015-01-02", "2015-01-21", "2015-01-13", "2015-01-29", "2015-01-01", "2015-01-03"),format = "%Y-%m-%d")
x <- cbind.data.frame(id.x, date.x)
id.y <- c(1, 2, 3, 6, 7, 8, 9)
date.y <- as.Date(c("2015-01-03", "2015-01-29", "2015-01-22", "2015-01-13", "2015-01-29", "2014-12-31", "2015-01-03"), format = "%Y-%m-%d")
y <- cbind.data.frame(id.y, date.y)
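# Wrapper around ifelse() that restores the class of `yes` (here: Date)
safe.ifelse <- function(cond, yes, no) structure(ifelse(cond, yes, no), class = class(yes))
# Use date.x as the join key where date.y falls in [date.x, date.x + 3]; otherwise keep date.y
match_date <- safe.ifelse(date.y <= date.x + 3 & date.y >= date.x, date.x, date.y)
y$date.x <- match_date
# Rename id.y so both dataframes share the key columns id.x and date.x
names(y)[1] <- "id.x"
dplyr::left_join(x, y, by = c("id.x", "date.x"))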
  id.x     date.x     date.y
1    1 2015-01-01 2015-01-03
2    2 2015-01-02       <NA>
3    4 2015-01-21       <NA>
4    5 2015-01-13       <NA>
5    7 2015-01-29 2015-01-29
6    8 2015-01-01       <NA>
7   10 2015-01-03       <NA>
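The join above compares date.x and date.y position by position, so it assumes that row i of x and row i of y belong together. As a hedged alternative, here is a minimal sketch that does not rely on row order: it pairs rows by id first, keeps the pairs where date.y lies between date.x and date.x + 3, and then left-joins the result back onto x. It swaps in base R's merge for dplyr::left_join; the names pairs, in_window, hits, and z are illustrative and not part of the original answer.

```r
# Same example data as in the question
x <- data.frame(
  id.x = c(1, 2, 4, 5, 7, 8, 10),
  date.x = as.Date(c("2015-01-01", "2015-01-02", "2015-01-21", "2015-01-13",
                     "2015-01-29", "2015-01-01", "2015-01-03"))
)
y <- data.frame(
  id.y = c(1, 2, 3, 6, 7, 8, 9),
  date.y = as.Date(c("2015-01-03", "2015-01-29", "2015-01-22", "2015-01-13",
                     "2015-01-29", "2014-12-31", "2015-01-03"))
)

# Step 1: all (x, y) row pairs that share an id (the key column keeps the name id.x)
pairs <- merge(x, y, by.x = "id.x", by.y = "id.y")

# Step 2: keep pairs where date.y falls in [date.x, date.x + 3 days]
in_window <- pairs$date.y >= pairs$date.x & pairs$date.y <= pairs$date.x + 3
hits <- pairs[in_window, ]

# Step 3: left join back onto x so rows without a match get NA in date.y
z <- merge(x, hits, by = c("id.x", "date.x"), all.x = TRUE)
print(z)
```

For the example data this marks ids 1 and 7 as matches, the same as the output above. One difference in behavior: if an x row had several y events inside its window, it would appear once per matching event.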
I borrowed the safe.ifelse function from this post (http://stackoverflow.com/questions/6668963/how-to-prevent-ifelse-from-turning-date-objects-into-numeric-objects) because base ifelse returns a numeric vector rather than a Date vector.
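To illustrate why the wrapper is needed, here is a small sketch (the vector d and the cutoff date are made up for the demonstration): base ifelse drops the Date class from its result, while safe.ifelse puts it back.

```r
d <- as.Date(c("2015-01-01", "2015-01-05"))   # hypothetical example dates
cutoff <- as.Date("2015-01-03")               # hypothetical cutoff

# Base ifelse() does not carry over the class of yes/no, so the Date class is lost
plain <- ifelse(d > cutoff, d, d + 1)
class(plain)

# The wrapper from the answer restores the class of `yes` on the result
safe.ifelse <- function(cond, yes, no) structure(ifelse(cond, yes, no), class = class(yes))
kept <- safe.ifelse(d > cutoff, d, d + 1)
class(kept)
```

The first class() call reports a numeric vector (days since 1970-01-01), while the second reports Date.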