Join dataframes by id and overlapping date range


Question

I have two dataframes x and y that contain columns for ids and for dates.

id.x <- c(1, 2, 4, 5, 7, 8, 10)
date.x <- as.Date(c("2015-01-01", "2015-01-02", "2015-01-21", "2015-01-13", "2015-01-29", "2015-01-01", "2015-01-03"),format = "%Y-%m-%d")
x <- data.frame(id.x, date.x)
id.y <- c(1, 2, 3, 6, 7, 8, 9)
date.y <- as.Date(c("2015-01-03", "2015-01-29", "2015-01-22", "2015-01-13", "2015-01-29", "2014-12-31", "2015-01-03"), format = "%Y-%m-%d")
y <- data.frame(id.y, date.y)

I would like to join them into a new dataframe z by matching id and whether date.y occurs within date.x + 3 days, e.g. individual "1" had event y occur on date.y = "2015-01-03", which is within 3 days of event x on date.x = "2015-01-01".

Solution

You can use an ifelse statement to build a vector that equals date.x when date.y falls between date.x and date.x + 3 days (inclusive), and equals date.y otherwise. Then join the two data frames on id and this vector:

id.x <- c(1, 2, 4, 5, 7, 8, 10)
date.x <- as.Date(c("2015-01-01", "2015-01-02", "2015-01-21", "2015-01-13", "2015-01-29", "2015-01-01", "2015-01-03"),format = "%Y-%m-%d")
x <- cbind.data.frame(id.x, date.x)
id.y <- c(1, 2, 3, 6, 7, 8, 9)
date.y <- as.Date(c("2015-01-03", "2015-01-29", "2015-01-22", "2015-01-13", "2015-01-29", "2014-12-31", "2015-01-03"), format = "%Y-%m-%d")
y <- cbind.data.frame(id.y, date.y)

safe.ifelse <- function(cond, yes, no) structure(ifelse(cond, yes, no), class = class(yes))

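# Join key: date.x when date.y falls within [date.x, date.x + 3 days], else date.y.
# Note that the comparison is element-wise, pairing rows of x and y by position.
match <- safe.ifelse(date.y <= date.x + 3 & date.y >= date.x, date.x, date.y)

# Store the key in y and rename the id column so both frames share join columns.
y$date.x <- match
names(y)[1] <- "id.x"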

dplyr::left_join(x, y, by=c("id.x","date.x"))

  id.x     date.x     date.y
1    1 2015-01-01 2015-01-03
2    2 2015-01-02       <NA>
3    4 2015-01-21       <NA>
4    5 2015-01-13       <NA>
5    7 2015-01-29 2015-01-29
6    8 2015-01-01       <NA>
7   10 2015-01-03       <NA>
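
The ifelse comparison above is element-wise, so it relies on x and y having the same number of rows in a compatible order. As a hedged alternative, and not the approach the answer itself uses, the same result can be sketched in base R by joining on id first and filtering on the date window afterwards. The names pairs, in_window and hits are illustrative only.

```r
# Rebuild the example data from the question.
x <- data.frame(
  id.x = c(1, 2, 4, 5, 7, 8, 10),
  date.x = as.Date(c("2015-01-01", "2015-01-02", "2015-01-21", "2015-01-13",
                     "2015-01-29", "2015-01-01", "2015-01-03"))
)
y <- data.frame(
  id.y = c(1, 2, 3, 6, 7, 8, 9),
  date.y = as.Date(c("2015-01-03", "2015-01-29", "2015-01-22", "2015-01-13",
                     "2015-01-29", "2014-12-31", "2015-01-03"))
)

# Step 1: pair every x row with every y row that shares the same id.
pairs <- merge(x, y, by.x = "id.x", by.y = "id.y")

# Step 2: keep only pairs where date.y falls in [date.x, date.x + 3 days].
in_window <- pairs$date.y >= pairs$date.x & pairs$date.y <= pairs$date.x + 3
hits <- pairs[in_window, ]

# Step 3: left-join the hits back onto x so unmatched x rows get NA in date.y.
z <- merge(x, hits, by = c("id.x", "date.x"), all.x = TRUE)
z <- z[order(z$id.x), ]
print(z)
```

Because the match is made per id rather than per row position, this sketch should also cope with x and y of different lengths. If one x row has several y events inside its window, it appears once per matching event.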

I borrowed the safe.ifelse function from this post (http://stackoverflow.com/questions/6668963/how-to-prevent-ifelse-from-turning-date-objects-into-numeric-objects) because base ifelse returns a numeric vector rather than a Date vector.
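
To make the class-dropping behaviour concrete, here is a minimal sketch comparing base ifelse with the safe.ifelse wrapper on the first two dates from the example. The variable names d1, d2, plain and fixed are illustrative.

```r
# Wrapper from the answer: reapply the class of `yes` to the ifelse result.
safe.ifelse <- function(cond, yes, no) {
  structure(ifelse(cond, yes, no), class = class(yes))
}

d1 <- as.Date(c("2015-01-01", "2015-01-02"))
d2 <- as.Date(c("2015-01-03", "2015-01-29"))

# TRUE for the first pair (within 3 days), FALSE for the second.
cond <- d2 <= d1 + 3 & d2 >= d1

plain <- ifelse(cond, d1, d2)       # Date attributes are lost here
fixed <- safe.ifelse(cond, d1, d2)  # class is restored from `yes`

print(class(plain))
print(class(fixed))
print(fixed)
```

The wrapper only copies the class attribute, which is enough for plain Date vectors like these.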
