R Find overlap among time periods

Problem description

After a lot of thinking and googling I could not find a solution to my problem; I hope you can help me.

I have a large data frame with an ID column whose values can repeat more than 2 times, and a start and an end date column that together make up a time period. I would like to find out, grouping by ID, whether any of the time periods for that ID overlaps with another one, and if so, flag it, for example by creating a new column saying whether that ID has overlaps or not.

Here is an example data frame already with the desired new column:

structure(list(ID= c(34L, 34L, 80L, 80L, 81L, 81L, 81L, 94L, 
94L), Start = structure(c(1072911600, 1262300400, 1157061600, 
1277935200, 1157061600, 1277935200, 1157061600, 1075590000, 1285891200
), class = c("POSIXct", "POSIXt"), tzone = ""), End = structure(c(1262214000, 
1409436000, 1251669600, 1404079200, 1251669600, 1404079200, 1251669600, 
1264892400, 1475193600), class = c("POSIXct", "POSIXt"), tzone = ""), 
    Overlap = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, 
    FALSE, FALSE)), .Names = c("ID", "Start", "End", "Overlap"
), row.names = c(NA, -9L), class = "data.frame")


 ID               Start                 End Overlap
 34 2004-01-01 00:00:00 2009-12-31 00:00:00   FALSE
 34 2010-01-01 00:00:00 2014-08-31 00:00:00   FALSE
 80 2006-09-01 00:00:00 2009-08-31 00:00:00   FALSE
 80 2010-07-01 00:00:00 2014-06-30 00:00:00   FALSE
 81 2006-09-01 00:00:00 2009-08-31 00:00:00    TRUE
 81 2010-07-01 00:00:00 2014-06-30 00:00:00    TRUE
 81 2006-09-01 00:00:00 2009-08-31 00:00:00    TRUE
 94 2004-02-01 00:00:00 2010-01-31 00:00:00   FALSE
 94 2010-10-01 02:00:00 2016-09-30 02:00:00   FALSE

In this case, for ID "81" there is an overlap between two time periods, so I would like to flag all rows with ID = 81 as TRUE, meaning that an overlap in at least two rows of that ID was found. This is just a desired solution, but in general, all I want to do is find out the overlaps when grouping by ID, so the way of flagging it can be flexible, in case it simplifies things.

Thanks in advance for any help.

Solution

I think this is the code that you are looking for? Let me know.

data <- structure(list(
  ID = c(34L, 34L, 80L, 80L, 81L, 81L, 81L, 94L, 94L),
  Start = structure(c(1072911600, 1262300400, 1157061600, 1277935200,
                      1157061600, 1277935200, 1157061600, 1075590000,
                      1285891200),
                    class = c("POSIXct", "POSIXt"), tzone = ""),
  End = structure(c(1262214000, 1409436000, 1251669600, 1404079200,
                    1251669600, 1404079200, 1251669600, 1264892400,
                    1475193600),
                  class = c("POSIXct", "POSIXt"), tzone = ""),
  Overlap = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)),
  .Names = c("ID", "Start", "End", "Overlap"),
  row.names = c(NA, -9L), class = "data.frame")

library("dplyr")
library("lubridate")

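# TRUE if any pair of intervals in the vector overlaps
overlaps <- function(intervals) {
        n <- length(intervals)
        # A single interval has nothing to overlap with; without this
        # guard, 1:(n - 1) would count down from 1 to 0
        if (n < 2) return(FALSE)
        for (i in 1:(n - 1)) {
                for (j in (i + 1):n) {
                        if (int_overlaps(intervals[i], intervals[j])) {
                                return(TRUE)
                        }
                }
        }
        FALSE
}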

# One row per ID: ovl is TRUE when any two of that ID's intervals overlap
data %>%
        mutate(Interval = interval(Start, End)) %>%
        group_by(ID) %>%
        do({
                df <- .
                ovl <- overlaps(df$Interval)
                data.frame(ID = df$ID[1], ovl)
        })
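
The pipeline above returns one row per ID, while the question asks for a flag on every row. One way to carry the per-ID result back to the rows is a lookup with base R's match(). This is only a sketch: per_id and rows are hypothetical stand-ins for the pipeline output and the original data frame, typed in by hand so the snippet runs on its own.

```r
# Hypothetical per-ID result, shaped like the output of the pipeline above
per_id <- data.frame(ID  = c(34L, 80L, 81L, 94L),
                     ovl = c(FALSE, FALSE, TRUE, FALSE))

# Hypothetical stand-in for the ID column of the original data frame
rows <- data.frame(ID = c(34L, 34L, 80L, 80L, 81L, 81L, 81L, 94L, 94L))

# match() gives, for each row, the position of its ID in per_id,
# so every row picks up the flag of its ID and the row order is kept
rows$Overlap <- per_id$ovl[match(rows$ID, per_id$ID)]

print(rows)
```

Unlike merge(), the lookup does not reorder the rows.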

Also, I hope that someone comes up with a more elegant solution to my overlaps function.
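
One possible alternative to the nested loops, offered as a sketch rather than a definitive answer: sort the periods by start and check whether any period starts before the latest end seen so far. The function name has_overlap is made up for this example, and it uses base R only. It assumes each period has Start <= End and counts periods that merely touch at an endpoint as overlapping, which I believe matches int_overlaps; change <= to < if touching periods should not count.

```r
# TRUE if any two of the periods [start[k], end[k]] overlap.
# Assumes start[k] <= end[k] for every k (hypothetical helper, base R only).
has_overlap <- function(start, end) {
  n <- length(start)
  if (n < 2) return(FALSE)
  ord <- order(start)
  s <- as.numeric(start)[ord]
  e <- as.numeric(end)[ord]
  # Latest end among all periods that start earlier in the sorted order
  prev_max_end <- cummax(e)[-n]
  any(s[-1] <= prev_max_end)
}

# Sample data from the question, kept as epoch seconds for simplicity;
# POSIXct columns work too because of the as.numeric() calls above
df <- data.frame(
  ID    = c(34L, 34L, 80L, 80L, 81L, 81L, 81L, 94L, 94L),
  Start = c(1072911600, 1262300400, 1157061600, 1277935200, 1157061600,
            1277935200, 1157061600, 1075590000, 1285891200),
  End   = c(1262214000, 1409436000, 1251669600, 1404079200, 1251669600,
            1404079200, 1251669600, 1264892400, 1475193600)
)

# One flag per ID, then copied to every row of that ID
flag_by_id <- vapply(split(df, df$ID),
                     function(g) has_overlap(g$Start, g$End),
                     logical(1))
df$Overlap <- unname(flag_by_id[as.character(df$ID)])

print(df)
```

Sorting makes the check O(n log n) per ID instead of comparing every pair, which may matter for IDs with many periods.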
