Find dates within a period interval by group

Question

I have a panel with many IDs, each with begin and end dates. The begin and end dates define an interval of time.

    id      begin        end                       interval overlap
 1:  1 2010-01-31 2011-06-30 2009-08-04 UTC--2011-12-27 UTC    TRUE
 2:  1 2011-01-31 2012-06-30 2010-08-04 UTC--2012-12-27 UTC    TRUE
 3:  1 2012-01-31 2013-06-30 2011-08-04 UTC--2013-12-27 UTC    TRUE
 4:  1 2013-01-31 2014-06-30 2012-08-04 UTC--2014-12-27 UTC    TRUE
 5:  1 2013-02-28 2013-07-31 2012-09-01 UTC--2014-01-27 UTC    TRUE
 6:  1 2015-02-28 2015-03-31 2014-09-01 UTC--2015-09-27 UTC    TRUE
 7:  1 2015-06-30 2015-07-31 2015-01-01 UTC--2016-01-27 UTC    TRUE
 8:  1 2015-09-30 2016-01-31 2015-04-03 UTC--2016-07-29 UTC    TRUE
 9:  2 2010-01-31 2011-06-30 2009-08-04 UTC--2011-12-27 UTC    TRUE
10:  2 2011-01-31 2012-06-30 2010-08-04 UTC--2012-12-27 UTC    TRUE
11:  2 2012-01-31 2013-06-30 2011-08-04 UTC--2013-12-27 UTC    TRUE
12:  2 2013-01-31 2014-06-30 2012-08-04 UTC--2014-12-27 UTC    TRUE
13:  2 2013-02-28 2013-07-31 2012-09-01 UTC--2014-01-27 UTC    TRUE
14:  2 2015-02-28 2015-03-31 2014-09-01 UTC--2015-09-27 UTC    TRUE
15:  2 2015-06-30 2015-07-31 2015-01-01 UTC--2016-01-27 UTC    TRUE
16:  2 2015-09-30 2016-01-31 2015-04-03 UTC--2016-07-29 UTC    TRUE

I need to test whether, for each ID, any of the begin/end dates is included in another interval of the same ID.

For instance, the begin date in the first row of id 1 (2010-01-31) is not included in any period of id 1 other than its own. However, the end date of that row (2011-06-30) is included in the interval of the second row (2010-08-04 UTC--2012-12-27 UTC).

I have tried lubridate's interval and %within% in data.table, but it yields TRUE for every row because each date is included in its own row's period. I need to know whether it is included in any other period of the same ID.

customer[begin %within% interval | end %within% interval, overlap := TRUE, by = id]

I have looked at foverlaps from data.table, but it seems designed for joining two different tables, and the other questions I found deal with plain vectors rather than panels with intervals.

Any ideas?

Data:

structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 
2, 2, 2), begin = structure(c(14640, 15005, 15370, 15736, 15764, 
16494, 16616, 16708, 14640, 15005, 15370, 15736, 15764, 16494, 
16616, 16708), class = "Date"), end = structure(c(15155, 15521, 
15886, 16251, 15917, 16525, 16647, 16831, 15155, 15521, 15886, 
16251, 15917, 16525, 16647, 16831), class = "Date"), interval = structure(c(75600000, 
75686400, 75686400, 75600000, 44323200, 33782400, 33782400, 41731200, 
75600000, 75686400, 75686400, 75600000, 44323200, 33782400, 33782400, 
41731200), start = structure(c(1249344000, 1280880000, 1312416000, 
1344038400, 1346457600, 1409529600, 1420070400, 1428019200, 1249344000, 
1280880000, 1312416000, 1344038400, 1346457600, 1409529600, 1420070400, 
1428019200), tzone = "UTC", class = c("POSIXct", "POSIXt")), tzone = "UTC", class = structure("Interval", package = "lubridate")), 
    overlap = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
    TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)), .Names = c("id", 
"begin", "end", "interval", "overlap"), row.names = c(NA, -16L
), class = c("data.table", "data.frame"))

Recommended answer

Here is one way to do it, using int_overlaps from lubridate. I have defined the intervals from the begin and end dates, although in your data the interval column differs from them; perhaps you could clarify which is correct.
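The idea behind a pairwise overlap check can be sketched in base R without lubridate. This is only an illustration on made-up dates for a single id, assuming closed intervals (both endpoints included): two intervals i and j overlap when begin[i] <= end[j] and begin[j] <= end[i]. Every interval overlaps itself, so a row overlaps some other row only when its row count is greater than 1.

```r
# Toy intervals for one id (hypothetical values, not the asker's full data)
begin <- as.Date(c("2010-01-31", "2011-01-31", "2015-02-28"))
end   <- as.Date(c("2011-06-30", "2012-06-30", "2015-03-31"))

# Work on the underlying day counts so outer() sees plain numbers
b <- as.numeric(begin)
e <- as.numeric(end)

# ov[i, j] is TRUE when interval i and interval j share at least one day
ov <- outer(b, e, "<=") & outer(e, b, ">=")

# The diagonal is always TRUE (self-overlap), hence "> 1" rather than "> 0"
overlaps_other <- rowSums(ov) > 1
print(overlaps_other)
```

Here the first two intervals overlap each other and the third stands alone.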

library(lubridate)

df$interval <- interval(as.POSIXct(df$begin),as.POSIXct(df$end))

df <- df[order(df$id),] #needs to be sorted by id for next stage to work

df$overlap <- unlist(tapply(df$interval, #loop through intervals
                            df$id, #grouped by id
                            function(x) rowSums(outer(x,x,int_overlaps))>1))
                                   #check if more than one overlap in subset for that id


df
   id      begin        end                       interval overlap
1   1 2010-01-31 2011-06-30 2010-01-31 UTC--2011-06-30 UTC    TRUE
2   1 2011-01-31 2012-06-30 2011-01-31 UTC--2012-06-30 UTC    TRUE
3   1 2012-01-31 2013-06-30 2012-01-31 UTC--2013-06-30 UTC    TRUE
4   1 2013-01-31 2014-06-30 2013-01-31 UTC--2014-06-30 UTC    TRUE
5   1 2013-02-28 2013-07-31 2013-02-28 UTC--2013-07-31 UTC    TRUE
6   1 2015-02-28 2015-03-31 2015-02-28 UTC--2015-03-31 UTC   FALSE
7   1 2015-06-30 2015-07-31 2015-06-30 UTC--2015-07-31 UTC   FALSE
8   1 2015-09-30 2016-01-31 2015-09-30 UTC--2016-01-31 UTC   FALSE
9   2 2010-01-31 2011-06-30 2010-01-31 UTC--2011-06-30 UTC    TRUE
10  2 2011-01-31 2012-06-30 2011-01-31 UTC--2012-06-30 UTC    TRUE
11  2 2012-01-31 2013-06-30 2012-01-31 UTC--2013-06-30 UTC    TRUE
12  2 2013-01-31 2014-06-30 2013-01-31 UTC--2014-06-30 UTC    TRUE
13  2 2013-02-28 2013-07-31 2013-02-28 UTC--2013-07-31 UTC    TRUE
14  2 2015-02-28 2015-03-31 2015-02-28 UTC--2015-03-31 UTC   FALSE
15  2 2015-06-30 2015-07-31 2015-06-30 UTC--2015-07-31 UTC   FALSE
16  2 2015-09-30 2016-01-31 2015-09-30 UTC--2016-01-31 UTC   FALSE
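The question as literally worded asks whether a row's begin or end date falls inside another row's interval, which is slightly different from asking whether two intervals overlap: a row that fully contains another row overlaps it, yet neither of its own endpoints lies inside the shorter one. A minimal base-R sketch of the endpoint test follows. The helper name in_other_interval and its start/stop arguments are hypothetical, the sample rows are a small made-up subset, and the intervals are assumed to be closed.

```r
# Hypothetical helper: for each row, does `begin` or `end` fall inside the
# [start, stop] window of some OTHER row? By default the window is the row's
# own begin/end; pass separate start/stop vectors if the windows differ.
in_other_interval <- function(begin, end, start = begin, stop = end) {
  start <- as.numeric(start)
  stop  <- as.numeric(stop)
  # inside(d)[i, j] is TRUE when date i lies within window j
  inside <- function(d) {
    d <- as.numeric(d)
    outer(d, start, ">=") & outer(d, stop, "<=")
  }
  hit <- inside(begin) | inside(end)
  diag(hit) <- FALSE   # ignore each row's own window
  rowSums(hit) > 0
}

df <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  begin = as.Date(c("2010-01-31", "2011-01-31", "2015-02-28",
                    "2010-01-31", "2012-01-31")),
  end   = as.Date(c("2011-06-30", "2012-06-30", "2015-03-31",
                    "2011-06-30", "2013-06-30"))
)

# Apply the helper separately to the rows of each id
df$overlap <- NA
for (rows in split(seq_len(nrow(df)), df$id)) {
  df$overlap[rows] <- in_other_interval(df$begin[rows], df$end[rows])
}
print(df)
```

Setting the diagonal to FALSE is what excludes a row's own period, which is the step missing from a plain %within% test against the same row.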
