Time-interval overlap match by group

Problem description

Suppose I have the following data frame:

id  flag            time
1   1   2017-01-01 UTC--2017-01-07 UTC
1   0   2018-01-01 UTC--2019-01-01 UTC
1   0   2017-01-03 UTC--2017-01-09 UTC
2   1   2017-01-01 UTC--2017-01-15 UTC
2   1   2018-07-01 UTC--2018-09-01 UTC
2   1   2018-10-12 UTC--2018-10-20 UTC
2   0   2017-01-12 UTC--2017-01-16 UTC
2   0   2017-03-01 UTC--2017-03-15 UTC
2   0   2017-12-01 UTC--2017-12-31 UTC
2   0   2018-08-15 UTC--2018-09-19 UTC
2   0   2018-10-01 UTC--2018-10-21 UTC

It was created with the following code:

library(lubridate)  # provides interval() and ymd()

df <- data.frame(id=c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
                  flag=c(1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0),
                  time=c(# id 1: one case followed by two controls
                       interval(ymd(20170101), ymd(20170107)),
                       interval(ymd(20180101), ymd(20190101)),
                       interval(ymd(20170103), ymd(20170109)),
                       # id 2: cases (flag = 1)
                       interval(ymd(20170101), ymd(20170115)),
                       interval(ymd(20180701), ymd(20180901)),
                       interval(ymd(20181012), ymd(20181020)),
                       # id 2: controls (flag = 0)
                       interval(ymd(20170112), ymd(20170116)),
                       interval(ymd(20170301), ymd(20170315)),
                       interval(ymd(20171201), ymd(20171231)),
                       interval(ymd(20180815), ymd(20180919)),
                       interval(ymd(20181001), ymd(20181021))))

I want to obtain this result:

id  flag            time              value
1   1   2017-01-01 UTC--2017-01-07 UTC  NA
1   0   2018-01-01 UTC--2019-01-01 UTC  0
1   0   2017-01-03 UTC--2017-01-09 UTC  1
2   1   2017-01-01 UTC--2017-01-15 UTC  NA
2   1   2018-07-01 UTC--2018-09-01 UTC  NA
2   1   2018-10-12 UTC--2018-10-20 UTC  NA
2   0   2017-01-12 UTC--2017-01-16 UTC  1
2   0   2017-03-01 UTC--2017-03-15 UTC  0
2   0   2017-12-01 UTC--2017-12-31 UTC  0
2   0   2018-08-15 UTC--2018-09-19 UTC  1
2   0   2018-10-01 UTC--2018-10-21 UTC  1

That is, I want to compare each flag = 0 time interval with every flag = 1 interval within the same group (id), to see whether it overlaps at least one of them. In the result, value is 1 if it does, 0 if it does not, and NA for the flag = 1 rows.
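To make the criterion concrete, here is a minimal sketch for a single flag = 0 row of id 2. It assumes closed intervals (touching endpoints count as an overlap) and stores the dates as plain Date vectors; the helper name `overlaps` and the variables `case_start` and `case_end` are made up for this illustration and are not part of lubridate.

```r
# Hypothetical helper (not from lubridate): closed intervals [s1, e1] and
# [s2, e2] overlap when each one starts no later than the other ends.
overlaps <- function(s1, e1, s2, e2) s1 <= e2 & s2 <= e1

# flag = 1 intervals of id 2, stored as plain Date vectors
case_start <- as.Date(c("2017-01-01", "2018-07-01", "2018-10-12"))
case_end   <- as.Date(c("2017-01-15", "2018-09-01", "2018-10-20"))

# One flag = 0 interval of id 2: 2017-03-01 to 2017-03-15
hits <- overlaps(as.Date("2017-03-01"), as.Date("2017-03-15"),
                 case_start, case_end)

# value is 1 if at least one flag = 1 interval overlaps, otherwise 0
value <- as.integer(any(hits))
```

Here `hits` has one element per flag = 1 interval, and `any()` reduces them to the single value wanted for that row.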

For this purpose I have tried lubridate's int_overlaps function.

I tried the following code, but it does not work:

result <- df %>%
  group_by(id) %>%
  mutate(value = ifelse(flag == 0 & int_overlaps(time, any(time[flag == 1])), 1, 0))

I have found a very similar approach:

R: Determine if each date interval overlaps with all other date intervals in a data frame

Recommended answer

You can use map_int from purrr to check, within each id, whether a flag = 0 interval overlaps any of the flag = 1 intervals:

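library(tidyverse)
library(lubridate)

df %>%
  group_by(id) %>%
  # For each flag == 0 row, test its interval against all flag == 1
  # intervals of the same id; flag == 1 rows get NA
  mutate(value = ifelse(flag == 0, map_int(time, ~ any(int_overlaps(.x, time[flag == 1]))), NA))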

Output

# A tibble: 11 x 4
# Groups:   id [2]
      id  flag time                           value
   <dbl> <dbl> <Interval>                     <int>
 1     1     1 2017-01-01 UTC--2017-01-07 UTC    NA
 2     1     0 2018-01-01 UTC--2019-01-01 UTC     0
 3     1     0 2017-01-03 UTC--2017-01-09 UTC     1
 4     2     1 2017-01-01 UTC--2017-01-15 UTC    NA
 5     2     1 2018-07-01 UTC--2018-09-01 UTC    NA
 6     2     1 2018-10-12 UTC--2018-10-20 UTC    NA
 7     2     0 2017-01-12 UTC--2017-01-16 UTC     1
 8     2     0 2017-03-01 UTC--2017-03-15 UTC     0
 9     2     0 2017-12-01 UTC--2017-12-31 UTC     0
10     2     0 2018-08-15 UTC--2018-09-19 UTC     1
11     2     0 2018-10-01 UTC--2018-10-21 UTC     1
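
The attempt in the question fails because any() is applied to the intervals themselves (any(time[flag == 1])) rather than to the logical result of the comparison; the reduction has to wrap the overlap test, once per row. For comparison, here is a minimal base R sketch of the same per-row logic without purrr or lubridate. It assumes the intervals are stored as two Date columns, start and end, and that intervals are closed; the names df2, d, and overlap_by_group are hypothetical and exist only for this example.

```r
# Same data as in the question, but with plain start/end Date columns
# (an assumption made here so that no package is needed).
d <- function(x) as.Date(x, format = "%Y%m%d")
df2 <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  flag  = c(1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0),
  start = d(c("20170101", "20180101", "20170103",
              "20170101", "20180701", "20181012",
              "20170112", "20170301", "20171201", "20180815", "20181001")),
  end   = d(c("20170107", "20190101", "20170109",
              "20170115", "20180901", "20181020",
              "20170116", "20170315", "20171231", "20180919", "20181021"))
)

# For one id: mark each flag = 0 row that overlaps any flag = 1 row.
overlap_by_group <- function(g) {
  cases <- g[g$flag == 1, ]
  g$value <- vapply(seq_len(nrow(g)), function(i) {
    # flag = 1 rows are not compared and get NA
    if (g$flag[i] == 1) return(NA_integer_)
    # Closed intervals overlap when each starts no later than the other ends
    as.integer(any(g$start[i] <= cases$end & cases$start <= g$end[i]))
  }, integer(1))
  g
}

# Apply the helper to each id and stack the pieces back together
result <- do.call(rbind, lapply(split(df2, df2$id), overlap_by_group))
rownames(result) <- NULL
result$value
```

Because the rows are already sorted by id, result keeps the original row order. A group with no flag = 1 rows yields 0 for all of its flag = 0 rows, since any() of an empty logical vector is FALSE.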
