Time-interval overlap match by group
Problem description
Suppose I have the following data frame:
id flag time
1 1 2017-01-01 UTC--2017-01-07 UTC
1 0 2018-01-01 UTC--2019-01-01 UTC
1 0 2017-01-03 UTC--2017-01-09 UTC
2 1 2017-01-01 UTC--2017-01-15 UTC
2 1 2018-07-01 UTC--2018-09-01 UTC
2 1 2018-10-12 UTC--2018-10-20 UTC
2 0 2017-01-12 UTC--2017-01-16 UTC
2 0 2017-03-01 UTC--2017-03-15 UTC
2 0 2017-12-01 UTC--2017-12-31 UTC
2 0 2018-08-15 UTC--2018-09-19 UTC
2 0 2018-10-01 UTC--2018-10-21 UTC
It was created with the following code:
library(lubridate)  # provides interval() and ymd()

df <- data.frame(id=c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
                 flag=c(1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0),
                 time=c(interval(ymd(20170101), ymd(20170107)),
                        interval(ymd(20180101), ymd(20190101)),
                        interval(ymd(20170103), ymd(20170109)),
                        # Cases (flag = 1)
                        interval(ymd(20170101), ymd(20170115)),
                        interval(ymd(20180701), ymd(20180901)),
                        interval(ymd(20181012), ymd(20181020)),
                        # Controls (flag = 0)
                        interval(ymd(20170112), ymd(20170116)),
                        interval(ymd(20170301), ymd(20170315)),
                        interval(ymd(20171201), ymd(20171231)),
                        interval(ymd(20180815), ymd(20180919)),
                        interval(ymd(20181001), ymd(20181021))))
I would like to get this result:
id flag time value
1 1 2017-01-01 UTC--2017-01-07 UTC NA
1 0 2018-01-01 UTC--2019-01-01 UTC 0
1 0 2017-01-03 UTC--2017-01-09 UTC 1
2 1 2017-01-01 UTC--2017-01-15 UTC NA
2 1 2018-07-01 UTC--2018-09-01 UTC NA
2 1 2018-10-12 UTC--2018-10-20 UTC NA
2 0 2017-01-12 UTC--2017-01-16 UTC 1
2 0 2017-03-01 UTC--2017-03-15 UTC 0
2 0 2017-12-01 UTC--2017-12-31 UTC 0
2 0 2018-08-15 UTC--2018-09-19 UTC 1
2 0 2018-10-01 UTC--2018-10-21 UTC 1
That is, within each group I want to compare each flag = 0 interval against all flag = 1 intervals, and record whether it overlaps at least one of them (1 if it does, 0 if not; flag = 1 rows get NA).
For this purpose I have tried the int_overlaps function from lubridate.
I have tried the following code, but it does not work:
result <- df %>%
group_by(id) %>%
mutate(value = ifelse(flag == 0 & int_overlaps(time, any(time[flag == 1])), 1, 0))
I have found a very similar approach:
Recommended answer
You can use map_int from purrr to check, within each id, whether each flag = 0 interval overlaps any of the flag = 1 intervals:
library(tidyverse)
library(lubridate)
df %>%
group_by(id) %>%
mutate(value = ifelse(flag == 0, map_int(time, ~ any(int_overlaps(.x, time[flag == 1]))), NA))
Output
# A tibble: 11 x 4
# Groups: id [2]
id flag time value
<dbl> <dbl> <Interval> <int>
1 1 1 2017-01-01 UTC--2017-01-07 UTC NA
2 1 0 2018-01-01 UTC--2019-01-01 UTC 0
3 1 0 2017-01-03 UTC--2017-01-09 UTC 1
4 2 1 2017-01-01 UTC--2017-01-15 UTC NA
5 2 1 2018-07-01 UTC--2018-09-01 UTC NA
6 2 1 2018-10-12 UTC--2018-10-20 UTC NA
7 2 0 2017-01-12 UTC--2017-01-16 UTC 1
8 2 0 2017-03-01 UTC--2017-03-15 UTC 0
9 2 0 2017-12-01 UTC--2017-12-31 UTC 0
10 2 0 2018-08-15 UTC--2018-09-19 UTC 1
11 2 0 2018-10-01 UTC--2018-10-21 UTC 1
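As a cross-check of the logic, here is a minimal base R sketch that does not depend on lubridate or purrr. It assumes the intervals are stored as separate start and end Date columns (the names df2, start, end and the helper overlaps_any are hypothetical and not part of the original question), and it treats endpoints as inclusive, which is my understanding of how int_overlaps behaves.

```r
# Hypothetical helper: TRUE if [start, end] overlaps at least one reference
# interval [ref_start[k], ref_end[k]]. Endpoints are treated as inclusive.
overlaps_any <- function(start, end, ref_start, ref_end) {
  any(start <= ref_end & ref_start <= end)
}

# Same data as in the question, with each interval split into two Date columns.
df2 <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  flag  = c(1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0),
  start = as.Date(c("2017-01-01", "2018-01-01", "2017-01-03",
                    "2017-01-01", "2018-07-01", "2018-10-12",
                    "2017-01-12", "2017-03-01", "2017-12-01",
                    "2018-08-15", "2018-10-01")),
  end   = as.Date(c("2017-01-07", "2019-01-01", "2017-01-09",
                    "2017-01-15", "2018-09-01", "2018-10-20",
                    "2017-01-16", "2017-03-15", "2017-12-31",
                    "2018-09-19", "2018-10-21"))
)

# flag = 1 rows keep NA; flag = 0 rows get 1 or 0.
df2$value <- NA_integer_
for (g in unique(df2$id)) {
  in_group <- df2$id == g
  cases    <- which(in_group & df2$flag == 1)
  controls <- which(in_group & df2$flag == 0)
  # Compare every control interval against all case intervals of the same id.
  df2$value[controls] <- vapply(
    controls,
    function(i) as.integer(overlaps_any(df2$start[i], df2$end[i],
                                        df2$start[cases], df2$end[cases])),
    integer(1)
  )
}

print(df2)
```

With the sample data, the value column should match the one in the output above. A group with no flag = 1 rows would give 0 for all of its flag = 0 rows, because any() of an empty comparison is FALSE.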