以重叠的时间间隔和ID为条件的匹配值 [英] Matching values conditioned on overlapping Intervals and ID
问题描述
我在处理以下数据结构时遇到困难:
I am having difficulties at manipulating the following data structures:
属性数据框:
ID Begin_A End_A Interval Value
5 2017-03-01 2017-03-10 2017-03-01 UTC--2017-03-10 UTC Cat1
10 2017-12-01 2017-12-02 2017-12-01 UTC--2017-12-02 UTC Cat2
5 2017-03-01 2017-03-03 2017-03-01 UTC--2017-03-03 UTC Cat3
10 2017-12-05 2017-12-10 2017-12-05 UTC--2017-12-10 UTC Cat4
预订数据框架:
ID Begin_A End_A Interval
5 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC
6 2017-05-03 2017-05-05 2017-05-03 UTC--2017-05-05 UTC
8 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC
10 2017-12-05 2017-12-06 2017-12-05 UTC--2017-12-06 UTC
所需结果框架(预订):
Desired outcome frame (Bookings):
ID Begin_A End_A Interval Attribute_value
5 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC Cat1,Cat3
6 2017-05-03 2017-05-05 2017-05-03 UTC--2017-05-05 UTC NA
8 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC NA
10 2017-12-05 2017-12-06 2017-12-05 UTC--2017-12-06 UTC Cat4
数据帧的代码:
library(lubridate)
# Attributes data frame:
date1 <- as.Date(c('2017-3-1','2017-12-1','2017-3-1','2017-12-5'))
date2 <- as.Date(c('2017-3-10','2017-12-2','2017-3-3','2017-12-10'))
attributes <- data.frame(matrix(NA,nrow=4, ncol = 5))
names(attributes) <- c("ID","Begin_A", "End_A", "Interval", "Value")
attributes$ID <- as.numeric(c(5,10,5,10))
attributes$Begin_A <-date1
attributes$End_A <-date2
attributes$Interval <-attributes$Begin_A %--% attributes$End_A
attributes$Value<- as.character(c("Cat1","Cat2","Cat3","Cat4"))
### Bookings data frame:
date1 <- as.Date(c('2017-3-3','2017-5-3','2017-3-3','2017-12-5'))
date2 <- as.Date(c('2017-3-5','2017-5-5','2017-3-5','2017-12-6'))
bookings <- data.frame(matrix(NA,nrow=4, ncol = 4))
names(bookings) <- c("ID","Begin_A", "End_A", "Interval")
bookings$ID <- as.numeric(c(5,6,8,10))
bookings$Begin_A <-date1
bookings$End_A <-date2
bookings$Interval <-bookings$Begin_A %--% bookings$End_A
达到我的结果框架的过程应如下: 从预订中获取ID,过滤属性数据框中属性ID与预订ID匹配的所有行.检查哪些具有匹配属性ID的行也具有重叠的时间间隔(来自lubridate的int_overlaps).然后从值"列中获取相应的值,并在Attribute_value列中打印每个值.
The procedure to arrive at my outcome frame should be the following: Take the ID from bookings, filter all rows of the attributes data frame where attributes ID matches the booking ID. Check which of the rows with matching attribute ID also have overlapping time intervals (int_overlaps from lubridate). Then take the respective value from the Value column and print each of them in the Attribute_value column.
推荐答案
来自tidyverse
的解决方案.
library(tidyverse)
attributes2 <- attributes %>%
select(-Interval) %>%
gather(Type, Date, ends_with("_A")) %>%
select(-Type) %>%
group_by(Value) %>%
complete(Date = full_seq(Date, period = 1), ID) %>%
ungroup()
bookings2 <- bookings %>%
select(-Interval) %>%
gather(Type, Date, ends_with("_A")) %>%
select(-Type) %>%
group_by(ID) %>%
complete(Date = full_seq(Date, period = 1)) %>%
ungroup()
bookings3 <- bookings2 %>%
left_join(attributes2, by = c("ID", "Date")) %>%
group_by(ID) %>%
summarise(Attribute_value = toString(sort(unique(Value)))) %>%
mutate(Attribute_value = ifelse(Attribute_value %in% "", NA, Attribute_value))
bookings4 <- bookings %>% left_join(bookings3, by = "ID")
bookings4
ID Begin_A End_A Interval Attribute_value
1 5 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC Cat1, Cat3
2 6 2017-05-03 2017-05-05 2017-05-03 UTC--2017-05-05 UTC <NA>
3 8 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC <NA>
4 10 2017-12-05 2017-12-06 2017-12-05 UTC--2017-12-06 UTC Cat4
这篇关于以重叠的时间间隔和ID为条件的匹配值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!