以重叠的时间间隔和ID为条件的匹配值 [英] Matching values conditioned on overlapping Intervals and ID

查看:97
本文介绍了以重叠的时间间隔和ID为条件的匹配值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在处理以下数据结构时遇到困难:

I am having difficulties at manipulating the following data structures:

属性数据框:

ID  Begin_A      End_A        Interval                         Value
5   2017-03-01   2017-03-10   2017-03-01 UTC--2017-03-10 UTC   Cat1
10  2017-12-01   2017-12-02   2017-12-01 UTC--2017-12-02 UTC   Cat2
5   2017-03-01   2017-03-03   2017-03-01 UTC--2017-03-03 UTC   Cat3
10  2017-12-05   2017-12-10   2017-12-05 UTC--2017-12-10 UTC   Cat4

预订数据框架:

ID  Begin_A      End_A        Interval 
5   2017-03-03   2017-03-05   2017-03-03 UTC--2017-03-05 UTC
6   2017-05-03   2017-05-05   2017-05-03 UTC--2017-05-05 UTC
8   2017-03-03   2017-03-05   2017-03-03 UTC--2017-03-05 UTC
10  2017-12-05   2017-12-06   2017-12-05 UTC--2017-12-06 UTC

所需结果框架(预订):

Desired outcome frame (Bookings):

ID  Begin_A      End_A        Interval                        Attribute_value
5   2017-03-03   2017-03-05   2017-03-03 UTC--2017-03-05 UTC  Cat1,Cat3
6   2017-05-03   2017-05-05   2017-05-03 UTC--2017-05-05 UTC  NA
8   2017-03-03   2017-03-05   2017-03-03 UTC--2017-03-05 UTC  NA
10  2017-12-05   2017-12-06   2017-12-05 UTC--2017-12-06 UTC  Cat4

数据帧的代码:

library(lubridate)
# Attributes data frame:
date1 <- as.Date(c('2017-3-1','2017-12-1','2017-3-1','2017-12-5'))
date2 <- as.Date(c('2017-3-10','2017-12-2','2017-3-3','2017-12-10'))
attributes <- data.frame(matrix(NA,nrow=4, ncol = 5)) 
names(attributes) <- c("ID","Begin_A", "End_A", "Interval", "Value")
attributes$ID <- as.numeric(c(5,10,5,10))
attributes$Begin_A <-date1
attributes$End_A <-date2
attributes$Interval <-attributes$Begin_A %--% attributes$End_A
attributes$Value<- as.character(c("Cat1","Cat2","Cat3","Cat4"))

### Bookings data frame:

date1 <- as.Date(c('2017-3-3','2017-5-3','2017-3-3','2017-12-5'))
date2 <- as.Date(c('2017-3-5','2017-5-5','2017-3-5','2017-12-6'))
bookings <- data.frame(matrix(NA,nrow=4, ncol = 4)) 
names(bookings) <- c("ID","Begin_A", "End_A", "Interval")
bookings$ID <- as.numeric(c(5,6,8,10))
bookings$Begin_A <-date1
bookings$End_A <-date2
bookings$Interval <-bookings$Begin_A %--% bookings$End_A

达到我的结果框架的过程应如下: 从预订中获取ID,过滤属性数据框中属性ID与预订ID匹配的所有行.检查哪些具有匹配属性ID的行也具有重叠的时间间隔(来自lubridate的int_overlaps).然后从值"列中获取相应的值,并在Attribute_value列中打印每个值.

The procedure to arrive at my outcome frame should be the following: Take the ID from bookings, filter all rows of the attributes data frame where attributes ID matches the booking ID. Check which of the rows with matching attribute ID also have overlapping time intervals (int_overlaps from lubridate). Then take the respective value from the Value column and print each of them in the Attribute_value column.

推荐答案

来自tidyverse的解决方案.

library(tidyverse)

attributes2 <- attributes %>%
  select(-Interval) %>%
  gather(Type, Date, ends_with("_A")) %>%
  select(-Type) %>%
  group_by(Value) %>%
  complete(Date = full_seq(Date, period = 1), ID) %>%
  ungroup()

bookings2 <- bookings %>%
  select(-Interval) %>%
  gather(Type, Date, ends_with("_A")) %>%
  select(-Type) %>%
  group_by(ID) %>%
  complete(Date = full_seq(Date, period = 1)) %>%
  ungroup()

bookings3 <- bookings2 %>%
  left_join(attributes2, by = c("ID", "Date")) %>%
  group_by(ID) %>%
  summarise(Attribute_value = toString(sort(unique(Value)))) %>%
  mutate(Attribute_value = ifelse(Attribute_value %in% "", NA, Attribute_value))

bookings4 <- bookings %>% left_join(bookings3, by = "ID")
bookings4
  ID    Begin_A      End_A                       Interval Attribute_value
1  5 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC      Cat1, Cat3
2  6 2017-05-03 2017-05-05 2017-05-03 UTC--2017-05-05 UTC            <NA>
3  8 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC            <NA>
4 10 2017-12-05 2017-12-06 2017-12-05 UTC--2017-12-06 UTC            Cat4

这篇关于以重叠的时间间隔和ID为条件的匹配值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆