如何展平/合并重叠的时间段 [英] How to flatten / merge overlapping time periods

查看:102
本文介绍了如何展平/合并重叠的时间段的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个很大的时间段数据集,由开始"和结束"列定义.有些时期是重叠的.

I have a large data set of time periods, defined by a 'start' and and an 'end' column. Some of the periods overlap.

我想将所有重叠的时间段合并(拼合/合并/折叠)以具有一个开始"值和一个结束"值.

I would like to combine (flatten / merge / collapse) all overlapping time periods to have one 'start' value and one 'end' value.

一些示例数据:

  ID      start        end
1  A 2013-01-01 2013-01-05
2  A 2013-01-01 2013-01-05
3  A 2013-01-02 2013-01-03
4  A 2013-01-04 2013-01-06
5  A 2013-01-07 2013-01-09
6  A 2013-01-08 2013-01-11
7  A 2013-01-12 2013-01-15

所需结果:

  ID      start        end
1  A 2013-01-01 2013-01-06
2  A 2013-01-07 2013-01-11
3  A 2013-01-12 2013-01-15

我尝试过的事情:

  require(dplyr)
  data <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), class = "factor", .Label = "A"), 
    start = structure(c(1356998400, 1356998400, 1357084800, 1357257600, 
    1357516800, 1357603200, 1357948800), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), end = structure(c(1357344000, 1357344000, 1357171200, 
    1357430400, 1357689600, 1357862400, 1358208000), tzone = "UTC", class = c("POSIXct", 
    "POSIXt"))), .Names = c("ID", "start", "end"), row.names = c(NA, 
-7L), class = "data.frame")

remove.overlaps <- function(data){
data2 <- data
for ( i in 1:length(unique(data$start))) {
x3 <- filter(data2, start>=data$start[i] & start<=data$end[i])
x4 <- x3[1,]
x4$end <- max(x3$end)
data2 <- filter(data2, start<data$start[i] | start>data$end[i])
data2 <- rbind(data2,x4)  
}
data2 <- na.omit(data2)}

data <- remove.overlaps(data)

推荐答案

这是一个可能的解决方案.这里的基本思想是使用cummax函数将滞后的start日期与最大终止日期直到现在"进行比较,并创建一个将数据分成组的索引

Here's a possible solution. The basic idea here is to compare lagged start date with the maximum end date "until now" using the cummax function and create an index that will separate the data into groups

data %>%
  arrange(ID, start) %>% # as suggested by @Jonno in case the data is unsorted
  group_by(ID) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
                     cummax(as.numeric(end)))[-n()])) %>%
  group_by(ID, indx) %>%
  summarise(start = first(start), end = last(end))

# Source: local data frame [3 x 4]
# Groups: ID
# 
#   ID indx      start        end
# 1  A    0 2013-01-01 2013-01-06
# 2  A    1 2013-01-07 2013-01-11
# 3  A    2 2013-01-12 2013-01-15

这篇关于如何展平/合并重叠的时间段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆