Collapse rows with overlapping ranges

Question

I have a data.frame with start and end times:

ranges <- data.frame(start = c(65.72000, 65.72187, 65.94312, 73.75625, 89.61625),
                     stop  = c(79.72187, 79.72375, 79.94312, 87.75625, 104.94062))

> ranges
     start      stop
1 65.72000  79.72187
2 65.72187  79.72375
3 65.94312  79.94312
4 73.75625  87.75625
5 89.61625 104.94062

In this example, the ranges in rows 2 and 3 lie entirely within the range between 'start' on row 1 and 'stop' on row 4. Thus, the overlapping ranges in rows 1-4 should be collapsed into one range:

> ranges
     start      stop
1 65.72000  87.75625
5 89.61625 104.94062

I tried this:

mdat <- outer(ranges$start, ranges$stop, function(x,y) y > x)
mdat[upper.tri(mdat)|col(mdat)==row(mdat)] <- NA
mdat

Now I just need to figure out how to combine all the TRUE entries, but I'm not sure this is the best way to go.

Recommended answer

You can try this:

library(dplyr)
ranges %>%
  arrange(start) %>%
  # start a new group whenever a row begins after the largest stop seen so far
  group_by(g = cumsum(cummax(lag(stop, default = first(stop))) < start)) %>%
  summarise(start = first(start), stop = max(stop))

# A tibble: 2 × 3
#      g    start      stop
#  <int>    <dbl>     <dbl>
#1     0 65.72000  87.75625
#2     1 89.61625 104.94062
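
To see how the grouping index is built, here is a minimal base R sketch of the same idea without dplyr. The function name collapse_ranges and the nested example data are assumptions made for illustration and are not part of the original answer. After sorting by start, each row is compared with the largest stop among the rows before it, and a new group begins only when the row starts after that value.

```r
ranges <- data.frame(
  start = c(65.72000, 65.72187, 65.94312, 73.75625, 89.61625),
  stop  = c(79.72187, 79.72375, 79.94312, 87.75625, 104.94062)
)

# Hypothetical helper (not from the original answer): merge overlapping ranges.
collapse_ranges <- function(df) {
  df <- df[order(df$start), ]
  # Largest stop among all earlier rows; the first row is compared with its own stop
  prev_max_stop <- cummax(c(df$stop[1], head(df$stop, -1)))
  # The counter goes up by one each time a row starts after everything seen so far
  g <- cumsum(prev_max_stop < df$start)
  data.frame(
    start = as.numeric(tapply(df$start, g, min)),
    stop  = as.numeric(tapply(df$stop, g, max))
  )
}

merged <- collapse_ranges(ranges)

# Illustrative data (an assumption): two short ranges nested inside a long one
nested <- data.frame(start = c(1, 2, 4), stop = c(10, 3, 5))
merged_nested <- collapse_ranges(nested)
```

For the question's data, merged should contain the same two ranges as the dplyr result. The nested example shows why the running maximum matters: comparing each start only with the previous row's stop (3 < 4) would split off the third range, even though it lies inside 1-10.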