Split overlapping intervals into non-overlapping intervals, within values of an identifier


Problem description

I would like to take a set of intervals, possibly overlapping, within categories of an identifier and create new intervals that are either exactly overlapping (i.e., same start/end values) or completely non-overlapping. These new intervals should collectively span the range of the original intervals and not include any ranges that are not in the original intervals.

This needs to be a relatively fast operation because I'm working with lots of data.

Here is some example data:

library(data.table)
set.seed(1113)
start1 <- c(1,7,9, 17, 18,1,3,20)
end1 <- c(10,12,15, 20, 23,3,5,25)
id1 <- c(1,1,1,1,1,2,2,2)
obs <- rnorm(length(id1))
x <- data.table(start1,end1,id1,obs)

    > x
   start1 end1 id1         obs
1:      1   10   1 -0.79701638
2:      7   12   1 -0.09251333
3:      9   15   1 -0.08118742
4:     17   20   1 -2.33312797
5:     18   23   1  0.26581138
6:      1    3   2 -0.34314127
7:      3    5   2 -0.17196880
8:     20   25   2  0.11614842

The output should look like this:

    id1 start1 end1 i.start1 i.end1         obs
 1:   1      1    6        1     10 -0.79701638
 2:   1      7    8        1     10 -0.79701638
 3:   1      7    8        7     12 -0.09251333
 4:   1      9   10        1     10 -0.79701638
 5:   1      9   10        7     12 -0.09251333
 6:   1      9   10        9     15 -0.08118742
 7:   1     11   12        7     12 -0.09251333
 8:   1     11   12        9     15 -0.08118742
 9:   1     13   15        9     15 -0.08118742
10:   1     17   17       17     20 -2.33312797
11:   1     18   20       17     20 -2.33312797
12:   1     18   20       18     23  0.26581138
13:   1     21   23       18     23  0.26581138
14:   2      1    2        1      3 -0.34314127
15:   2      3    3        1      3 -0.34314127
16:   2      3    3        3      5 -0.17196880
17:   2      4    5        3      5 -0.17196880
18:   2     20   25       20     25  0.11614842

I found this algorithm that corresponds to what I want: https://softwareengineering.stackexchange.com/questions/363091/split-overlapping-ranges-into-all-unique-ranges?newreg=93383e379afe4dd3a595480528ee1541

I tried programming it directly but it was quite slow.

Recommended answer

I wrote a package, intervalaverage, for this and some related functions:

library(data.table)
set.seed(1113)
start1 <- c(1,7,9, 17, 18,1,3,20)
end1 <- c(10,12,15, 20, 23,3,5,25)
id1 <- c(1,1,1,1,1,2,2,2)
obs <- rnorm(length(id1))
x <- data.table(start1,end1,id1,obs)

library(intervalaverage)

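# Convert the interval bounds to integer before calling isolateoverlaps()
x[, start1:=as.integer(start1)]
x[, end1:=as.integer(end1)]
# Split the intervals within each id1 into pieces that are either identical or disjoint
isolateoverlaps(x,interval_vars = c("start1","end1"),group_vars = "id1")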


    id1 start end start1 end1         obs
 1:   1     1   6      1   10 -0.79701638
 2:   1     7   8      1   10 -0.79701638
 3:   1     9  10      1   10 -0.79701638
 4:   1     7   8      7   12 -0.09251333
 5:   1     9  10      7   12 -0.09251333
 6:   1    11  12      7   12 -0.09251333
 7:   1     9  10      9   15 -0.08118742
 8:   1    11  12      9   15 -0.08118742
 9:   1    13  15      9   15 -0.08118742
10:   1    17  17     17   20 -2.33312797
11:   1    18  20     17   20 -2.33312797
12:   1    18  20     18   23  0.26581138
13:   1    21  23     18   23  0.26581138
14:   2     1   2      1    3 -0.34314127
15:   2     3   3      1    3 -0.34314127
16:   2     3   3      3    5 -0.17196880
17:   2     4   5      3    5 -0.17196880
18:   2    20  25     20   25  0.11614842
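
The property requested in the question (within an id, any two new intervals are either identical or disjoint) can be checked mechanically. The following is a minimal sketch, not part of intervalaverage: `is_isolated` and its argument names are hypothetical, and it assumes closed integer intervals with start <= end.

```r
library(data.table)

# Hypothetical helper (not from intervalaverage): TRUE if, within each group,
# any two intervals are either identical or completely disjoint.
is_isolated <- function(dt, start_col = "start", end_col = "end", group_col = "id1") {
  cols <- c(group_col, start_col, end_col)
  u <- unique(dt[, cols, with = FALSE])   # one row per distinct interval
  setnames(u, c("g", "s", "e"))
  setorder(u, g, s, e)
  # After sorting the distinct intervals, each one must start strictly after
  # the previous one in the same group ends (closed intervals).
  u[, prev_e := shift(e), by = g]
  all(u[!is.na(prev_e), s > prev_e])
}

# Rows for id1 == 2, copied from the output above
out2 <- data.table(id1 = 2L,
                   start = c(1L, 3L, 3L, 4L, 20L),
                   end   = c(2L, 3L, 3L, 5L, 25L))
# The original (overlapping) intervals for id1 == 2
raw2 <- data.table(id1 = 2L,
                   start = c(1L, 3L, 20L),
                   end   = c(3L, 5L, 25L))

ok_out <- is_isolated(out2)
ok_raw <- is_isolated(raw2)
```

Here `ok_out` is `TRUE` because the distinct pieces [1,2], [3,3], [4,5], [20,25] do not touch, while `ok_raw` is `FALSE` because the original intervals [1,3] and [3,5] share the point 3.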


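# A second example: the single-point interval [5, 5] touches the end of [1, 5] and the start of [5, 10]
y <- data.table(start1=c(1L,5L,5L),end1=c(5L,5L,10L),id=c(1L,1L,1L))
isolateoverlaps(y,interval_vars = c("start1","end1"),group_vars = "id")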


   id start end start1 end1
1:  1     1   4      1    5
2:  1     5   5      1    5
3:  1     5   5      5    5
4:  1     5   5      5   10
5:  1     6  10      5   10
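
For readers who want to see the idea without the package, the same splitting can be sketched with data.table alone. This is an illustrative sketch under stated assumptions, not the implementation used by intervalaverage: it assumes closed integer intervals, and the names `pieces` and `res` are made up for the example. Within each id, every start and every end + 1 is collected as a breakpoint, consecutive breakpoints define elementary pieces, and a non-equi join keeps each piece once for every original interval that contains it.

```r
library(data.table)

x <- data.table(
  start1 = c(1L, 7L, 9L, 17L, 18L, 1L, 3L, 20L),
  end1   = c(10L, 12L, 15L, 20L, 23L, 3L, 5L, 25L),
  id1    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L)
)

# Step 1: per id, collect breakpoints (every start and every end + 1) and turn
# consecutive breakpoints into elementary closed intervals [b_i, b_{i+1} - 1].
pieces <- x[, {
  b <- sort(unique(c(start1, end1 + 1L)))
  list(start = head(b, -1L), end = tail(b, -1L) - 1L)
}, by = id1]

# Step 2: non-equi join. Keep a piece once for each original interval that
# fully contains it; pieces that fall in a gap match nothing and are dropped.
res <- x[pieces,
         on = .(id1, start1 <= start, end1 >= end),
         nomatch = NULL,
         .(id1, start = i.start, end = i.end,
           start1 = x.start1, end1 = x.end1)]

setorder(res, id1, start, start1)
res
```

On the sample data this yields 18 rows, the same pieces as in the question's expected output; other columns such as obs can be carried along by adding them to the final column list. Whether this is fast enough on large data has not been measured here.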
