Split overlapping intervals into non-overlapping intervals, within values of an identifier
Problem description
I would like to take a set of possibly overlapping intervals within each category of an identifier and create new intervals that either overlap exactly (i.e. have the same start and end values) or do not overlap at all. Together, the new intervals should cover exactly the range of the original intervals and nothing outside it.
This needs to be a relatively fast operation because I'm working with a lot of data.
Here is some example data:
library(data.table)
set.seed(1113)
start1 <- c(1,7,9, 17, 18,1,3,20)
end1 <- c(10,12,15, 20, 23,3,5,25)
id1 <- c(1,1,1,1,1,2,2,2)
obs <- rnorm(length(id1))
x <- data.table(start1,end1,id1,obs)
> x
start1 end1 id1 obs
1: 1 10 1 -0.79701638
2: 7 12 1 -0.09251333
3: 9 15 1 -0.08118742
4: 17 20 1 -2.33312797
5: 18 23 1 0.26581138
6: 1 3 2 -0.34314127
7: 3 5 2 -0.17196880
8: 20 25 2 0.11614842
The output should look like this:
id1 start1 end1 i.start1 i.end1 obs
1: 1 1 6 1 10 -0.79701638
2: 1 7 8 1 10 -0.79701638
3: 1 7 8 7 12 -0.09251333
4: 1 9 10 1 10 -0.79701638
5: 1 9 10 7 12 -0.09251333
6: 1 9 10 9 15 -0.08118742
7: 1 11 12 7 12 -0.09251333
8: 1 11 12 9 15 -0.08118742
9: 1 13 15 9 15 -0.08118742
10: 1 17 17 17 20 -2.33312797
11: 1 18 20 17 20 -2.33312797
12: 1 18 20 18 23 0.26581138
13: 1 21 23 18 23 0.26581138
14: 2 1 2 1 3 -0.34314127
15: 2 3 3 1 3 -0.34314127
16: 2 3 3 3 5 -0.17196880
17: 2 4 5 3 5 -0.17196880
18: 2 20 25 20 25 0.11614842
I found an algorithm that corresponds to what I want: https://softwareengineering.stackexchange.com/questions/363091/split-overlapping-ranges-into-all-unique-ranges?newreg=93383e379afe4dd3a595480528ee1541
I tried programming it directly, but it was quite slow.
Recommended answer
I wrote a package, intervalaverage, for this and some related functions:
library(data.table)
set.seed(1113)
start1 <- c(1,7,9, 17, 18,1,3,20)
end1 <- c(10,12,15, 20, 23,3,5,25)
id1 <- c(1,1,1,1,1,2,2,2)
obs <- rnorm(length(id1))
x <- data.table(start1,end1,id1,obs)
library(intervalaverage)
x[, start1:=as.integer(start1)]
x[, end1:=as.integer(end1)]
isolateoverlaps(x,interval_vars = c("start1","end1"),group_vars = "id1")
id1 start end start1 end1 obs
1: 1 1 6 1 10 -0.79701638
2: 1 7 8 1 10 -0.79701638
3: 1 9 10 1 10 -0.79701638
4: 1 7 8 7 12 -0.09251333
5: 1 9 10 7 12 -0.09251333
6: 1 11 12 7 12 -0.09251333
7: 1 9 10 9 15 -0.08118742
8: 1 11 12 9 15 -0.08118742
9: 1 13 15 9 15 -0.08118742
10: 1 17 17 17 20 -2.33312797
11: 1 18 20 17 20 -2.33312797
12: 1 18 20 18 23 0.26581138
13: 1 21 23 18 23 0.26581138
14: 2 1 2 1 3 -0.34314127
15: 2 3 3 1 3 -0.34314127
16: 2 3 3 3 5 -0.17196880
17: 2 4 5 3 5 -0.17196880
18: 2 20 25 20 25 0.11614842
y <- data.table(start1=c(1L,5L,5L),end1=c(5L,5L,10L),id=c(1L,1L,1L))
isolateoverlaps(y,interval_vars = c("start1","end1"),group_vars = "id")
id start end start1 end1
1: 1 1 4 1 5
2: 1 5 5 1 5
3: 1 5 5 5 5
4: 1 5 5 5 10
5: 1 6 10 5 10
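For readers who want to see the idea without the package, here is a minimal sketch of the breakpoint approach in plain data.table. It is not how intervalaverage is implemented, only an illustration under these assumptions: intervals are closed and integer-valued, and the column names are those from the question. The names brk, pieces and res are made up for this example.

```r
library(data.table)

# Same intervals as in the question, stored as integers (obs omitted for brevity)
x <- data.table(
  start1 = c(1L, 7L, 9L, 17L, 18L, 1L, 3L, 20L),
  end1   = c(10L, 12L, 15L, 20L, 23L, 3L, 5L, 25L),
  id1    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L)
)

# Breakpoints per id: every start, plus the position just after every end
brk <- x[, .(b = sort(unique(c(start1, end1 + 1L)))), by = id1]

# Consecutive breakpoints define candidate pieces [b, next b - 1]
pieces <- brk[, .(start = b[-.N], end = b[-1L] - 1L), by = id1]

# Keep each piece once for every original interval that fully contains it;
# pieces that fall into gaps between intervals match nothing and are dropped
setkey(x, id1, start1, end1)
res <- foverlaps(pieces, x,
                 by.x = c("id1", "start", "end"),
                 type = "within", nomatch = NULL)
res
```

On the example data this yields one row per (piece, original interval) pair, so any other columns of x (such as obs) would be carried along by the join. Whether it is fast enough on large data would need to be benchmarked.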