Merge overlapping ranges into unique groups, in dataframe
Question
I have a dataframe of n rows and 3 columns:
df <- data.frame(start=c(178,400,983,1932,33653),
end=c(5025,5025, 5535, 6918, 38197),
group=c(1,1,2,2,3))
df
start end group
1 178 5025 1
2 400 5025 1
3 983 5535 2
4 1932 6918 2
5 33653 38197 3
I would like to make a new column, df$group2, that re-classifies groups that overlap to be the same. For example, df$group[df$group==1] starts at 178 and ends at 5025. This overlaps with df$group[df$group==2], which starts at 983 and ends at 6918. The new column should therefore classify groups 1 and 2 as group 1 (and subsequently, group 3 as group 2).
Desired result:
df
start end group group2
1 178 5025 1 1
2 400 5025 1 1
3 983 5535 2 1
4 1932 6918 2 1
5 33653 38197 3 2
Thanks for your help.
Recommended answer
You will need the IRanges package:
require(IRanges)
ir <- IRanges(df$start, df$end)
df$group2 <- subjectHits(findOverlaps(ir, reduce(ir)))
> df
# start end group group2
# 1 178 5025 1 1
# 2 400 5025 1 1
# 3 983 5535 2 1
# 4 1932 6918 2 1
# 5 33653 38197 3 2
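If installing IRanges is not an option, the same grouping can be approximated in base R. The sketch below is not the answer's method: it sorts the ranges by start, tracks the furthest end seen so far, and starts a new group whenever a range begins after everything before it has ended. The function name merge_groups is hypothetical. As written, ranges that merely touch (start equal to a previous end) count as overlapping, while adjacent integer ranges such as 1-5 and 6-10 stay separate, which may differ from the default behaviour of reduce().

```r
# Base R sketch (hypothetical helper, not part of IRanges):
# give overlapping ranges a shared group id.
merge_groups <- function(start, end) {
  ord <- order(start, end)          # process ranges from left to right
  s <- start[ord]
  e <- end[ord]
  # Furthest end among all earlier ranges in sorted order
  prev_max_end <- c(-Inf, head(cummax(e), -1))
  # A new group begins when a range starts after all earlier ranges have ended
  grp_sorted <- cumsum(s > prev_max_end)
  # Map the group ids back to the original row order
  grp <- integer(length(start))
  grp[ord] <- grp_sorted
  grp
}

df <- data.frame(start = c(178, 400, 983, 1932, 33653),
                 end   = c(5025, 5025, 5535, 6918, 38197),
                 group = c(1, 1, 2, 2, 3))
df$group2 <- merge_groups(df$start, df$end)
df
```

On the example data this yields group2 values of 1, 1, 1, 1, 2, matching the desired result. Because the ids are mapped back through ord, the input rows do not need to be sorted beforehand.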
To install IRanges, type these lines in R:
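# biocLite() has been retired; current Bioconductor releases install via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("IRanges")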
To learn more (manual etc.), see the IRanges package documentation on Bioconductor.