Merge overlapping ranges into unique groups, in dataframe


Question

I have a data frame with n rows and 3 columns:

df <- data.frame(start=c(178,400,983,1932,33653),
    end=c(5025,5025, 5535, 6918, 38197),
    group=c(1,1,2,2,3))

df
  start   end group
1   178  5025     1
2   400  5025     1
3   983  5535     2
4  1932  6918     2
5 33653 38197     3

I would like to add a new column, df$group2, that gives overlapping groups the same label. For example, the rows with df$group == 1 span 178 to 5025. This overlaps the rows with df$group == 2, which span 983 to 6918. The new column should therefore classify groups 1 and 2 as group 1 (and, subsequently, group 3 as group 2).
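To make the overlap described above concrete, here is a minimal base R sketch that computes the overall span of each original group. The name `spans` is only illustrative and is not part of the original question.

```r
df <- data.frame(start = c(178, 400, 983, 1932, 33653),
                 end   = c(5025, 5025, 5535, 6918, 38197),
                 group = c(1, 1, 2, 2, 3))

# Smallest start and largest end per original group
# (tapply returns results ordered by the sorted group values).
spans <- data.frame(group = sort(unique(df$group)),
                    start = as.numeric(tapply(df$start, df$group, min)),
                    end   = as.numeric(tapply(df$end,   df$group, max)))

# Group 1 spans 178 to 5025 and group 2 spans 983 to 6918, so they overlap;
# group 3 (33653 to 38197) overlaps neither.
spans
```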

Desired result:

df
  start   end group group2
1   178  5025     1      1
2   400  5025     1      1
3   983  5535     2      1
4  1932  6918     2      1
5 33653 38197     3      2

Thanks in advance for your help.

Recommended answer

You need the IRanges package:

library(IRanges)
ir <- IRanges(df$start, df$end)
# reduce(ir) merges overlapping ranges; each row is then matched to the merged range that contains it
df$group2 <- subjectHits(findOverlaps(ir, reduce(ir)))
df

#   start   end group group2
# 1   178  5025     1      1
# 2   400  5025     1      1
# 3   983  5535     2      1
# 4  1932  6918     2      1
# 5 33653 38197     3      2
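If installing a Bioconductor package is not an option, the same grouping can be sketched in base R. This is an alternative technique, not the method used in the answer above: sort the ranges by start, track the furthest end seen so far, and start a new group whenever a range begins after that point. The variable names (`ord`, `prev_max_end`, `new_group`) are illustrative, and the sketch assumes closed intervals in which a shared endpoint counts as an overlap.

```r
df <- data.frame(start = c(178, 400, 983, 1932, 33653),
                 end   = c(5025, 5025, 5535, 6918, 38197),
                 group = c(1, 1, 2, 2, 3))

# Process ranges in order of start, remembering the original row order.
ord <- order(df$start, df$end)
s <- df$start[ord]
e <- df$end[ord]

# Furthest end among all ranges that come earlier in the sorted order.
prev_max_end <- c(-Inf, head(cummax(e), -1))

# A new merged group begins when a range starts after every earlier range has ended.
new_group <- s > prev_max_end

# The running count of group starts is the group id; write it back in original row order.
df$group2 <- integer(nrow(df))
df$group2[ord] <- cumsum(new_group)

df$group2
# 1 1 1 1 2
```

Because the rows are sorted internally and the result is written back through `ord`, the input data frame does not need to be ordered by start beforehand.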

To install IRanges, type these lines in R:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("IRanges")

To learn more (manual, etc.), see the IRanges package page on Bioconductor.
