Common genomic intervals in R

Published: 2021/6/13 19:34:45 Tags: r, overlap, overlapping, genome


Problem description

I would like to infer the genomic intervals shared between different samples.

My input:

sample    chr start end
NE001      1   100  200
NE001      2   100  200
NE002      1   50   150
NE002      2   50   150
NE003      2   250  300

My expected output:

chr start end  freq
1    100  150   2
2    100  150   2

Here "freq" is the number of samples that contributed to the shared region. In the example above, freq = 2 (NE001 and NE002).

Cheers!

Recommended answer

If your data is in a data.frame (see below), then with the Bioconductor GenomicRanges package I create a GRanges instance, keeping the non-range columns too

library(GenomicRanges)
# keep.extra.columns=TRUE retains the 'sample' column as metadata
gr <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE)

The discrete (non-overlapping) ranges represented by the data are given by the disjoin function, and the overlaps between the disjoint ranges ('query') and your original ranges ('subject') are

d <- disjoin(gr)
olaps <- findOverlaps(d, gr)
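To make the idea behind disjoin concrete, here is a minimal base R sketch for a single chromosome. It is not how GenomicRanges implements disjoin; it only illustrates the logic on the two chromosome 1 intervals from the question, assuming closed integer coordinates. The variable names (breaks, pieces, and so on) are made up for this illustration.

```r
# Illustration only: the two chr 1 intervals from the question
starts <- c(100, 50)    # NE001, NE002
ends   <- c(200, 150)

# A new piece begins at every interval start and just after every interval end
breaks <- sort(unique(c(starts, ends + 1)))
piece_start <- head(breaks, -1)
piece_end   <- tail(breaks, -1) - 1

# Each piece is either fully covered by an input interval or not touched by it,
# so counting the covering intervals gives the number of overlaps per piece
freq <- sapply(seq_along(piece_start), function(i) {
  sum(starts <= piece_start[i] & ends >= piece_end[i])
})

pieces <- data.frame(start = piece_start, end = piece_end, freq = freq)
pieces <- pieces[pieces$freq > 0, ]   # drop gaps covered by no interval
print(pieces)
```

The piece covered by both intervals (100 to 150) is the shared region the question asks for.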

Split the sample information associated with each overlapping subject by the corresponding query, and associate it with the disjoint GRanges as

mcols(d) <- splitAsList(gr$sample[subjectHits(olaps)], queryHits(olaps))

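leading to, for example (in recent Bioconductor releases elementLengths() has been replaced by elementNROWS())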

> d[elementLengths(d$value) > 1]
GRanges with 2 ranges and 1 metadata column:
      seqnames     ranges strand |           value
         <Rle>  <IRanges>  <Rle> | <CharacterList>
  [1]        1 [100, 150]      * |     NE001,NE002
  [2]        2 [100, 150]      * |     NE001,NE002
  ---
  seqlengths:
    1  2
   NA NA
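To get from here to the table requested in the question, one possible sketch is shown below. It rebuilds the example data so that it runs on its own, counts the distinct samples overlapping each disjoint range, and keeps ranges seen in more than one sample. The column name freq and the object names shared and out are choices made for this illustration, not part of the original answer.

```r
library(GenomicRanges)

# Example data from the question
df <- data.frame(
  sample = c("NE001", "NE001", "NE002", "NE002", "NE003"),
  chr    = c(1, 2, 1, 2, 2),
  start  = c(100, 100, 50, 50, 250),
  end    = c(200, 200, 150, 150, 300),
  stringsAsFactors = FALSE
)
gr <- makeGRangesFromDataFrame(df, keep.extra.columns = TRUE)

d <- disjoin(gr)
olaps <- findOverlaps(d, gr)

# Number of distinct samples overlapping each disjoint range; every disjoint
# range overlaps at least one original range, so the result aligns with d
d$freq <- as.integer(tapply(gr$sample[subjectHits(olaps)],
                            queryHits(olaps),
                            function(s) length(unique(s))))

# Keep ranges shared by more than one sample and flatten to a data.frame
shared <- d[d$freq > 1]
out <- data.frame(chr   = as.character(seqnames(shared)),
                  start = start(shared),
                  end   = end(shared),
                  freq  = shared$freq)
print(out)
```

If each sample is guaranteed to have no overlapping intervals of its own, countOverlaps(d, gr) would give the same counts more directly.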

Here's how I input your data:

txt <- "sample    chr start end
NE001      1   100  200
NE001      2   100  200
NE002      1   50   150
NE002      2   50   150
NE003      2   250  300"
df <- read.table(textConnection(txt), header=TRUE, stringsAsFactors=FALSE)
