Deconvoluting intervals into position information


Problem description


I have fileA, in which the information is stored as intervals: if consecutive positions are assigned the same value, those positions are grouped into a single interval. Each interval includes `start` and excludes `end`.

start     end      value    label
123       78000    0        romeo    #value 0 at positions 123 to 77999 inclusive.
78000     78004    56       romeo    #value 56 at positions 78000, 78001, 78002 and 78003.
78004     78005    12       romeo    #value 12 at position 78004.
78006     78008    21       juliet   #value 21 at positions 78006 and 78007.
78008     78056    8        juliet   #value 8 at positions 78008 to 78055 inclusive.
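Because each row of fileA covers the half-open range [start, end), a row stands for exactly `end - start` positions, and expanding the whole file amounts to repeating each value that many times. The sketch below assumes the five example rows are typed in by hand as a data frame instead of being read from fileA.txt. Note that position 78005 falls in no row, so it does not appear in the result.

```r
# fileA rows from the example above, typed in by hand instead of read from disk;
# intervals are half-open: start is included, end is excluded
A <- data.frame(
  start = c(123, 78000, 78004, 78006, 78008),
  end   = c(78000, 78004, 78005, 78008, 78056),
  value = c(0, 56, 12, 21, 8),
  label = c("romeo", "romeo", "romeo", "juliet", "juliet")
)

# number of positions covered by each row
widths <- A$end - A$start

# positions start, start + 1, ..., end - 1 for every row, concatenated;
# value and label are repeated once per covered position
expanded <- data.frame(
  position = unlist(Map(function(from, to) from + seq_len(to - from) - 1,
                        A$start, A$end)),
  value    = rep(A$value, widths),
  label    = rep(A$label, widths)
)

expanded[expanded$position >= 78003 & expanded$position <= 78007, ]
```

For these five rows the expansion has 77,932 rows, almost all from the first interval, so it is usually cheaper to expand only the ranges that are actually needed.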


The intervals I am interested in are displayed in fileB:

start     end      label
77998     78005    romeo
78007     78012    juliet
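With the same half-open convention, the size of the expected output can be checked before any matching is done: each row of fileB contributes `end - start` positions. A small sketch, assuming the two example rows are typed in by hand:

```r
# fileB rows from the example above (half-open intervals [start, end))
B <- data.frame(
  start = c(77998, 78007),
  end   = c(78005, 78012),
  label = c("romeo", "juliet")
)

# each interval of interest contributes end - start positions to fileC
n_positions <- B$end - B$start
n_positions        # 7 for romeo, 5 for juliet
sum(n_positions)   # 12 rows expected in fileC
```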



The labels in fileA were originally pulled in from fileB, so it is safe to assume that the labels are always equivalent for overlapping intervals.
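Since labels agree for overlapping intervals, one way to do the whole job without a per-position lookup is to pair rows of fileA and fileB that share a label and intersect their intervals. This is a swapped-in technique (a label join with base R `merge()` followed by `pmax()`/`pmin()` interval intersection), not the approach used in the question or the answer, and the in-memory data frames are assumptions standing in for the two files. The intersection of [a1, a2) and [b1, b2) is [max(a1, b1), min(a2, b2)), which is empty when the lower bound is not below the upper one.

```r
# fileA and fileB rows from the example above, typed in by hand
A <- data.frame(
  start = c(123, 78000, 78004, 78006, 78008),
  end   = c(78000, 78004, 78005, 78008, 78056),
  value = c(0, 56, 12, 21, 8),
  label = c("romeo", "romeo", "romeo", "juliet", "juliet")
)
B <- data.frame(
  start = c(77998, 78007),
  end   = c(78005, 78012),
  label = c("romeo", "juliet")
)

# pair every row of A with every row of B that carries the same label
pairs <- merge(A, B, by = "label", suffixes = c(".a", ".b"))

# the intersection of [start.a, end.a) and [start.b, end.b) is [from, to);
# drop pairs whose intersection is empty
pairs$from <- pmax(pairs$start.a, pairs$start.b)
pairs$to   <- pmin(pairs$end.a, pairs$end.b)
pairs <- pairs[pairs$from < pairs$to, ]

# expand each non-empty intersection into individual positions
widths <- pairs$to - pairs$from
fileC <- data.frame(
  position = unlist(Map(function(from, to) from + seq_len(to - from) - 1,
                        pairs$from, pairs$to)),
  value    = rep(pairs$value, widths),
  label    = rep(pairs$label, widths)
)

# merge() sorts by label, so restore positional order
fileC <- fileC[order(fileC$position), ]
rownames(fileC) <- NULL
fileC
```

The join only touches pairs of intervals, so the cost grows with the number of rows rather than with the length of the long intervals in fileA.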


I am trying to extract the information for every individual position covered by the intervals in the second file, a process I will call "deconvolution" for lack of a better word. The output fileC should look like this:

position  value   label
77998     0       romeo
77999     0       romeo
78000     56      romeo
78001     56      romeo
78002     56      romeo
78003     56      romeo
78004     12      romeo   
78007     21      juliet
78008     8       juliet
78009     8       juliet
78010     8       juliet
78011     8       juliet
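To save the result in the tab-delimited layout shown above, base R's `write.table()` can be used with quoting and row names switched off. In this sketch the expected fileC is typed in by hand and written to a temporary file (an assumption for illustration; in practice the path would be "fileC.txt").

```r
# expected fileC from the example above, typed in by hand
fileC <- data.frame(
  position = c(77998:78004, 78007:78011),
  value    = c(0, 0, 56, 56, 56, 56, 12, 21, 8, 8, 8, 8),
  label    = rep(c("romeo", "juliet"), times = c(7, 5))
)

# tab-delimited, with a header line, no quotes around labels, no row names
out <- tempfile(fileext = ".txt")
write.table(fileC, out, sep = "\t", quote = FALSE, row.names = FALSE)

lines <- readLines(out)
lines[1:3]
```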

Here is my code:

#read from tab-delimited text files which do not contain column names
A <- read.table("fileA.txt", sep = "\t",
                colClasses = c("numeric", "numeric", "numeric", "character"),
                col.names = c("start", "end", "value", "label"))
B <- read.table("fileB.txt", sep = "\t",
                colClasses = c("numeric", "numeric", "character"),
                col.names = c("start", "end", "label"))

#extract position information; intervals are half-open, [start, end)
deconvolute <- function(x, y) {
    pieces <- list()
    for (i in seq_len(nrow(y))) {
        for (j in seq_len(nrow(x))) {
            #only compare intervals that carry the same label
            if (x$label[j] != y$label[i]) next
            #compute sequence of overlapping positions
            from <- max(x$start[j], y$start[i])
            to <- min(x$end[j], y$end[i])
            if (from >= to) next
            overlap <- seq(from, to - 1)
            #assign corresponding values to the other columns
            pieces[[length(pieces) + 1]] <- data.frame(position = overlap,
                                                       value = x$value[j],
                                                       label = x$label[j])
        }
    }
    #R functions do not modify their arguments, so return the result
    do.call(rbind, pieces)
}

C <- deconvolute(A, B)

I am stuck on this function.

Recommended answer

# create the sequence of positions covered by each half-open interval [start, end) in B
s <- unlist(Map(function(from, to) from + seq_len(to - from) - 1, B$start, B$end))
s
 [1] 77998 77999 78000 78001 78002 78003 78004 78007 78008 78009 78010 78011

# matching between files A and B: for each position, the row of A whose
# interval [start, end) contains it, or NA if no row covers it
pos <- vapply(s, function(p) {
  hit <- which(A$start <= p & p < A$end)
  if (length(hit) > 0) hit[1] else NA_integer_
}, integer(1))

# new data frame
deconvoluted <- data.frame(position = s, value = A$value[pos], label = A$label[pos])
deconvoluted

   position value  label
1     77998     0  romeo
2     77999     0  romeo
3     78000    56  romeo
4     78001    56  romeo
5     78002    56  romeo
6     78003    56  romeo
7     78004    12  romeo
8     78007    21 juliet
9     78008     8 juliet
10    78009     8 juliet
11    78010     8 juliet
12    78011     8 juliet
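The matching step above scans every row of A for every position. A possible alternative, sketched here under stated assumptions, uses base R's `findInterval()` to look up all positions at once. Like the answer, it matches on position only, and it assumes the rows of fileA are sorted by `start` and do not overlap; the data frames are typed in by hand in place of the two files.

```r
# fileA and fileB rows from the question, typed in by hand
A <- data.frame(
  start = c(123, 78000, 78004, 78006, 78008),
  end   = c(78000, 78004, 78005, 78008, 78056),
  value = c(0, 56, 12, 21, 8),
  label = c("romeo", "romeo", "romeo", "juliet", "juliet")
)
B <- data.frame(
  start = c(77998, 78007),
  end   = c(78005, 78012),
  label = c("romeo", "juliet")
)

# positions of interest: every p with start <= p < end for some row of B
s <- unlist(Map(function(from, to) from + seq_len(to - from) - 1, B$start, B$end))

# findInterval() returns, for each position, the index of the last row of A
# whose start is <= the position (0 if there is none); A$start must be sorted
i <- findInterval(s, A$start)

# that row only covers the position if the position is also below its end;
# pmax(i, 1) only avoids indexing with 0 before the i > 0 test applies
covered <- i > 0 & s < A$end[pmax(i, 1)]
i[!covered] <- NA

deconvoluted <- data.frame(position = s, value = A$value[i], label = A$label[i])
deconvoluted
```

`findInterval()` performs a binary search over `A$start`, so each lookup costs O(log n) in the number of rows of A instead of a full scan. A position in a gap, such as 78005, maps to row 3 but fails the `s < A$end` test and gets `NA`.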

