Background subtraction in a table

Problem description

I have gene expression data as number of counts for each probe, something like this:

library(data.table)
mydata <- fread(
"molclass,mol.id,sample1,sample2,sample3
negative, negat1,  0, 1,   2
negative, negat2,  2, 1,   1
negative, negat3,  1, 2,   0
 endogen,  gene1, 30, 15, 10
 endogen,  gene2, 60, 30, 20
")

My question here is - what would be the best way to perform background subtraction, i.e. for each sampleN column I need to calculate background (let's say it will be the average of all values from negative class) and then subtract this background from each value of this column. For the moment I am using the following solution:

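for (nm in names(mydata)[-c(1:2)]) {
  # background = mean of the negative-control rows in this sample column
  bg <- mydata[molclass == 'negative', nm, with = FALSE]
  bg <- mean(unlist(bg))
  # subtract the background from every value in the column
  mydata[[nm]] <- mydata[[nm]] - bg
}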

but I feel there must be some "nicer" way.

P.S. I know that there are some packages that do those things, but my data correspond to the number of counts, not intensity of signal - so I can't use limma or similar tools designed for microarrays. Maybe some seq-data packages could help, but I am not sure because my data is not from sequencing either.

Solution

Generally, you shouldn't use <- with a data.table. The last assignment in your loop would be better with set. See the help page by typing ?set for details.

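# sample columns and the names of the background-corrected columns
mycols  <- paste0('sample', 1:3)
newcols <- paste0(mycols, 'bk')

# logical index of the negative-control rows
s       <- mydata[['molclass']] == 'negative'
# per-sample background: mean of the negative controls (named by column)
mybkds  <- sapply(mycols, function(j) mean(mydata[[j]][s]))

# create the new columns, then fill them by reference with set()
mydata[, (newcols) := NA]
for (j in mycols) set(mydata, j = paste0(j, 'bk'), value = mydata[[j]] - mybkds[j])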

I've only done the last step in a loop, but this is basically the same as your code (where everything is in the loop). *apply functions and loops are just different syntax, I've heard, and you can go with whichever you prefer.
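
As a possible alternative to the loop above (a sketch, not part of the original answer), the same subtraction can be written as a single `:=` call over `.SDcols`. The helper name `neg`, the `grep` pattern for picking the sample columns, and the use of `fread(text = ...)` to rebuild the example data are assumptions made for illustration.

```r
library(data.table)

# Example data from the question, rebuilt so the snippet is self-contained
mydata <- fread(text = "molclass,mol.id,sample1,sample2,sample3
negative,negat1,0,1,2
negative,negat2,2,1,1
negative,negat3,1,2,0
endogen,gene1,30,15,10
endogen,gene2,60,30,20")

# Assumption: every sample column is named sample<N>
mycols  <- grep("^sample", names(mydata), value = TRUE)
newcols <- paste0(mycols, "bk")

# Logical index of the negative-control rows
neg <- mydata$molclass == "negative"

# For each sample column, subtract the mean of its negative controls
# and store the result in a new *bk column, by reference
mydata[, (newcols) := lapply(.SD, function(x) x - mean(x[neg])), .SDcols = mycols]

print(mydata)
```

This keeps the original count columns untouched and adds one corrected column per sample, as the `set` version does. Whether it reads better than the loop is largely a matter of taste.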
