Background subtraction in a table
Problem description
I have gene expression data as the number of counts for each probe, something like this:

    library(data.table)
    mydata <- fread(
    "molclass,mol.id,sample1,sample2,sample3
    negative,negat1,0,1,2
    negative,negat2,2,1,1
    negative,negat3,1,2,0
    endogen,gene1,30,15,10
    endogen,gene2,60,30,20
    ")

My question is: what would be the best way to perform background subtraction? That is, for each sampleN column I need to calculate the background (let's say it will be the average of all values from the negative class) and then subtract this background from each value of that column. For the moment I am using the following solution:

    for (nm in names(mydata)[-c(1:2)]) {
      bg <- mydata[molclass == 'negative', nm, with = FALSE]
      bg <- mean(unlist(bg))
      mydata[[nm]] <- mydata[[nm]] - bg
    }

but I feel there must be some "nicer" way.
P.S. I know that there are packages that do these things, but my data are counts, not signal intensities, so I can't use limma or similar tools designed for microarrays. Maybe some seq-data packages could help, but I am not sure, because my data is not from sequencing either.
Solution
Generally, you shouldn't use <- to modify a data.table, since that bypasses its update-by-reference operations (:= and set). The last assignment in your loop would be better with set. See the help page by typing ?set for details.

    mycols  <- paste0('sample', 1:3)
    newcols <- paste0(mycols, 'bk')
    # TRUE for the rows that belong to the negative (background) class
    s <- mydata[['molclass']] == 'negative'
    # per-sample background: mean count over the negative probes
    mybkds <- sapply(mycols, function(j) mean(mydata[[j]][s]))
    # create the result columns as numeric, then fill them by reference
    mydata[, (newcols) := NA_real_]
    for (j in mycols) set(mydata, j = paste0(j, 'bk'), value = mydata[[j]] - mybkds[[j]])

I've only done the last step in a loop, but this is basically the same as your code (where everything is in the loop).
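The same subtraction can also be written without an explicit loop, using := together with .SD. This is only a minimal sketch and not part of the original answer: it assumes every sample column name starts with "sample", and the names is_neg and the "bk" suffix are illustrative choices.

```r
library(data.table)

# Example data from the question
mydata <- fread("molclass,mol.id,sample1,sample2,sample3
negative,negat1,0,1,2
negative,negat2,2,1,1
negative,negat3,1,2,0
endogen,gene1,30,15,10
endogen,gene2,60,30,20")

# Assumption: all sample columns share the "sample" prefix
mycols <- grep("^sample", names(mydata), value = TRUE)

# Logical index of the background (negative-class) rows
is_neg <- mydata[["molclass"]] == "negative"

# For each sample column, subtract the mean of its negative rows and
# store the result by reference in a new "<name>bk" column
mydata[, (paste0(mycols, "bk")) := lapply(.SD, function(x) x - mean(x[is_neg])),
       .SDcols = mycols]
```

To overwrite the original columns instead of adding new ones, the left-hand side can be (mycols) rather than the suffixed names.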
*apply functions and loops are just different syntax, I've heard, and you can go with whichever you prefer.
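To illustrate that point, here is a small base R sketch that computes the same background-subtracted values three ways: with a for loop, with sapply, and with sweep. The counts matrix simply re-enters the example data from the question, and the object names (counts, is_neg, bg, res_*) are made up for this illustration.

```r
# Example counts from the question, as a plain numeric matrix
counts <- matrix(
  c(0, 2, 1, 30, 60,
    1, 1, 2, 15, 30,
    2, 1, 0, 10, 20),
  ncol = 3,
  dimnames = list(c("negat1", "negat2", "negat3", "gene1", "gene2"),
                  paste0("sample", 1:3))
)
is_neg <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

# Per-sample background: mean of the negative rows in each column
bg <- colMeans(counts[is_neg, , drop = FALSE])

# 1) explicit loop over the columns
res_loop <- counts
for (j in colnames(counts)) res_loop[, j] <- counts[, j] - bg[[j]]

# 2) sapply over the column names
res_apply <- sapply(colnames(counts), function(j) counts[, j] - bg[[j]])

# 3) sweep subtracts bg[k] from column k (its default FUN is "-")
res_sweep <- sweep(counts, 2, bg)
```

All three results hold the same numbers, so the choice between them is mostly a matter of readability.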