R function for normalization based on one column?

Question

Is it possible to normalize this table in R based on the last column (samples), where samples = the number of sequenced genomes? I want to get a normalized distribution of all the genes across all the conditions.

A simplified example of my data, and what I tried:

dat1 <- read.table(text = " gene1   gene2   gene3   samples 
condition1  1   1   8   120
condition2  18  4   1   118
condition3  0   0   1   75
condition4  32  1   1   130", header = TRUE)

# normalize() is not base R; this call signature matches BBmisc::normalize (assumed)
dat1 <- normalize(dat1, method = "standardize", range = c(0, 1), margin = 1L, on.constant = "quiet")

But the results include negative values, and I am not sure how useful this approach is. Can anyone suggest how I should normalize my data to get meaningful results?

Thanks a lot, and apologies if this is a dumb question.

Recommended answer

Using your data, first write a min-max function:

minmax <- function(x) (x - min(x)) / (max(x) - min(x))
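
As a quick check of what this helper does, the sketch below applies it to a small made-up vector (the values are illustrative only, not taken from the question). It also shows one caveat worth knowing: for a constant vector, max(x) equals min(x), so the result is 0/0, which R reports as NaN.

```r
# Min-max scaling: maps the smallest value to 0 and the largest to 1
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

x <- c(2, 4, 10)      # illustrative values only
scaled <- minmax(x)
print(scaled)         # 0.00 0.25 1.00

# A constant vector has max(x) == min(x), so every entry becomes NaN (0 / 0)
const_scaled <- minmax(c(5, 5, 5))
print(const_scaled)
```

If a gene has the same rate in every condition, that column may need to be handled separately before scaling.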

Then apply it to each gene column, after dividing the counts by the number of samples:

norm <- data.frame(lapply(dat1[, 1:3], function(i) minmax(i / dat1$samples)))

The result looks like this; I hope it's correct:

       gene1     gene2      gene3
1 0.03385417 0.2458333 1.00000000
2 0.61970339 1.0000000 0.01326455
3 0.00000000 0.0000000 0.09565217
4 1.00000000 0.2269231 0.00000000
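
The two steps can also be written out separately, which keeps the condition names as row names and makes it easier to see what is being scaled. This is only a sketch, assuming that dividing each count by samples (a per-genome rate) is the intended normalization; the names gene_cols, rates, norm2 and z are introduced here for illustration. Min-max scaling maps each column onto [0, 1], so it cannot produce negative values, whereas z-score standardization (subtract the mean, divide by the standard deviation, shown here with base R's scale()) gives negative values for every entry below the column mean.

```r
dat1 <- read.table(text = "gene1 gene2 gene3 samples
condition1 1 1 8 120
condition2 18 4 1 118
condition3 0 0 1 75
condition4 32 1 1 130", header = TRUE)

minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# All columns except the sample count are treated as gene counts
gene_cols <- setdiff(names(dat1), "samples")

# Step 1: counts per sequenced genome (each row divided by its own sample count)
rates <- dat1[gene_cols] / dat1$samples

# Step 2: min-max scale each gene column; assigning into norm2[] keeps row names
norm2 <- rates
norm2[] <- lapply(rates, minmax)
print(norm2)

# For comparison: z-scores of the same rates include negative values
z <- scale(rates)
print(any(z < 0))
```

Which scaling is more meaningful depends on the downstream analysis; the per-genome rates in rates may already be sufficient if only a correction for sequencing depth is needed.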
