Iterating over a big matrix containing 3000 rows and calculating the correlation


Question

I am trying to loop over a matrix, compute the correlation coefficient for each pair of rows, and print out the correlation matrix.

ID A B C D E F G H I
Row01 0.08 0.47 0.94 0.33 0.08 0.93 0.72 0.51 0.55
Row02 0.37 0.87 0.72 0.96 0.20 0.55 0.35 0.73 0.44
Row03 0.19 0.71 0.52 0.73 0.03 0.18 0.13 0.13 0.30
Row04 0.08 0.77 0.89 0.12 0.39 0.18 0.74 0.61 0.57
Row05 0.09 0.60 0.73 0.65 0.43 0.21 0.27 0.52 0.60
Row06 0.60 0.54 0.70 0.56 0.49 0.94 0.23 0.80 0.63
Row07 0.02 0.33 0.05 0.90 0.48 0.47 0.51 0.36 0.26
Row08 0.34 0.96 0.37 0.06 0.20 0.14 0.84 0.28 0.47
........
(30000 rows!)

I want the Pearson correlation output as:

 Row01
Row01 1.000
Row02 0.012
Row03 0.023
Row04 0.820
Row05 0.165
Row06 0.230
Row07 0.376
Row08 0.870

written to Row01.txt, then

Row02
Row01 0.012
Row02 1.000
Row03 0.023
Row04 0.820
Row05 0.165
Row06 0.230
Row07 0.376
Row08 0.870

written to Row02.txt, and so on.

There will be 30000 output files!

I am aware that this algorithm looks stupid: matrix <- cor(t(data)) would do the whole thing in one call, and half of the correlation matrix would be enough, because the result is symmetric about the diagonal.
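As a small illustration of the one-call approach mentioned above, here is a minimal sketch on made-up data (the 4 x 9 matrix `m` and its random values are assumptions for demonstration, not the real file). It shows that `cor()` works column-wise, so the matrix has to be transposed to correlate rows, and that the result is symmetric.

```r
# Toy stand-in for the real data: 4 rows by 9 columns of random values.
set.seed(1)
m <- matrix(runif(36), nrow = 4,
            dimnames = list(sprintf("Row%02d", 1:4), LETTERS[1:9]))

# cor() correlates columns, so transpose to correlate rows with each other.
full <- cor(t(m))

dim(full)          # one entry per pair of rows
isSymmetric(full)  # cor(i, j) equals cor(j, i)
full[, "Row01"]    # the column that would end up in Row01.txt
```

On the toy data this is cheap; the concern in the question is only that the same call on 30000 rows produces a 30000 x 30000 result.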

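But my problems are:

  1. My data is too large; R cannot handle a 30000 x 30000 matrix.
  2. It is hard to retrieve the correlations of one specific row with all the other rows.
  3. With my "stupid algorithm", I can easily pick the correlations I am interested in out of the folder.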
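To put a rough number on the first point, the size of the full correlation matrix can be estimated directly. This is a back-of-the-envelope sketch that assumes the result is stored as ordinary 8-byte doubles; whether that fits depends on the machine and on how many temporary copies R makes.

```r
n <- 30000

# Full n x n matrix of 8-byte doubles, in GiB (about 6.7).
gib_full <- n^2 * 8 / 1024^3

# Only the pairs above the diagonal, in GiB (about 3.35).
gib_half <- (n * (n - 1) / 2) * 8 / 1024^3

round(c(full = gib_full, upper_triangle = gib_half), 2)
```

One column of the result, by contrast, is just 30000 numbers, which is why writing one row's correlations at a time keeps memory use small.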

Recommended answer

Thanks Nico! I almost got there after correcting a few small bugs. Here is my script:

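# Read the tab-separated matrix; the first column holds the row IDs.
datamatrix <- read.table("ref.txt", sep = "\t", header = TRUE, row.names = 1)
correl <- NULL
for (i in 1:nrow(datamatrix)) {
  # Note: x (the current row) is never used, and datamatrix[, i] selects column i, not row i.
  correl <- apply(datamatrix, 1, function(x) { cor(t(datamatrix[, i])) })
  # One output file per row, named after the row ID.
  write.table(correl, paste(row.names(datamatrix)[i], ".txt", sep = ""))
}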

But I am afraid the function(x) part is the problem. It seems it should involve something like t(datamatrix[i,j]), so that it calculates the correlation of any two rows.
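One possible way to write the function(x) part is sketched below. This is not necessarily what the original answer proposed. It assumes the data has been converted to a numeric matrix (for a data frame read with read.table, as.matrix(datamatrix) would do that), and it uses a made-up 5 x 9 matrix `m` in place of ref.txt. Inside apply(), x is one row at a time, so it can be correlated directly with row i.

```r
# Toy stand-in for the real data: 5 rows by 9 columns of random values.
set.seed(1)
m <- matrix(runif(45), nrow = 5,
            dimnames = list(sprintf("Row%02d", 1:5), LETTERS[1:9]))

i <- 1  # row of interest

# x is each row of m in turn; correlate it with row i.
correl_apply <- apply(m, 1, function(x) cor(x, m[i, ]))

# Same result without apply(): cor(A, b) correlates each column of A with b.
correl_vec <- cor(t(m), m[i, ])[, 1]

all.equal(correl_apply, correl_vec)
```

The second form avoids calling cor() once per row from R code, which may matter when the matrix has 30000 rows.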

Actually, I need to iterate through the matrix: first cor(row01, row02) gives the correlation between row01 and row02; then cor(row01, row03) gives the correlation between row01 and row03; and so on, up to the correlation between row01 and row30000. That gives me the first column, for Row01:

      row01
Row01 **1.000**
Row02 0.012
Row03 0.023
Row04 0.820
Row05 0.165
Row06 0.230
Row07 0.376
Row08 0.870

and save it to file Row01.txt;

Similarly, I get

      Row02
Row01 0.012
Row02 **1.000**
Row03 0.023
Row04 0.820
Row05 0.165
Row06 0.230
Row07 0.376
Row08 0.870

and save it to file Row02.txt.

In total I will get 30000 files. It is stupid, but this way I can sidestep the memory limit, and the correlations of any specific row are easy to retrieve.
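The one-file-per-row scheme described above could be sketched as follows. This is an illustration under stated assumptions, not the original answer: the 8 x 9 matrix `m` is made-up data, `out_dir` is a hypothetical output folder inside R's temporary directory, and rounding to three decimals simply mirrors the sample output in the question.

```r
# Toy stand-in for the real data: 8 rows by 9 columns of random values.
set.seed(1)
m <- matrix(runif(72), nrow = 8,
            dimnames = list(sprintf("Row%02d", 1:8), LETTERS[1:9]))

out_dir <- file.path(tempdir(), "row_correlations")  # hypothetical folder
dir.create(out_dir, showWarnings = FALSE)

tm <- t(m)  # transpose once, because cor() works column-wise

for (i in seq_len(nrow(m))) {
  # n x 1 matrix: correlation of every row with row i.
  correl <- cor(tm, tm[, i])
  colnames(correl) <- rownames(m)[i]
  # col.names = NA writes an empty header cell above the row names.
  write.table(round(correl, 3),
              file = file.path(out_dir, paste0(rownames(m)[i], ".txt")),
              sep = "\t", quote = FALSE, col.names = NA)
}

# Later, read back only the row of interest.
row02 <- read.table(file.path(out_dir, "Row02.txt"),
                    sep = "\t", header = TRUE, row.names = 1)
```

Only one column of the correlation matrix is held in memory at a time, at the cost of computing each pair twice and creating as many files as there are rows.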
