What is a fast way to compute column-by-column correlation in MATLAB?
Question
I have two very large matrices (60x25000) and I'd like to compute the correlation only between corresponding columns of the two matrices. For example:
corrVal(1) = corr(mat1(:,1), mat2(:,1));
corrVal(2) = corr(mat1(:,2), mat2(:,2));
...
corrVal(i) = corr(mat1(:,i), mat2(:,i));
For smaller matrices I can simply use:
colCorr = diag( corr( mat1, mat2 ) );
but this doesn't work for very large matrices because I run out of memory. I've considered slicing up the matrices, computing the correlations per slice, and combining the results, but it seems wasteful to compute correlations between column combinations that I'm not actually interested in.
Is there a quick way to directly compute only what I'm interested in?
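For a sense of scale, here is a rough estimate of the memory involved. It assumes double-precision storage (MATLAB's default) and counts only the output of corr(mat1, mat2), ignoring any temporaries allocated internally:

```matlab
nCol = 25000;                          % columns per matrix, from the question
bytesPerDouble = 8;                    % size of one double-precision value
fullBytes = nCol^2 * bytesPerDouble;   % full nCol-by-nCol output of corr(mat1, mat2)
diagBytes = nCol * bytesPerDouble;     % only the diagonal is actually wanted
fprintf('full matrix: %.1f GB, diagonal only: %.1f KB\n', ...
        fullBytes/1e9, diagBytes/1e3);
% prints: full matrix: 5.0 GB, diagonal only: 200.0 KB
```

So the full matrix costs about 25000 times more memory than the values that are kept.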
Edit: I've used a loop in the past, but it's just way too slow:
mat1 = rand(60,5000);
mat2 = rand(60,5000);
nCol = size(mat1,2);
corrVal = zeros(nCol,1);
tic;
for i = 1:nCol
corrVal(i) = corr(mat1(:,i), mat2(:,i));
end
toc;
This takes about 1 second.
tic;
corrVal = diag(corr(mat1,mat2));
toc;
This takes about 0.2 seconds.
Recommended answer
I get roughly a 100x speedup by computing the correlation by hand:
An=bsxfun(@minus,A,mean(A,1)); %%% zero-mean
Bn=bsxfun(@minus,B,mean(B,1)); %%% zero-mean
An=bsxfun(@times,An,1./sqrt(sum(An.^2,1))); %% L2-normalization
Bn=bsxfun(@times,Bn,1./sqrt(sum(Bn.^2,1))); %% L2-normalization
C=sum(An.*Bn,1); %% correlation
You can compare the two approaches with this code:
A=rand(60,25000);
B=rand(60,25000);
tic;
C=zeros(1,size(A,2));
for i = 1:size(A,2)
C(i)=corr(A(:,i), B(:,i));
end
toc;
tic
An=bsxfun(@minus,A,mean(A,1));
Bn=bsxfun(@minus,B,mean(B,1));
An=bsxfun(@times,An,1./sqrt(sum(An.^2,1)));
Bn=bsxfun(@times,Bn,1./sqrt(sum(Bn.^2,1)));
C2=sum(An.*Bn,1);
toc
mean(abs(C-C2)) %% difference between methods
Here are the computation times (loop first, then the vectorized version):
Elapsed time is 10.822766 seconds.
Elapsed time is 0.119731 seconds.
The difference between the two results is very small:
mean(abs(C-C2))
ans =
3.0968e-17
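As an extra sanity check, the vectorized result can also be compared with the diag(corr(...)) form from the question, on a size where the full matrix still fits in memory. This is only a sketch: the 500-column size and the fixed seed are arbitrary choices, and corr requires the Statistics and Machine Learning Toolbox. It reports the maximum difference rather than the mean, so a single bad column cannot hide in the average.

```matlab
rng(0);                                        % fixed seed (arbitrary) for repeatability
A = rand(60, 500);                             % small enough for the full corr matrix
B = rand(60, 500);

An = bsxfun(@minus, A, mean(A,1));             % zero-mean columns
Bn = bsxfun(@minus, B, mean(B,1));
An = bsxfun(@times, An, 1./sqrt(sum(An.^2,1))); % unit L2 norm per column
Bn = bsxfun(@times, Bn, 1./sqrt(sum(Bn.^2,1)));
C2 = sum(An.*Bn, 1);                           % 1-by-500 column-wise correlations

Cref = diag(corr(A, B)).';                     % reference: diagonal of the full matrix
maxErr = max(abs(C2 - Cref));                  % worst-case disagreement
fprintf('max abs difference: %g\n', maxErr);
```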
Explanation
bsxfun applies a binary operation element-wise, expanding singleton dimensions as needed, so here it works column by column (or row by row, depending on the inputs).
An=bsxfun(@minus,A,mean(A,1));
This line subtracts (@minus) the mean of each column (mean(A,1)) from each column of A. So basically it makes the columns of A zero-mean.
An=bsxfun(@times,An,1./sqrt(sum(An.^2,1)));
This line multiplies (@times) each column by the inverse of its norm, so the columns become L2-normalized.
Once the columns are zero-mean and L2-normalized, the correlation is just the dot product of each column of An with the corresponding column of Bn. So you multiply them element-wise, An.*Bn, and then sum down each column: sum(An.*Bn,1).
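The same steps can be wrapped in a small function. The sketch below is not part of the original answer: the name columnCorr is hypothetical, and it assumes MATLAB R2016b or later, where implicit expansion lets the arithmetic operators replace the bsxfun calls. It also assumes A and B have the same size.

```matlab
function r = columnCorr(A, B)
% columnCorr  Pearson correlation between corresponding columns of A and B.
%   Hypothetical helper; assumes size(A) equals size(B) and R2016b or later.
    assert(isequal(size(A), size(B)), 'A and B must have the same size.');
    An = A - mean(A, 1);               % zero-mean columns (implicit expansion)
    Bn = B - mean(B, 1);
    An = An ./ sqrt(sum(An.^2, 1));    % scale each column to unit L2 norm
    Bn = Bn ./ sqrt(sum(Bn.^2, 1));
    r  = sum(An .* Bn, 1);             % column-wise dot products, 1-by-nCol
end
```

Calling r = columnCorr(mat1, mat2) returns a row vector with one correlation per column. A constant column has zero norm after centering, so its entry comes out as NaN (0/0) in this sketch.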