MATLAB is running out of memory but it should not be
Question
I'm trying to apply PCA to my data, which has been standardized, using princomp(x).
The data is <16 x 1036800 double>. This runs out of memory, which would be expected except that this is a new computer with 24GB of RAM for data mining. MATLAB even lists the 24GB as available on a memory check.
Is MATLAB actually running out of memory while performing the PCA, or is MATLAB not using the RAM to its full potential? Any information or ideas would be helpful. (I may need to increase the virtual memory, but I assumed the 24GB would have sufficed.)
Recommended answer
For a data matrix of size n-by-p, PRINCOMP returns a coefficient matrix of size p-by-p, where each column is a principal component expressed in the original dimensions. In your case the output matrix would have size:
1036800*1036800*8 bytes ~ 7.8 TB
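The estimate is easy to check with a few lines of arithmetic. The sketch below (variable names are illustrative, not from the original post) compares the full p-by-p coefficient matrix with an economy-size p-by-n one, at 8 bytes per double:

```matlab
n = 16; p = 1036800;                  % dimensions of the data matrix in the question
bytesPerDouble = 8;

fullBytes = p * p * bytesPerDouble;   % full p-by-p coefficient matrix
econBytes = p * n * bytesPerDouble;   % economy-size p-by-n coefficient matrix

fprintf('full: %.2f TiB\n', fullBytes / 2^40);
fprintf('econ: %.1f MiB\n', econBytes / 2^20);
```

The full matrix comes to roughly 7.8 TiB, far beyond 24GB of RAM, while the economy-size one needs only about 127 MiB.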
Consider using PRINCOMP(X,'econ') to return only the PCs with significant variance, so that the coefficient matrix has at most n columns instead of p.
Alternatively, consider performing PCA by SVD: in your case n<<p, and the p-by-p covariance matrix is impossible to compute. Therefore, instead of decomposing the p-by-p matrix X'X, it is sufficient to decompose only the smaller n-by-n matrix XX'. Refer to this paper for reference.
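To illustrate why the small matrix is enough: for a centered n-by-p matrix X0, the product X0*X0' is only n-by-n, and its eigenvectors can be mapped back to the p-dimensional principal directions. The following is a minimal sketch under that assumption, not the method from the referenced paper; the sizes and variable names are made up for the demo, and p is kept small so it runs quickly.

```matlab
n = 16; p = 2000;                        % demo sizes (assumed); the real p is 1036800
X  = rand(n, p);
X0 = bsxfun(@minus, X, mean(X, 1));      % center each column

G = X0 * X0';                            % n-by-n matrix, cheap to form and decompose
[U, D] = eig(G);                         % G = U*D*U'
[d, idx] = sort(diag(D), 'descend');     % eigenvalues, largest first
U = U(:, idx);

r = n - 1;                               % centering removes one degree of freedom
s = sqrt(d(1:r));                        % nonzero singular values of X0
PC = X0' * bsxfun(@rdivide, U(:, 1:r), s');   % p-by-r principal directions
varPC = (d(1:r) / (n - 1))';             % variance along each PC

% Cross-check against the economy-size SVD (columns may differ in sign).
[~, S, V] = svd(X0, 'econ');
maxDiff = max(max(abs(abs(V(:, 1:r)) - abs(PC))));
```

Nothing larger than p-by-n is ever allocated, which is the same reason the economy-size SVD fits in memory.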
Here's my implementation; the outputs of this function match those of PRINCOMP (the first three, anyway):
function [PC,Y,varPC] = pca_by_svd(X)
    % PCA_BY_SVD
    %   X      data matrix of size n-by-p where n<<p
    %   PC     columns are first n principal components
    %   Y      data projected on those PCs
    %   varPC  variance along the PCs
    %
    X0 = bsxfun(@minus, X, mean(X,1));    % shift data to zero-mean
    [U,S,PC] = svd(X0,'econ');            % SVD decomposition
    Y = X0*PC;                            % project X on PC
    varPC = diag(S'*S)' / (size(X,1)-1);  % variance explained
end
I just tried it on my 4GB machine, and it ran just fine:
» x = rand(16,1036800);
» [PC, Y, varPC] = pca_by_svd(x);
» whos
  Name          Size                 Bytes  Class     Attributes

  PC      1036800x16             132710400  double
  Y            16x16                  2048  double
  varPC         1x16                   128  double
  x            16x1036800        132710400  double
Update:
The princomp function has been deprecated in favor of pca, introduced in R2012b, which includes many more options.
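As a rough sketch of the newer function, assuming the Statistics and Machine Learning Toolbox is installed: pca accepts an 'Economy' name-value option, which is documented as defaulting to true, so it should return only the components that can carry nonzero variance. The demo data below is a small stand-in for the real matrix.

```matlab
X = rand(16, 2000);                  % small stand-in for the 16-by-1036800 data

% 'Economy' is spelled out for clarity; it is the documented default.
[coeff, score, latent] = pca(X, 'Economy', true);

disp(size(coeff));                   % p rows, one column per retained component
disp(size(score));                   % n rows, one column per retained component
```

With fewer observations than variables, the number of returned components is bounded by the number of rows, so coeff stays small even when p is very large.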