MATLAB is running out of memory but it should not be


Question

I'm trying to apply PCA to my data, which has been standardized, using princomp(x).

The data is <16 x 1036800 double>. This runs out of memory, which would be expected except that this is a new computer with 24GB of RAM for data mining. MATLAB even lists the 24GB as available on a memory check.

Is MATLAB actually running out of memory while performing the PCA, or is MATLAB not using the RAM to its full potential? Any information or ideas would be helpful. (I may need to increase the virtual memory, but I assumed the 24GB would have sufficed.)

Recommended answer

For a data matrix of size n-by-p, PRINCOMP returns a coefficient matrix of size p-by-p, where each column is a principal component expressed in the original dimensions. In your case, that means creating an output matrix of size:

1036800*1036800*8 bytes ~ 7.8 TB
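
The arithmetic behind that estimate can be reproduced with a few lines. This is only a sketch; the variable names are illustrative, and the p-by-n figure is included purely for comparison with a coefficient matrix that keeps at most n columns.

```matlab
% Back-of-the-envelope memory estimate for the coefficient matrix.
n = 16;                 % observations (rows of the data)
p = 1036800;            % variables (columns of the data)
bytesPerDouble = 8;

fullCoeffBytes  = p * p * bytesPerDouble;   % full p-by-p coefficient matrix
smallCoeffBytes = p * n * bytesPerDouble;   % p-by-n matrix, for comparison

fprintf('p-by-p: %.2f TiB\n', fullCoeffBytes  / 2^40);   % about 7.82 TiB
fprintf('p-by-n: %.1f MiB\n', smallCoeffBytes / 2^20);   % about 126.6 MiB
```

No amount of installed RAM on a single workstation covers the first figure, which is why the 24GB machine fails on the full decomposition.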

Consider using PRINCOMP(X,'econ') to return only the PCs with significant variance.
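
A minimal sketch of that call, assuming the Statistics Toolbox is installed and a release that still ships princomp. The random data and the reduced column count are stand-ins chosen so the example runs quickly; they are not the asker's data.

```matlab
% Economy-size PCA with princomp on a wide matrix (n << p).
rng(0);                          % reproducible toy data
X = rand(16, 5000);              % hypothetical stand-in for the 16-by-1036800 data

[coeff, score, latent] = princomp(X, 'econ');

disp(size(coeff));               % one row per variable, far fewer than p columns
disp(size(score));               % the data expressed in the retained PCs
```

With n <= p, the economy option should keep only the components whose variance is not necessarily zero, so coeff stays tall and thin instead of p-by-p.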

Alternatively, consider performing PCA by SVD: in your case n<<p, and the p-by-p covariance matrix is too large to compute. Therefore, instead of decomposing the p-by-p matrix X'X, it is sufficient to decompose only the smaller n-by-n matrix XX'. Refer to this paper for reference.
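
To illustrate why the small matrix is enough, here is a rough sketch that recovers the principal directions from the n-by-n matrix alone. It is a separate illustration, not the answer's own implementation; the sizes and variable names are assumptions chosen for a quick run.

```matlab
% PCA for n << p via the eigendecomposition of the n-by-n matrix X0*X0'.
rng(0);                                   % reproducible toy data
n = 16; p = 5000;                         % hypothetical sizes with n << p
X  = rand(n, p);
X0 = bsxfun(@minus, X, mean(X,1));        % center each column

G = X0 * X0';                             % n-by-n, cheap to store and decompose
G = (G + G') / 2;                         % guard against round-off asymmetry
[U, D] = eig(G);
[d, idx] = sort(diag(D), 'descend');      % eigenvalues = squared singular values
U = U(:, idx);

k = n - 1;                                % centered data has rank at most n-1
U = U(:, 1:k);
d = d(1:k);

PC    = bsxfun(@rdivide, X0' * U, sqrt(d)');  % p-by-k orthonormal directions
varPC = d' / (n - 1);                         % variance along each direction
Y     = X0 * PC;                              % projected data, n-by-k
```

If X0 = U*S*V', then X0*X0' = U*S^2*U', so the eigenvectors of the small matrix are the left singular vectors, and each right singular vector follows as X0'*u/s. The largest array ever formed is p-by-k.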

Here's my implementation; the outputs of this function match those of PRINCOMP (the first three anyway):

function [PC,Y,varPC] = pca_by_svd(X)
    % PCA_BY_SVD
    %   X      data matrix of size n-by-p where n<<p
    %   PC     columns are first n principal components
    %   Y      data projected on those PCs
    %   varPC  variance along the PCs
    %

    X0 = bsxfun(@minus, X, mean(X,1));     % shift data to zero-mean
    [U,S,PC] = svd(X0,'econ');             % SVD decomposition
    Y = X0*PC;                             % project X on PC
    varPC = diag(S'*S)' / (size(X,1)-1);   % variance explained
end

I just tried it on my 4GB machine, and it ran just fine:

» x = rand(16,1036800);
» [PC, Y, varPC] = pca_by_svd(x);
» whos
  Name             Size                     Bytes  Class     Attributes

  PC         1036800x16                 132710400  double              
  Y               16x16                      2048  double              
  varPC            1x16                       128  double              
  x               16x1036800            132710400  double              
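
One way to sanity-check outputs like these is sketched below. It inlines the same three steps as the function on a smaller, hypothetical matrix so that it runs on its own; the error variables are illustrative names.

```matlab
% Sanity checks for SVD-based PCA on a small stand-in matrix.
rng(0);
X  = rand(16, 2000);                          % hypothetical smaller data set
X0 = bsxfun(@minus, X, mean(X,1));            % shift data to zero mean
[~, S, PC] = svd(X0, 'econ');                 % economy-size SVD
Y     = X0 * PC;                              % scores
varPC = diag(S'*S)' / (size(X,1) - 1);        % variance along each PC

orthoErr = norm(PC' * PC - eye(size(PC,2)));  % columns should be orthonormal
varErr   = max(abs(var(Y, 0, 1) - varPC));    % score variances should match varPC

fprintf('orthonormality error: %g\n', orthoErr);
fprintf('variance mismatch:    %g\n', varErr);
```

Both errors should be at round-off level. Because centering removes one degree of freedom, the last of the 16 variances is numerically zero, so only the first n-1 components carry information.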


Update:

The princomp function has been deprecated in favor of pca, introduced in R2012b, which includes many more options.
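
A short sketch of the newer function, assuming R2012b or later with the Statistics Toolbox. The data is again a smaller random stand-in, and keeping three components is an arbitrary choice for illustration.

```matlab
% pca uses the economy-size decomposition by default ('Economy', true).
rng(0);
X = rand(16, 5000);                               % hypothetical wide data, n << p

[coeff, score, latent] = pca(X);                  % all non-degenerate components
[coeff3, score3] = pca(X, 'NumComponents', 3);    % keep only the first three

disp(size(coeff));
disp(size(coeff3));
```

Since the economy setting is the default, pca should not attempt to build the p-by-p coefficient matrix for data shaped like this.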
