Matlab Principal Component Analysis (eigenvalues order)

Question

I want to use Matlab's "princomp" function, but it returns the eigenvalues in a sorted array, so I can't tell which eigenvalue corresponds to which column. To Matlab, the following two inputs give the same result:

m = [1,2,3;4,5,6;7,8,9];
[pc,score,latent] = princomp(m);

m = [2,1,3;5,4,6;8,7,9];
[pc,score,latent] = princomp(m);

That is, swapping the first two columns does not change anything. In both cases the eigenvalues returned in latent are (27,0,0). The information about which eigenvalue corresponds to which original (input) column is lost. Is there a way to tell Matlab not to sort the eigenvalues?

Recommended answer

With PCA, each principal component returned is a linear combination of the original columns/dimensions. Perhaps an example will clear up the misunderstanding.

Let's consider the Fisher-Iris dataset, comprising 150 instances and 4 dimensions, and apply PCA to it. To make things easier to understand, I first zero-center the data before calling the PCA function:

load fisheriris
X = bsxfun(@minus, meas, mean(meas));    %# so that mean(X) is the zero vector

[PC score latent] = princomp(X);

Let's look at the first returned principal component (1st column of the PC matrix):

>> PC(:,1)
      0.36139
    -0.084523
      0.85667
      0.35829

This is expressed as a linear combination of the original dimensions, i.e.:

PC1 =  0.36139*dim1 + -0.084523*dim2 + 0.85667*dim3 + 0.35829*dim4

Therefore, to express the same data in the new coordinate system formed by the principal components, the new first dimension should be a linear combination of the original ones according to the above formula.
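
As a small illustration of reading these coefficients, the sketch below reuses the four values printed for PC(:,1) above. The observation x is a made-up, already centered row (an assumption for illustration, not taken from the iris data).

```matlab
% Coefficients of the first principal component, copied from the output above.
pc1 = [0.36139; -0.084523; 0.85667; 0.35829];

% Each row of pc1 belongs to one original column, in the original column order.
[~, dominant] = max(abs(pc1));   % index of the column with the largest weight

% Hypothetical centered observation (illustrative values only).
x = [0.5, -0.2, 1.0, 0.3];

% Its coordinate along PC1 is the linear combination given by the formula.
newDim1 = x * pc1;
```

Here dominant is 3, so the third original column carries the most weight in PC1. The link to the input columns is kept in the rows of PC, not in the order of latent.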

We can compute the data in the new coordinates simply as X*PC, which is exactly what is returned in the second output of PRINCOMP (score). To confirm this, try:

>> all(all( abs(X*PC - score) < 1e-10 ))
    1

Finally, the importance of each principal component can be determined by how much of the variance of the data it explains. This is returned by the third output of PRINCOMP (latent).
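
One common way to read latent is as a share of the total variance. The sketch below uses the eigenvalues (27,0,0) quoted in the question; the variable names explained and cumulative are my own, not outputs of PRINCOMP.

```matlab
% Eigenvalues reported in the question for m = [1,2,3;4,5,6;7,8,9].
latent = [27; 0; 0];

% Percentage of the total variance explained by each principal component.
explained = 100 * latent / sum(latent);

% Running total, useful for deciding how many components to keep.
cumulative = cumsum(explained);
```

For this input the first component explains all of the variance, which is why the remaining two eigenvalues are zero.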

We can compute the PCA of the data ourselves without using PRINCOMP:

[V E] = eig( cov(X) );
[E order] = sort(diag(E), 'descend');
V = V(:,order);

The eigenvectors of the covariance matrix (the columns of V) are the principal components (same as PC above, although the sign can be inverted), and the corresponding eigenvalues E represent the amount of variance explained (same as latent). Note that it is customary to sort the principal components by their eigenvalues. As before, to express the data in the new coordinates, we simply compute X*V (this should be the same as score above, if you make sure to match the signs).
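
To tie this back to the question, here is a minimal sketch of what swapping two input columns does under the same manual approach. The matrix A is made-up data chosen so that the eigenvalues are distinct; it is not from the original post.

```matlab
% Small made-up data set: 5 observations, 3 columns (illustrative values only).
A = [2 1 0; 4 3 1; 6 2 5; 8 7 3; 10 4 4];
B = A(:, [2 1 3]);                 % same data with the first two columns swapped

% PCA of A via the eigendecomposition of its covariance matrix.
[Va, Ea] = eig(cov(A));
[Ea, order] = sort(diag(Ea), 'descend');
Va = Va(:, order);

% Same steps for B.
[Vb, Eb] = eig(cov(B));
[Eb, order] = sort(diag(Eb), 'descend');
Vb = Vb(:, order);

% The eigenvalues agree: they belong to the components, not to input columns.
sameEigenvalues = all(abs(Ea - Eb) < 1e-10);

% The swap shows up as swapped rows of the eigenvector matrix (up to sign).
rowsSwapped = all(all(abs(abs(Vb) - abs(Va([2 1 3], :))) < 1e-10));
```

Both checks should come out true. So sorting the eigenvalues loses nothing: each eigenvalue belongs to a component, and the rows of the eigenvector matrix record how every original column enters that component.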
