PCA Dimension Reduction for Classification


Problem Description


I am using Principal Component Analysis on the features extracted from different layers of a CNN. I have downloaded the dimensionality reduction toolbox from here.


I have a total of 11232 training images, and each image has 6532 features, so the feature matrix is 11232x6532. If I want the top 90% of features I can easily do that, and the training accuracy of an SVM on the reduced data is 81.73%, which is fair. However, when I try the testing data, which has 2408 images with 6532 features each, the feature matrix for the testing data is 2408x6532. In that case the output for the top 90% of features is not correct: it comes out as 2408x2408, and the testing accuracy is 25%. Without dimensionality reduction, the training accuracy is 82.17% and the testing accuracy is 79%.
Update: Below is the PCA function from the toolbox, where X is the data and no_dims is the required number of output dimensions. The outputs of this PCA function are the variable mappedX and the structure mapping.

    % Make sure data is zero mean
    mapping.mean = mean(X, 1);
    X = bsxfun(@minus, X, mapping.mean);

    % Compute covariance matrix
    if size(X, 2) < size(X, 1)
        C = cov(X);
    else
        C = (1 / size(X, 1)) * (X * X');        % when D >= N, the smaller N x N Gram matrix is cheaper to eigendecompose
    end

    % Perform eigendecomposition of C
    C(isnan(C)) = 0;
    C(isinf(C)) = 0;
    [M, lambda] = eig(C);

    % Sort eigenvectors in descending order
    [lambda, ind] = sort(diag(lambda), 'descend');
    if no_dims < 1
        no_dims = find(cumsum(lambda ./ sum(lambda)) >= no_dims, 1, 'first');
        disp(['Embedding into ' num2str(no_dims) ' dimensions.']);
    end
    if no_dims > size(M, 2)
        no_dims = size(M, 2);
        warning(['Target dimensionality reduced to ' num2str(no_dims) '.']);
    end
    M = M(:,ind(1:no_dims));
    lambda = lambda(1:no_dims);

    % Apply mapping on the data
    if ~(size(X, 2) < size(X, 1))
        M = bsxfun(@times, X' * M, (1 ./ sqrt(size(X, 1) .* lambda))');     % normalize in order to get eigenvectors of covariance matrix
    end
    mappedX = X * M;

    % Store information for out-of-sample extension
    mapping.M = M;
    mapping.lambda = lambda;
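
For reference, a minimal usage sketch of this function (hedged: the file name pca_drtool.m and the call signature [mappedX, mapping] = pca_drtool(X, no_dims) are assumptions based on the snippet above; in the toolbox it is normally invoked through compute_mapping instead):

% Hypothetical direct call, assuming the snippet above is saved as
% pca_drtool.m with signature [mappedX, mapping] = pca_drtool(X, no_dims)
X = randn(100, 50);                      % 100 samples, 50 features
[mappedX, mapping] = pca_drtool(X, 10);  % keep the top 10 components
size(mappedX)                            % -> 100 x 10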


Based on your suggestion, I have calculated the basis vectors from the training data:

numberOfDimensions = round(0.9*size(Feature,2));
[mapped_data, mapping] = compute_mapping(Feature, 'PCA', numberOfDimensions);


Then I used the same basis vectors for the testing data:

mappedX_test = Feature_test * mapping.M;

The accuracy is still 32%.

Solved by subtracting the training mean first:

Y = bsxfun(@minus, Feature_test, mapping.mean);
mappedX_test = Y * mapping.M;
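
As a sanity check (a sketch; mapped_data and numberOfDimensions come from the training call above), the reduced test matrix should now be 2408 x 5879 (round(0.9 * 6532)) rather than 2408 x 2408, i.e. the same width as the reduced training matrix:

assert(size(mappedX_test, 2) == size(mapped_data, 2));   % same projected width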

Recommended Answer


It looks like you're doing dimensionality reduction on the training and testing data separately. During training, you're supposed to remember the principal basis vectors computed from the training examples. Remember that you are finding a new representation of your data with a new set of orthogonal axes derived from the training data. During testing, you repeat the exact same procedure as with the training data, representing the data with respect to those same basis vectors. Therefore, you use the basis vectors from the training data to reduce your test data. You are only getting a 2408 x 2408 matrix because you are performing PCA on the test examples themselves, and it is impossible to produce more basis vectors than the rank of the matrix in question (i.e. 2408).
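
To see the rank limit concretely, here is a minimal sketch with hypothetical small sizes: with N = 5 observations of D = 10 features, the mean-centred data has rank at most N - 1 = 4, so PCA can return at most 4 basis vectors no matter how large D is. The same cap produces the 2408 x 2408 result on the test set:

X  = randn(5, 10);                    % N = 5 samples, D = 10 features
Xc = bsxfun(@minus, X, mean(X, 1));   % centre each feature, as in the pca code above
disp(rank(Xc))                        % prints 4 (= N - 1, almost surely for random data)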


Retain your basis vectors from the training stage, and when it's time to perform classification in the testing stage, you must use those same basis vectors from the training stage. Remember that in PCA, you must centre your data by performing mean subtraction prior to the dimensionality reduction. In your code, the basis vectors are stored in mapping.M and the associated mean vector is stored in mapping.mean. When it comes to the testing stage, make sure you mean-subtract your test data with the mapping.mean from the training stage:

Y = bsxfun(@minus, Feature_test, mapping.mean);


Once you have this, go ahead and reduce the dimensionality of your data:

mappedX_test = Y * mapping.M;
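
Putting it all together, here is a hedged end-to-end sketch of the corrected pipeline (compute_mapping is from the drtoolbox linked in the question; fitcecoc and predict from MATLAB's Statistics and Machine Learning Toolbox stand in for whichever SVM trainer was actually used, and train_labels/test_labels are assumed variable names):

% Fit PCA on the TRAINING data only
numberOfDimensions = round(0.9 * size(Feature, 2));
[mapped_data, mapping] = compute_mapping(Feature, 'PCA', numberOfDimensions);

% Train a multi-class SVM on the reduced training data (assumed trainer)
svmModel = fitcecoc(mapped_data, train_labels);

% Apply the SAME mean and basis vectors to the test data
Y = bsxfun(@minus, Feature_test, mapping.mean);   % centre with the training mean
mappedX_test = Y * mapping.M;                     % project onto the training basis

% Evaluate on the reduced test data
predicted = predict(svmModel, mappedX_test);
testAccuracy = mean(predicted == test_labels);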
