10-fold cross-validation in one-against-all SVM (using LibSVM)


Question

I want to do 10-fold cross-validation for my one-against-all support vector machine classification in MATLAB.

I tried to somehow combine these two related answers:

But as I'm new to MATLAB and its syntax, I haven't managed to make it work so far.

On the other hand, I found only the following few lines about cross-validation in the LibSVM README file, and I couldn't find any related example there:

option -v randomly splits the data into n parts and calculates cross validation accuracy/mean squared error on them.

See libsvm FAQ for the meaning of outputs.
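
Based on that description, the option is used like this (a sketch, assuming the MEX binding is compiled as libsvmtrain, as in the answer below, and some label vector labels and feature matrix data; with -v the call returns the cross-validation accuracy as a scalar instead of a model struct):

cvAcc = libsvmtrain(labels, data, '-s 0 -t 2 -c 1 -v 10');   %# 10-fold CV accuracy in percent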

Could anyone provide me with an example of 10-fold cross-validation combined with one-against-all classification?

Answer

Mainly, there are two reasons we do cross-validation:

  • as a testing method that gives us a nearly unbiased estimate of the generalization power of our model (by avoiding overfitting)
  • as a way of doing model selection (e.g. finding the best C and gamma parameters over the training data; see this post for an example, and the sketch right after this list)
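
For the second case, here is a minimal grid-search sketch (my illustration, not part of the original answer): with the -v option, libsvmtrain returns the cross-validation accuracy as a scalar, which we can maximize over a grid of candidate parameters. It assumes labels and data as defined in the demo further down:

bestAcc = 0;  bestC = 1;  bestG = 1;
for c = 2.^(-1:2:7)                                    %# candidate C values
    for g = 2.^(-7:2:1)                                %# candidate gamma values
        gridOpts = sprintf('-s 0 -t 2 -c %g -g %g -v 5 -q', c, g);
        cvAcc = libsvmtrain(labels, data, gridOpts);   %# -v returns accuracy (percent)
        if cvAcc > bestAcc
            bestAcc = cvAcc;  bestC = c;  bestG = g;
        end
    end
end
fprintf('best C = %g, gamma = %g (CV accuracy = %.2f%%)\n', bestC, bestG, bestAcc);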

For the first case, which is the one we are interested in here, the process involves training k models, one per fold, and then training one final model over the entire training set. We report the average accuracy over the k folds.

Now, since we are using the one-vs-all approach to handle the multi-class problem, each model consists of N support vector machines (one for each class).

The following are wrapper functions implementing the one-vs-all approach:

function mdl = libsvmtrain_ova(y, X, opts)
    if nargin < 3, opts = ''; end

    %# classes
    labels = unique(y);
    numLabels = numel(labels);

    %# train one-against-all models
    models = cell(numLabels,1);
    for k=1:numLabels
        models{k} = libsvmtrain(double(y==labels(k)), X, strcat(opts,' -b 1 -q'));
    end
    mdl = struct('models',{models}, 'labels',labels);
end

function [pred,acc,prob] = libsvmpredict_ova(y, X, mdl)
    %# classes
    labels = mdl.labels;
    numLabels = numel(labels);

    %# get probability estimates of test instances using each 1-vs-all model
    prob = zeros(size(X,1), numLabels);
    for k=1:numLabels
        [~,~,p] = libsvmpredict(double(y==labels(k)), X, mdl.models{k}, '-b 1 -q');
        prob(:,k) = p(:, mdl.models{k}.Label==1);
    end

    %# predict the class with the highest probability
    [~,pred] = max(prob, [], 2);
    pred = labels(pred);    %# map column indices back to the actual class labels
    %# compute classification accuracy
    acc = mean(pred == y);
end
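
A quick usage sketch for these two wrappers on a held-out split (my illustration, under the same assumptions as the demo further down: Fisher's iris data and the Statistics toolbox for cvpartition):

S = load('fisheriris');
X = zscore(S.meas);
y = grp2idx(S.species);
cv = cvpartition(y, 'holdout', 1/3);               %# hold out a third for testing
mdl = libsvmtrain_ova(y(cv.training), X(cv.training,:), '-s 0 -t 2 -c 1 -g 0.25');
[pred,acc] = libsvmpredict_ova(y(cv.test), X(cv.test,:), mdl);
fprintf('holdout accuracy = %.2f%%\n', 100*acc);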

And here are functions to support cross-validation:

function acc = libsvmcrossval_ova(y, X, opts, nfold, indices)
    if nargin < 3, opts = ''; end
    if nargin < 4, nfold = 10; end
    if nargin < 5, indices = crossvalidation(y, nfold); end

    %# N-fold cross-validation testing
    acc = zeros(nfold,1);
    for i=1:nfold
        testIdx = (indices == i); trainIdx = ~testIdx;
        mdl = libsvmtrain_ova(y(trainIdx), X(trainIdx,:), opts);
        [~,acc(i)] = libsvmpredict_ova(y(testIdx), X(testIdx,:), mdl);
    end
    acc = mean(acc);    %# average accuracy
end

function indices = crossvalidation(y, nfold)
    %# stratified n-fold cross-validation
    %#indices = crossvalind('Kfold', y, nfold);  %# Bioinformatics toolbox
    cv = cvpartition(y, 'kfold',nfold);          %# Statistics toolbox
    indices = zeros(size(y));
    for i=1:nfold
        indices(cv.test(i)) = i;
    end
end
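
As a quick sanity check (illustrative, assuming labels as in the demo below), crosstab from the Statistics toolbox confirms the partition is stratified: each fold receives roughly equal counts of every class.

idx = crossvalidation(labels, 10);
crosstab(idx, labels)                              %# rows = folds, columns = classes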

Finally, here is a simple demo to illustrate the usage:

%# load dataset
S = load('fisheriris');
data = zscore(S.meas);
labels = grp2idx(S.species);

%# cross-validate using one-vs-all approach
opts = '-s 0 -t 2 -c 1 -g 0.25';    %# libsvm training options
nfold = 10;
acc = libsvmcrossval_ova(labels, data, opts, nfold);
fprintf('Cross Validation Accuracy = %.4f%%\n', 100*mean(acc));

%# compute final model over the entire dataset
mdl = libsvmtrain_ova(labels, data, opts);

Compare that against the one-vs-one approach, which libsvm uses by default (it trains N(N-1)/2 binary classifiers, one per pair of classes, rather than N):

acc = libsvmtrain(labels, data, sprintf('%s -v %d -q',opts,nfold));   %# 10-fold CV accuracy
model = libsvmtrain(labels, data, strcat(opts,' -q'));                %# final model on all data
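
As a follow-up sketch (my addition, not part of the original answer), the final one-vs-one model can be passed back to libsvmpredict to classify instances; the second output is a 3-element vector whose first entry is the accuracy in percent:

[pred, acc] = libsvmpredict(labels, data, model, '-q');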
