Retraining after Cross Validation with libsvm


Question

I know that cross-validation is used for selecting good parameters. After finding them, I need to re-train on the whole data without the -v option.

But the problem I face is that after I train with the -v option, I only get the cross-validation accuracy (e.g. 85%). There is no model, and I can't see the values of C and gamma. In that case, how do I retrain?

By the way, I am applying 10-fold cross-validation, e.g.

optimization finished, #iter = 138
nu = 0.612233
obj = -90.291046, rho = -0.367013
nSV = 165, nBSV = 128
Total nSV = 165
Cross Validation Accuracy = 98.1273%

I need some help with this.

To get the best C and gamma, I use this code that is available in the LIBSVM FAQ:

bestcv = 0;
for log2c = -6:10,
  for log2g = -6:3,
    %# 5-fold cross-validation accuracy for this (C,gamma) pair
    cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    cv = svmtrain(TrainLabel, TrainVec, cmd);
    if (cv >= bestcv),
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
    fprintf('(best c=%g, g=%g, rate=%g)\n', bestc, bestg, bestcv);
  end
end

Another question: is the cross-validation accuracy obtained with the -v option similar to what we get when we train without -v and use that model to predict? Are the two accuracies similar?

Another question: cross-validation basically improves the accuracy of the model by avoiding overfitting, so it needs a model in place before it can improve it. Am I right? Besides that, if I have a different model, will the cross-validation accuracy be different? Am I right?

One more question: in the cross-validation accuracy, what are the values of C and gamma then?

The graph is something like this:

Then the value of C is 2 and gamma = 0.0078125. But when I retrain the model with these new parameters, the accuracy is not the same as 99.63%. Could there be any reason? Thanks in advance...

Answer

The -v option here is really meant as a way to avoid the overfitting problem: instead of using the whole data for training, it performs an N-fold cross-validation, training on N-1 folds and testing on the remaining fold, one at a time, then reports the average accuracy. Thus it only returns the cross-validation accuracy (assuming you have a classification problem; otherwise the mean-squared error for regression) as a scalar number, instead of an actual SVM model.
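To make the difference concrete, here is a minimal sketch contrasting the two calls in the MATLAB interface (the variable names labels and data are placeholders for your own training data):

%# with -v, svmtrain returns only a scalar cross-validation accuracy
acc = svmtrain(labels, data, '-c 1 -g 0.1 -v 5');

%# without -v, it returns an actual model struct usable for prediction
model = svmtrain(labels, data, '-c 1 -g 0.1');
[pred, pred_acc, dec_vals] = svmpredict(labels, data, model);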

If you want to perform model selection, you have to implement a grid search using cross-validation (similar to the grid.py helper Python script) to find the best values of C and gamma.

This shouldn't be hard to implement: create a grid of values using MESHGRID, iterate over all (C,gamma) pairs, training an SVM model with, say, 5-fold cross-validation for each, and choose the pair with the best CV accuracy...

Example:

%# read some training data
[labels,data] = libsvmread('./heart_scale');

%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);

%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
    cv_acc(i) = svmtrain(labels, data, ...
                    sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end

%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);

%# contour plot of parameter selection
contour(C, gamma, reshape(cv_acc,size(C))), colorbar
hold on
plot(C(idx), gamma(idx), 'rx')
text(C(idx), gamma(idx), sprintf('Acc = %.2f %%',cv_acc(idx)), ...
    'HorizontalAlign','left', 'VerticalAlign','top')
hold off
xlabel('log_2(C)'), ylabel('log_2(gamma)'), title('Cross-Validation Accuracy')

%# now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%# ...
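
To close the loop on the original question, here is a minimal sketch of that final retraining step, reusing labels, data, best_C, and best_gamma from above (in practice you would predict on a held-out test set rather than the training data):

%# retrain on the full training data with the selected parameters (no -v),
%# which returns an actual model instead of a scalar accuracy
model = svmtrain(labels, data, sprintf('-c %f -g %f', best_C, best_gamma));

%# use the resulting model for prediction
[pred_labels, accuracy, dec_values] = svmpredict(labels, data, model);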
