Selecting SVM parameters using cross validation and F1-scores
Question
I need to keep track of the F1-scores while tuning C and sigma in an SVM. For example, the following code keeps track of the accuracy. I need to change it to track the F1-score instead, but I was not able to do that.
%# read some training data
[labels,data] = libsvmread('./heart_scale');
%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);
%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
cv_acc(i) = svmtrain(labels, data, ...
sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end
%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);
%# now you can train you model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%# ...
I have looked at the following two links.
I do understand that I first have to find the best C and gamma/sigma parameters over the training data, and then use these two values in a leave-one-out cross-validation classification experiment. So what I want now is to first do a grid search for tuning C and sigma. I would prefer to use MATLAB's SVM rather than LIBSVM. Below is my code for the leave-one-out cross-validation classification.
... clc
clear all
close all
a = load('V1.csv');
X = double(a(:,1:12));
Y = double(a(:,13));
% train data
datall=[X,Y];
A=datall;
n = 40;
ordering = randperm(n);
B = A(ordering, :);
good=B;
input=good(:,1:12);
target=good(:,13);
CVO = cvpartition(target,'leaveout',1);
cp = classperf(target); %# init performance tracker
svmModel=[];
for i = 1:CVO.NumTestSets %# for each fold
trIdx = CVO.training(i);
teIdx = CVO.test(i);
%# train an SVM model over training instances
svmModel = svmtrain(input(trIdx,:), target(trIdx), ...
'Autoscale',true, 'Showplot',false, 'Method','ls', ...
'BoxConstraint',0.1, 'Kernel_Function','rbf', 'RBF_Sigma',0.1);
%# test using test instances
pred = svmclassify(svmModel, input(teIdx,:), 'Showplot',false);
%# evaluate and update performance object
cp = classperf(cp, pred, teIdx);
end
%# get accuracy
accuracy=cp.CorrectRate*100
sensitivity=cp.Sensitivity*100
specificity=cp.Specificity*100
PPV=cp.PositivePredictiveValue*100
NPV=cp.NegativePredictiveValue*100
%# get confusion matrix
%# columns:actual, rows:predicted, last-row: unclassified instances
cp.CountingMatrix
recallP = sensitivity;
recallN = specificity;
precisionP = PPV;
precisionN = NPV;
f1P = 2*((precisionP*recallP)/(precisionP + recallP));
f1N = 2*((precisionN*recallN)/(precisionN + recallN));
aF1 = ((f1P+f1N)/2);
I have changed the code, but I am making some mistakes and getting errors:
a = load('V1.csv');
X = double(a(:,1:12));
Y = double(a(:,13));
% train data
datall=[X,Y];
A=datall;
n = 40;
ordering = randperm(n);
B = A(ordering, :);
good=B;
inpt=good(:,1:12);
target=good(:,13);
k=10;
cvFolds = crossvalind('Kfold', target, k); %# get indices of 10-fold CV
cp = classperf(target); %# init performance tracker
svmModel=[];
for i = 1:k
testIdx = (cvFolds == i); %# get indices of test instances
trainIdx = ~testIdx;
C = 0.1:0.1:1;
S = 0.1:0.1:1;
fscores = zeros(numel(C), numel(S)); %// Pre-allocation
for c = 1:numel(C)
for s = 1:numel(S)
vals = crossval(@(XTRAIN, YTRAIN, XVAL, YVAL)(fun(XTRAIN, YTRAIN, XVAL, YVAL, C(c), S(c))),inpt(trainIdx,:),target(trainIdx));
fscores(c,s) = mean(vals);
end
end
end
[cbest, sbest] = find(fscores == max(fscores(:)));
C_final = C(cbest);
S_final = S(sbest);
.......
and the function:
.....
function fscore = fun(XTRAIN, YTRAIN, XVAL, YVAL, C, S)
svmModel = svmtrain(XTRAIN, YTRAIN, ...
'Autoscale',true, 'Showplot',false, 'Method','ls', ...
'BoxConstraint', C, 'Kernel_Function','rbf', 'RBF_Sigma', S);
pred = svmclassify(svmModel, XVAL, 'Showplot',false);
cp = classperf(YVAL, pred)
%# get accuracy
accuracy=cp.CorrectRate*100
sensitivity=cp.Sensitivity*100
specificity=cp.Specificity*100
PPV=cp.PositivePredictiveValue*100
NPV=cp.NegativePredictiveValue*100
%# get confusion matrix
%# columns:actual, rows:predicted, last-row: unclassified instances
cp.CountingMatrix
recallP = sensitivity;
recallN = specificity;
precisionP = PPV;
precisionN = NPV;
f1P = 2*((precisionP*recallP)/(precisionP + recallP));
f1N = 2*((precisionN*recallN)/(precisionN + recallN));
fscore = ((f1P+f1N)/2);
end
Recommended answer
So basically you want to take this line of yours:
svmModel = svmtrain(input(trIdx,:), target(trIdx), ...
'Autoscale',true, 'Showplot',false, 'Method','ls', ...
'BoxConstraint',0.1, 'Kernel_Function','rbf', 'RBF_Sigma',0.1);
put it in a loop that varies your 'BoxConstraint' and 'RBF_Sigma' parameters, and then use crossval to output the F1-score for that iteration's combination of parameters.
You can use a single for-loop exactly like in your libsvm code example (i.e. using meshgrid and 1:numel(), which is probably faster) or a nested for-loop. I'll use a nested loop so that you have both approaches:
%// You must choose your own set of values to test. You can either type out a list explicitly...
C = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100, 300];
%// ...or build one with the : operator (RBF_Sigma must be positive, so the range starts above 0)
S = 0.1:0.1:1;
fscores = zeros(numel(C), numel(S)); %// Pre-allocation
for c = 1:numel(C)
    for s = 1:numel(S)
        %// Note S(s), not S(c): each parameter is indexed by its own loop variable
        vals = crossval(@(XTRAIN, YTRAIN, XVAL, YVAL)(fun(XTRAIN, YTRAIN, XVAL, YVAL, C(c), S(s))),input(trIdx,:),target(trIdx));
        fscores(c,s) = mean(vals);
    end
end
%// Then find the C and S that gave you the best f-score. Don't forget that c and s are just indexes though!
[~, best] = max(fscores(:)); %// max returns only the first maximum, so ties cannot yield several indices
[cbest, sbest] = ind2sub(size(fscores), best);
C_final = C(cbest);
S_final = S(sbest);
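For comparison, the single-loop variant mentioned above (meshgrid plus 1:numel()) might look like the sketch below. To keep it runnable without any toolbox, cvScore is a hypothetical stand-in for the cross-validated F1-score of one (C, sigma) pair. In the real search it would be replaced by mean(crossval(...)) as in the nested loop. The names Cvals, Svals, Cgrid and Sgrid are also illustrative.

```matlab
% Hypothetical stand-in for the cross-validated F1-score of one (C, sigma) pair.
% It peaks at C = 1, sigma = 0.5 so that the result is easy to check by eye.
cvScore = @(c, s) -(log10(c)).^2 - (s - 0.5).^2;

Cvals = [0.01 0.1 1 10 100];        % candidate BoxConstraint values
Svals = 0.1:0.1:1;                  % candidate RBF_Sigma values (all positive)

% Every (C, sigma) combination, stored in two matrices of equal size
[Cgrid, Sgrid] = meshgrid(Cvals, Svals);

% One loop over the linear index covers the whole grid
fscores = zeros(numel(Cgrid), 1);
for i = 1:numel(Cgrid)
    fscores(i) = cvScore(Cgrid(i), Sgrid(i));
end

% Because Cgrid and Sgrid hold the actual values, the linear index of the best
% score gives the best parameters directly, with no separate index lookup
[~, idx] = max(fscores);
C_final = Cgrid(idx);
S_final = Sgrid(idx);
```

Here the grids store parameter values rather than indexes, so C_final and S_final can be read straight out of Cgrid and Sgrid.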
Now we just have to define the function fun. The docs have this to say about fun:
fun is a function handle to a function with two inputs, the training subset of X, XTRAIN, and the test subset of X, XTEST, as follows:
testval = fun(XTRAIN,XTEST)
Each time it is called, fun should use XTRAIN to fit a model, then return some criterion testval computed on XTEST using that fitted model.
(That passage describes the single-dataset form. When crossval is given both X and Y, as in the loop above, fun is called with four inputs: XTRAIN, YTRAIN, XTEST, YTEST.)
So fun needs to:
- output a single f-score
- take as input a training and testing set for X and Y. Note that these are both subsets of your actual training set! Think of them more like a training and validation SUBSET of your training set. Also note that crossval will split these sets up for you!
- train a classifier on the training subset (using your current C and S parameters from your loop)
- run your new classifier on the test (or rather validation) subset
- compute and output a performance metric (in your case you want the F1-score)
You'll notice that fun can't take any extra parameters, which is why I've wrapped it in an anonymous function so that we can pass the current C and S values in (i.e. all that @(...)(fun(...)) stuff above). That's just a trick to "convert" our six-parameter fun into the four-parameter one required by crossval.
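The wrapping trick can be seen in isolation in the toy sketch below. It assumes nothing about SVMs: six is a hypothetical six-argument function that only does arithmetic, and four is the four-argument wrapper that fixes its last two arguments. The point it illustrates is that an anonymous function captures the values of C and S at the moment it is created.

```matlab
% Hypothetical six-argument function standing in for fun (no SVM involved)
six = @(XTRAIN, YTRAIN, XVAL, YVAL, C, S) C * numel(YTRAIN) + S * numel(YVAL);

C = 2;
S = 0.5;

% Four-argument wrapper: the current values of C and S are captured right here
four = @(XTRAIN, YTRAIN, XVAL, YVAL) six(XTRAIN, YTRAIN, XVAL, YVAL, C, S);

% Changing C afterwards does not affect the wrapper that was already created
C = 100;

% Called with four arguments only, as crossval would do
out = four(zeros(3, 2), [1; 0; 1], zeros(2, 2), [0; 1]);   % 2*3 + 0.5*2
```

This is why the wrapper has to be created inside the loop: each iteration builds a new anonymous function holding that iteration's C and S.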
function fscore = fun(XTRAIN, YTRAIN, XVAL, YVAL, C, S)
svmModel = svmtrain(XTRAIN, YTRAIN, ...
'Autoscale',true, 'Showplot',false, 'Method','ls', ...
'BoxConstraint', C, 'Kernel_Function','rbf', 'RBF_Sigma', S);
pred = svmclassify(svmModel, XVAL, 'Showplot',false);
CP = classperf(YVAL, pred); %// semicolon keeps the object from being printed on every call
fscore = ... %// You can do this bit the same way you did earlier
end
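As one possible way to fill in the elided fscore line, the class-averaged F1-score can be computed directly from the confusion counts. This is a minimal sketch, assuming binary labels coded as 1 (positive) and 0 (negative), which may not match the coding in V1.csv. The vectors ytrue and ypred are made-up toy data; inside fun they would correspond to YVAL and pred.

```matlab
% Toy binary labels, illustrative only (1 = positive class, 0 = negative class)
ytrue = [1 1 1 1 0 0 0 0 0 0]';
ypred = [1 1 1 0 1 0 0 0 0 0]';

% Confusion counts
TP = sum(ytrue == 1 & ypred == 1);
FN = sum(ytrue == 1 & ypred == 0);
FP = sum(ytrue == 0 & ypred == 1);
TN = sum(ytrue == 0 & ypred == 0);

% F1 = 2*P*R/(P+R) simplifies to 2*TP/(2*TP + FP + FN).
% max(..., 1) guards against 0/0 when a class is absent from both vectors;
% in that case the numerator is 0 as well, so the score is 0 rather than NaN.
f1P = 2 * TP / max(2 * TP + FP + FN, 1);   % F1 with class 1 treated as positive
f1N = 2 * TN / max(2 * TN + FN + FP, 1);   % F1 with class 0 treated as positive

% Average of the two per-class F1-scores, as in the question's aF1
fscore = (f1P + f1N) / 2;
```

Working from counts gives the same value as the precision/recall formula in the question whenever both are defined, and it avoids a NaN score when a validation fold yields no predictions for one class. A NaN would otherwise propagate through mean(vals) into the grid.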