Using precomputed kernels with libsvm

Problem description

I'm currently working on classifying images with different image descriptors. Since they have their own metrics, I am using precomputed kernels. So given these NxN kernel matrices (for a total of N images), I want to train and test an SVM. I'm not very experienced with SVMs, though.

What confuses me is how to supply the input for training. Using an MxM subset of the kernel (M being the number of training images) trains the SVM with M features. However, if I understand it correctly, this limits me to test data with a similar number of features. Trying to use a sub-kernel of size MxN causes infinite loops during training, and using more features when testing gives poor results.

As a result, only equal-sized training and test sets give reasonable results. But if I only want to classify, say, one image, or train with a given number of images per class and test with the rest, this doesn't work at all.

How can I remove the dependency between the number of training images and the number of features, so that I can test with any number of images?

I'm using libsvm for MATLAB; the kernels are distance matrices with values in [0,1].

Solution

You seem to have already figured out the problem... According to the README file included in the MATLAB package:

To use precomputed kernel, you must include sample serial number as the first column of the training and testing data.

Let me illustrate with an example:

%# read dataset
[dataClass, data] = libsvmread('./heart_scale');

%# split into train/test datasets
trainData = data(1:150,:);
testData = data(151:270,:);
trainClass = dataClass(1:150,:);
testClass = dataClass(151:270,:);
numTrain = size(trainData,1);
numTest = size(testData,1);

%# radial basis function: exp(-gamma*|u-v|^2), with sigma playing the role of gamma
sigma = 2e-3;
rbfKernel = @(X,Y) exp(-sigma .* pdist2(X,Y,'euclidean').^2);

%# compute kernel matrices between every pair of (train,train) and
%# (test,train) instances and include sample serial number as first column
K =  [ (1:numTrain)' , rbfKernel(trainData,trainData) ];
KK = [ (1:numTest)'  , rbfKernel(testData,trainData)  ];

%# train and test
model = svmtrain(trainClass, K, '-t 4');
[predClass, acc, decVals] = svmpredict(testClass, KK, model);

%# confusion matrix
C = confusionmat(testClass,predClass)

The output:

*
optimization finished, #iter = 70
nu = 0.933333
obj = -117.027620, rho = 0.183062
nSV = 140, nBSV = 140
Total nSV = 140
Accuracy = 85.8333% (103/120) (classification)

C =
    65     5
    12    38
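
The same layout applies when the full NxN matrix is already available, as in the question: the number of columns is tied to the number of training samples, while the number of rows (the samples being trained on or predicted) is free. Below is a minimal sketch of that slicing, not a definitive recipe. The toy data, the split indices, the value of `gamma`, and the conversion `exp(-gamma .* D.^2)` from distances to similarities are all assumptions made for illustration; whether such a conversion gives a valid (positive semidefinite) kernel depends on the metric in use. The libsvm calls run only if the libsvm MATLAB interface is on the path.

```matlab
%# toy stand-in for the question's data (all made up for illustration):
%# N samples, an N-by-N distance matrix D scaled to [0,1], and +1/-1 labels
rng(0);
N = 20;
X = randn(N, 3);
labels = ones(N, 1);
labels(X(:,1) < 0) = -1;
sq = sum(X.^2, 2);
D = sqrt(max(bsxfun(@plus, sq, sq') - 2*(X*X'), 0));
D = D ./ max(D(:));

%# assumption: turn distances into similarities; gamma is a hypothetical value
gamma = 1;
Kfull = exp(-gamma .* D.^2);

%# any split works; the test set need not match the training set in size
trainIdx = 1:15;
testIdx  = 16:N;
numTrain = numel(trainIdx);
numTest  = numel(testIdx);

%# rows = samples being processed, columns = training samples (always numTrain),
%# plus the serial-number column in front
K  = [ (1:numTrain)' , Kfull(trainIdx, trainIdx) ];   %# numTrain-by-(numTrain+1)
KK = [ (1:numTest)'  , Kfull(testIdx,  trainIdx) ];   %# numTest-by-(numTrain+1)

%# train and predict only if the libsvm MATLAB interface is available
if exist('svmpredict', 'file')
    model = svmtrain(labels(trainIdx), K, '-t 4');
    [predClass, acc, decVals] = svmpredict(labels(testIdx), KK, model);
end
```

Classifying a single image follows the same pattern: `testIdx` holds one index, so `KK` becomes a single row with `numTrain + 1` entries.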
