Labeling one class for cross validation in libsvm matlab

Question

I want to do one-class classification with LibSVM in MATLAB.

I want to train on my data and use cross validation, but I don't know how to label the outliers.

For example, suppose I have this data:

trainData =  [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1; 20,2,3; 2,20,2; 2,20,5; 20,2,2];
labelTrainData = [-1 -1 -1 -1 0 0 0 0];  

(The first four rows are examples of the one class; the other four are examples of outliers, included only for the cross validation.)

I train the model like this:

model = svmtrain(labelTrainData, trainData , '-s 2 -t 0 -d 3 -g 2.0 -r 2.0 -n 0.5 -m 40.0 -c 0.0 -e 0.0010 -p 0.1 -v 2' );

I'm not sure which value to use to label the one-class data and which to use for the outliers. Does anyone know how to do this?

Thanks in advance. -Jessica

Recommended answer

According to http://www.joint-research.org/wp-content/uploads/2011/07/lukashevich2009Using-One-class-SVM-Outliers-Detection.pdf, "Due to the lack of class labels in the one-class SVM, it is not possible to optimize the kernel parameters using cross-validation". However, according to the LIBSVM FAQ, that is not quite correct:

Q: How do I choose parameters for one-class SVM as training data are in only one class?

You have pre-specified true positive rate in mind and then search for parameters which achieve similar cross-validation accuracy.

Furthermore, the README in the libsvm source says of the input data: "For classification, label is an integer indicating the class label ... For one-class SVM, it's not used so can be any number."
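The FAQ advice can be sketched roughly as follows. This is only an illustration under several assumptions: the LIBSVM MATLAB interface (svmtrain) is on the path, the RBF kernel is used instead of the linear kernel from the question, and the target rate and the search grid are made-up values. As far as I understand the interface, the labels do not affect training, but the accuracy reported with -v counts how many held-out points get a prediction equal to their label, so labeling every inlier +1 makes that accuracy behave like a true positive rate.

```matlab
% Sketch: pick nu and gamma for a one-class SVM by cross-validation accuracy.
inliers = [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1];  % the four one-class rows from the question
labels  = ones(size(inliers, 1), 1);         % column vector; +1 matches the inlier prediction

targetTPR = 90;               % hypothetical target true positive rate, in percent
nus    = [0.05 0.1 0.2 0.5];  % hypothetical search grid for -n
gammas = [0.01 0.1 1];        % hypothetical search grid for -g

best = struct('gap', inf, 'nu', NaN, 'gamma', NaN);
for nu = nus
    for g = gammas
        opts  = sprintf('-s 2 -t 2 -n %g -g %g -v 2 -q', nu, g);
        cvAcc = svmtrain(labels, inliers, opts);  % with -v, a scalar accuracy is returned, not a model
        gap   = abs(cvAcc - targetTPR);
        if gap < best.gap
            best = struct('gap', gap, 'nu', nu, 'gamma', g);
        end
    end
end
fprintf('nu = %g, gamma = %g (CV accuracy within %.1f points of target)\n', ...
        best.nu, best.gamma, best.gap);
```

Note that with -v the return value is the cross-validation accuracy, so a final model still has to be trained separately with the chosen parameters and without -v. Four points is far too few for a meaningful search; the data is reused from the question only to keep the sketch short.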

I think your outliers should not be included in the training data, since libsvm will ignore the training labels. What you are trying to do is find a hypersphere that contains the good data but not the outliers. If you train with outliers in the data, LIBSVM will try to find a hypersphere that includes the outliers, which is exactly what you don't want. So you will need a training dataset without outliers, a validation dataset with outliers for choosing parameters, and a final test dataset to see whether your model generalizes.
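A minimal sketch of that split, assuming the LIBSVM MATLAB interface (svmtrain and svmpredict) is on the path. The two extra inlier rows in the validation set, the +1/-1 label convention, and the kernel and parameter values are all assumptions made for illustration:

```matlab
% Sketch: train on inliers only, then score a labeled validation set.
trainInliers = [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1];  % one-class rows from the question

% Validation set: two hypothetical inliers followed by the four outliers from the question.
valData   = [1,1.2,1; 1.1,1,1.8; 20,2,3; 2,20,2; 2,20,5; 20,2,2];
valLabels = [1; 1; -1; -1; -1; -1];               % +1 inlier, -1 outlier

% Training labels are not used by the one-class SVM, so any column vector works.
trainLabels = ones(size(trainInliers, 1), 1);
model = svmtrain(trainLabels, trainInliers, '-s 2 -t 2 -n 0.5 -g 0.1 -q');

% Predictions are +1 for points inside the learned region and -1 otherwise.
[pred, ~, ~] = svmpredict(valLabels, valData, model);

tpr = mean(pred(valLabels ==  1) ==  1);  % fraction of inliers accepted
tnr = mean(pred(valLabels == -1) == -1);  % fraction of outliers rejected
fprintf('inliers accepted: %.2f, outliers rejected: %.2f\n', tpr, tnr);
```

Repeating this for each candidate parameter setting and keeping the one with the best balance of the two rates is one way to choose parameters; the final test set should stay untouched until that choice is made.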
