Labeling one class for cross validation in libsvm matlab


Question

I want to do one-class classification with LibSVM in MATLAB.

I want to train on my data and use cross validation, but I don't know how I should label the outliers.

Suppose, for example, I have this data:

trainData =  [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1; 20,2,3; 2,20,2; 2,20,5; 20,2,2];
labelTrainData = [-1 -1 -1 -1 0 0 0 0];  

(The first four rows are examples of the one class; the other four are examples of outliers, included only for the cross validation.)

And I train the model like this:

model = svmtrain(labelTrainData, trainData , '-s 2 -t 0 -d 3 -g 2.0 -r 2.0 -n 0.5 -m 40.0 -c 0.0 -e 0.0010 -p 0.1 -v 2' );

I'm not sure which value to use to label the one-class data and which to use for the outliers. Does anyone know how to do this?

Thanks in advance. -Jessica

Recommended answer

According to http://www.joint-research.org/wp-content/uploads/2011/07/lukashevich2009Using-One-class-SVM-Outliers-Detection.pdf, "Due to the lack of class labels in the one-class SVM, it is not possible to optimize the kernel parameters using cross-validation". According to the LIBSVM FAQ, however, that is not quite correct:

Q: How do I choose parameters for one-class SVM as training data are in only one class? You have pre-specified true positive rate in mind and then search for parameters which achieve similar cross-validation accuracy.
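
The search the FAQ describes could be sketched roughly as follows. This is only an illustration under assumptions: it assumes the LIBSVM MATLAB interface (svmtrain) is on the path, and the target rate and the candidate values for nu and gamma are hypothetical placeholders, not recommendations. Every label is set to +1 on the assumption that the reported cross-validation accuracy is then the percentage of held-out inliers the model accepts.

```matlab
% Sketch only: assumes LIBSVM's MATLAB interface (svmtrain) is on the path.
inliers = [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1];   % the inliers from the question
labels  = ones(size(inliers, 1), 1);          % column vector, all +1

targetRate = 75;              % hypothetical target true positive rate (percent)
nuGrid     = [0.1 0.25 0.5];  % hypothetical candidate values for -n
gammaGrid  = [0.01 0.1 1];    % hypothetical candidate values for -g

best = struct('gap', Inf, 'nu', NaN, 'gamma', NaN);
for nu = nuGrid
    for g = gammaGrid
        % -s 2: one-class SVM, -t 2: RBF kernel, -v 2: 2-fold cross validation
        opts  = sprintf('-s 2 -t 2 -n %g -g %g -v 2', nu, g);
        cvAcc = svmtrain(labels, inliers, opts);   % with -v, a scalar accuracy is returned
        gap   = abs(cvAcc - targetRate);
        if gap < best.gap
            best = struct('gap', gap, 'nu', nu, 'gamma', g);
        end
    end
end
fprintf('closest to target: nu = %g, gamma = %g\n', best.nu, best.gamma);
```

With only four training points the folds are far too small to give meaningful numbers; the sketch only shows the shape of the loop.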

Furthermore, the README in the libsvm source says of the input data: "For classification, label is an integer indicating the class label ... For one-class SVM, it's not used so can be any number."

I think your outliers should not be included in the training data; libsvm will ignore the training labels. What you are trying to do is find a hypersphere that contains the good data but not the outliers. If you train with outliers in the data, LIBSVM will try to find a hypersphere that includes them, which is exactly what you don't want. So you will need a training dataset without outliers, a validation dataset with outliers for choosing parameters, and a final test dataset to check whether your model generalizes.
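
A minimal sketch of that split, assuming LIBSVM's svmtrain and svmpredict are on the MATLAB path. The two extra validation inliers, the +1/-1 validation labels, and the values for -n and -g are hypothetical choices made for illustration; they are not from the question or from LIBSVM's documentation.

```matlab
% Sketch only: assumes LIBSVM's svmtrain and svmpredict are on the MATLAB path.
trainInliers = [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1];   % inliers from the question
trainLabels  = ones(size(trainInliers, 1), 1);     % not used by one-class training

% Validation set: two hypothetical held-out inliers plus the question's outliers.
valData   = [1,1.2,1; 1,1,1.8; 20,2,3; 2,20,2; 2,20,5; 20,2,2];
valLabels = [1; 1; -1; -1; -1; -1];                % +1 = inlier, -1 = outlier

% No -v option here, so svmtrain returns a model rather than a CV accuracy.
model = svmtrain(trainLabels, trainInliers, '-s 2 -t 2 -n 0.5 -g 0.1');

% For a one-class model, predictions are +1 (inside) or -1 (outside).
[predicted, accuracy, ~] = svmpredict(valLabels, valData, model);

truePositiveRate = mean(predicted(valLabels == 1) == 1);    % inliers accepted
outlierRejection = mean(predicted(valLabels == -1) == -1);  % outliers rejected
fprintf('TPR = %.2f, outlier rejection = %.2f\n', truePositiveRate, outlierRejection);
```

Repeating this for several parameter settings and keeping the one with the best validation trade-off is one way to do the parameter selection; the final test set should stay untouched until that choice is made.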
