Help--100% accuracy with LibSVM?

Question

Nominally a good problem to have, but I'm pretty sure it is because something funny is going on...

As context, I'm working on a problem in the facial expression/recognition space, so getting 100% accuracy seems incredibly implausible (not that it would be plausible in most applications...). I'm guessing there is either some consistent bias in the data set that is making it overly easy for an SVM to pull out the answer, =or=, more likely, I've done something wrong on the SVM side.

I'm looking for suggestions to help understand what is going on--is it me (=my usage of LibSVM)? Or is it the data?

Details:

  • ~2500 labeled data vectors/instances (transformed video frames of individuals -- fewer than 20 individual persons in total), binary classification problem. ~900 features/instance. Unbalanced data set at about a 1:4 ratio.
  • Ran subset.py to separate the data into test (500 instances) and train (the remainder).
  • Ran "svm-train -t 0". (Side note: apparently '-w1 1 -w-1 4' was not needed...)
  • Ran svm-predict on the test file. Accuracy = 100%! (See the pipeline sketch below.)
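
For reference, here is a minimal sketch of the pipeline described above, using LibSVM's official Python bindings rather than the command-line tools. The file names are placeholders, and '-t 0' matches the linear-kernel call from the list:

```python
# Minimal sketch of the described pipeline via LibSVM's Python bindings
# (pip install libsvm-official). File names are placeholders.
from libsvm.svmutil import svm_read_problem, svm_train, svm_predict

# LIBSVM text format: <label> <index1>:<value1> <index2>:<value2> ...
y_train, x_train = svm_read_problem('train.txt')
y_test, x_test = svm_read_problem('test.txt')

# '-t 0' selects the linear kernel, as in the svm-train call above.
model = svm_train(y_train, x_train, '-t 0')

# svm_predict prints and returns accuracy on the held-out test file.
p_labels, p_acc, p_vals = svm_predict(y_test, x_test, model)
print('test accuracy: %.2f%%' % p_acc[0])
```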

Things tried

  • Checked about 10 times over that I'm not training & testing on the same data files through some inadvertent command-line argument error
  • Re-ran subset.py (even with -s 1) multiple times, and trained/tested on multiple different data sets (in case I had randomly hit upon the most magical train/test pairing)
  • Ran a simple diff-like check to confirm that the test file is not a subset of the training data (see the sketch after this list)
  • svm-scale on the data has no effect on accuracy (accuracy = 100%). (Although the number of support vectors does drop from nSV=127, bSV=64 to nSV=72, bSV=0.)
  • ((weird)) Using the default RBF kernel (instead of linear, i.e., removing '-t 0') results in accuracy going to garbage(?!)
  • (sanity check) Running svm-predict with a model trained on a scaled data set against an unscaled data set results in accuracy = 80% (i.e., it always guesses the dominant class). This is strictly a sanity check to make sure that svm-predict is nominally acting right on my machine.
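
As an illustration of the diff-like check mentioned above, here is a sketch in Python that flags test instances appearing verbatim in the training file (file names are placeholders; the data is assumed to be in LIBSVM text format):

```python
# Sketch of the "diff-like" check: count test instances whose exact
# line also appears in the training file. File names are placeholders.
def load_lines(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

train_lines = set(load_lines('train.txt'))
test_lines = load_lines('test.txt')

overlap = [line for line in test_lines if line in train_lines]
print('%d of %d test instances appear verbatim in the training set'
      % (len(overlap), len(test_lines)))
```

Note that this only catches byte-identical duplicates; near-identical frames from the same video would slip through, which is exactly the kind of leakage the answer below warns about.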

Preliminary conclusion?

Something with the data is whacked--somehow, within the data set, there is a subtle, experimenter-driven effect that the SVM is picking up on.

(This doesn't, on first pass, explain why the RBF kernel gives garbage results, however.)

Would greatly appreciate any suggestions on a) how to fix my usage of LibSVM (if that is actually the problem) or b) how to determine what subtle experimenter-bias in the data LibSVM is picking up on.

Answer

Two other ideas:

Make sure you're not training and testing on the same data. This sounds kind of dumb, but in computer vision applications you should take care: make sure you're not repeating data (say, two frames of the same video falling into different folds), and that you're not training and testing on the same individual, etc. It is more subtle than it sounds.
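
As a sketch of that kind of subject-wise split -- assuming each instance can be tagged with a person ID (the feature/label/ID files here are hypothetical, and scikit-learn is used only for its grouping utility):

```python
# Sketch: split by person so no individual appears in both train and test.
# features.npy / labels.npy / person_ids.npy are hypothetical files.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.load('features.npy')          # (n_instances, n_features)
y = np.load('labels.npy')            # binary labels
groups = np.load('person_ids.npy')   # one person ID per frame/instance

gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups))

# Verify: no person ID appears on both sides of the split.
assert not set(groups[train_idx]) & set(groups[test_idx])
```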

Make sure you search over the gamma and C parameters for the RBF kernel. There are good theoretical (asymptotic) results showing that a linear classifier is just a degenerate RBF classifier, so you should simply look for a good (C, gamma) pair.
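
A sketch of that search using LibSVM's built-in cross-validation ('-v', under which svm_train returns the CV accuracy instead of a model); the grid below mirrors the ranges LibSVM's own grid.py searches by default, and the file name is a placeholder:

```python
# Sketch: coarse (C, gamma) grid search with 5-fold cross-validation.
from libsvm.svmutil import svm_read_problem, svm_train

y, x = svm_read_problem('train.txt')  # placeholder file name

best_acc, best_c, best_g = 0.0, None, None
for log2c in range(-5, 16, 2):        # C in 2^-5 .. 2^15
    for log2g in range(-15, 4, 2):    # gamma in 2^-15 .. 2^3
        c, g = 2.0 ** log2c, 2.0 ** log2g
        # '-t 2' = RBF kernel; with '-v 5', svm_train returns CV accuracy.
        acc = svm_train(y, x, '-t 2 -v 5 -c %g -g %g' % (c, g))
        if acc > best_acc:
            best_acc, best_c, best_g = acc, c, g

print('best CV accuracy %.2f%% at C=%g, gamma=%g'
      % (best_acc, best_c, best_g))
```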
