Techniques to improve the accuracy of an SVM classifier

Problem description

I am trying to build a classifier to predict breast cancer using the UCI dataset. I am using support vector machines. Despite my most sincere efforts to improve upon the accuracy of the classifier, I cannot get beyond 97.062%. I've tried the following:

1. Finding the optimal C and gamma using grid search.
2. Finding the most discriminative features using F-score.
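The two steps above can be combined in a single scikit-learn pipeline. This is a minimal sketch under several assumptions: scikit-learn's load_breast_cancer (the Wisconsin Diagnostic variant, 569 samples and 30 features) stands in for the asker's UCI file, which may be a different version of the data; SelectKBest with f_classif (an ANOVA F-value) is swapped in for the F-score the asker used; and the grid values are illustrative only. Because feature selection sits inside the pipeline, it is re-fitted on each training fold, so the held-out fold never influences which features are kept.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled breast cancer data stands in for the UCI file.
X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", MinMaxScaler()),                      # map every feature to [0, 1]
    ("select", SelectKBest(score_func=f_classif)),  # keep the k highest ANOVA F-values
    ("svm", SVC(kernel="rbf")),
])

# Log-spaced values for C and gamma; illustrative, not tuned for any dataset.
param_grid = {
    "select__k": [10, 20, 30],
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": [0.001, 0.01, 0.1, 1],
}

# 5-fold cross-validated grid search over all combinations.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.4f}")
```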

Can someone suggest techniques to improve the accuracy? I am aiming for at least 99%.

1. The data are already normalized to the range [0, 10]. Will normalizing them to [0, 1] help?

2. Is there some other method to find the best C and gamma?

Recommended answer

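For an SVM, it is important that all features share the same scale. Normally this is done by standardizing each feature (column) to zero mean and unit variance. Another option is min-max scaling, so that each feature's minimum and maximum map to, for example, 0 and 1. However, there is no difference between [0, 1] and [0, 10]: multiplying every feature by a constant k has the same effect as multiplying the RBF kernel's gamma by k² (for a linear kernel the factor is absorbed by C instead), so once C and gamma are re-tuned by grid search, both ranges will show the same performance.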
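The equivalence between [0, 1] and [0, 10] can be checked numerically for the RBF kernel K(x, y) = exp(-gamma * ||x - y||²). The sketch below uses plain Python and two made-up samples (hypothetical values, not from the dataset): scaling every feature by k = 10 and dividing gamma by k² reproduces the same kernel value. Since an SVM sees the data only through the kernel, it would learn the same decision function with the same C.

```python
import math


def rbf(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical samples with every feature min-max scaled to [0, 1].
x_01 = [0.2, 0.7, 1.0]
y_01 = [0.5, 0.1, 0.3]

# The same two samples scaled to [0, 10] instead: every feature times k = 10.
k = 10
x_010 = [k * v for v in x_01]
y_010 = [k * v for v in y_01]

gamma = 0.5
kernel_01 = rbf(x_01, y_01, gamma)               # [0, 1] data with gamma
kernel_010 = rbf(x_010, y_010, gamma / k ** 2)   # [0, 10] data with gamma / k^2

print(kernel_01, kernel_010)
print(math.isclose(kernel_01, kernel_010))  # True
```

In a grid search this simply means the best gamma found on [0, 10] data should sit roughly 100 times lower than the best gamma found on [0, 1] data, provided the grid covers both values.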

If you want to stick with SVMs for classification, another approach that may improve accuracy is to ensemble multiple SVMs. If you are using Python, you can try BaggingClassifier from sklearn.ensemble.
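A minimal sketch of bagged SVMs with scikit-learn, again assuming load_breast_cancer as a stand-in for the asker's data. The values of C, gamma, n_estimators and max_samples are placeholders, not tuned ones; in practice C and gamma would come from a grid search. Each ensemble member is a scaler-plus-RBF-SVM pipeline trained on a bootstrap sample of 80% of the training fold's size, and the members vote on the final prediction.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled breast cancer data stands in for the UCI file.
X, y = load_breast_cancer(return_X_y=True)

# One ensemble member: standardize, then an RBF SVM (placeholder C and gamma).
base_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

# The base model is passed positionally because its keyword was renamed
# from base_estimator to estimator in scikit-learn 1.2.
bagged = BaggingClassifier(base_svm, n_estimators=10, max_samples=0.8, random_state=0)

# 5-fold cross-validated accuracy of the whole ensemble.
scores = cross_val_score(bagged, X, y, cv=5, scoring="accuracy")
print(f"bagged SVM accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```

Bagging is not guaranteed to beat a single well-tuned SVM; comparing both on the same cross-validation folds is the fair test.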

Also note that you cannot expect arbitrarily high accuracy on real-world data. I think 97% is very good performance. If you push past it by tuning further against the same data, the gain may come from overfitting rather than from better generalization.
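One way to check whether a tuned accuracy is optimistic is nested cross-validation, a technique not named in the original answer: an inner loop picks C and gamma, and an outer loop scores that whole tuning procedure on folds the search never saw. This sketch assumes scikit-learn and its load_breast_cancer data; the grid is illustrative. The non-nested best_score_ uses the same folds both to select and to evaluate hyperparameters, so it tends to be the more optimistic of the two numbers.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled breast cancer data stands in for the UCI file.
X, y = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, 1]}

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: grid search that chooses C and gamma.
search = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring="accuracy")

# Non-nested estimate: the same folds select and rate the hyperparameters.
search.fit(X, y)
non_nested = search.best_score_

# Nested estimate: each outer test fold is hidden from the inner grid search.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")

print(f"non-nested: {non_nested:.4f}, nested: {nested_scores.mean():.4f}")
```

A large gap between the two numbers suggests that the tuned accuracy is partly fitted to the evaluation folds.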
