Techniques to improve the accuracy of an SVM classifier
Problem description
I am trying to build a classifier to predict breast cancer using the UCI dataset. I am using support vector machines. Despite my most sincere efforts to improve upon the accuracy of the classifier, I cannot get beyond 97.062%. I've tried the following:
1. Finding the optimal C and gamma using grid search.
2. Finding the most discriminative feature using F-score.
Can someone suggest techniques to improve the accuracy? I am aiming for at least 99%.
1. The data are already normalized to the range [0, 10]. Will normalizing them to [0, 1] help?
2. Is there some other method to find the best C and gamma?
Recommended answer
For SVMs, it is important that all features share the same scale. This is normally done by standardizing each feature (column) so that its mean is 0 and its variance is 1. Another option is to rescale each feature so that its min and max are, for example, 0 and 1. However, moving from [0, 10] to [0, 1] should not make any difference: once C and gamma are re-tuned for the new scale, both should show the same performance.
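A minimal sketch of the scaling advice above, combined with the grid search the question already mentions. It assumes scikit-learn and uses load_breast_cancer as a stand-in dataset, which may not be the same UCI variant as in the question; the grid values are illustrative, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)

# Keeping the scaler inside the pipeline means it is re-fit on the training
# part of every fold, so no information leaks from the held-out part.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Illustrative grid (assumption); make_pipeline names the SVC step "svc".
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [1e-3, 1e-2, 1e-1, "scale"],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_
best_cv_accuracy = search.best_score_
print(best_params, round(best_cv_accuracy, 4))
```

Swapping StandardScaler for MinMaxScaler in the same pipeline is one way to compare the two scaling schemes under identical cross-validation splits.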
If you insist on using SVM for classification, another approach that may yield an improvement is ensembling multiple SVMs. If you are using Python, you can try BaggingClassifier from sklearn.ensemble.
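As a rough illustration of that suggestion, here is one way bagging several SVMs might look. The dataset (load_breast_cancer), the number of estimators, the subsample fraction and the SVC hyperparameters are all assumptions chosen for the sketch, not tuned values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)

# Each ensemble member is a scaler followed by an RBF SVM.
# C and gamma here are placeholders, not tuned values.
base_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))

# The base estimator is passed positionally because its keyword name
# differs between scikit-learn releases (base_estimator vs. estimator).
bagged_svm = BaggingClassifier(
    base_svm,
    n_estimators=10,   # number of SVMs in the ensemble
    max_samples=0.8,   # each SVM is trained on a random 80% sample
    random_state=0,
)

scores = cross_val_score(bagged_svm, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```

Whether the ensemble actually beats a single well-tuned SVM depends on the data, so it is worth comparing both under the same cross-validation setup.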
Also note that you can't expect arbitrarily high performance on real-world data. I think 97% is very good performance. If you push the accuracy higher than this, you may be overfitting the data.
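One way to check for the overfitting mentioned above is to compare training accuracy with cross-validated accuracy. The sketch below uses deliberately extreme hyperparameters (an assumption, chosen only to provoke a gap) on the load_breast_cancer stand-in dataset; a large difference between the two numbers suggests the model is memorizing the training data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)

# Very large C and gamma (illustrative values) make the RBF SVM extremely
# flexible, which tends to favor fitting the training folds closely.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1000, gamma=1.0))

results = cross_validate(model, X, y, cv=5, return_train_score=True)
train_acc = results["train_score"].mean()
test_acc = results["test_score"].mean()
gap = train_acc - test_acc
print(f"train={train_acc:.3f} held-out={test_acc:.3f} gap={gap:.3f}")
```

If a hyperparameter setting raises the training score while the held-out score stalls or drops, the extra accuracy is unlikely to carry over to new patients.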