Scaling data in scikit-learn SVM

Question

While libsvm provides tools for scaling data, with Scikit-Learn (which should be based upon libSVM for the SVC classifier) I find no way to scale my data.

Basically I want to use 4 features, of which 3 range from 0 to 1 and the last one is a "big" highly variable number.

If I include the fourth feature in libSVM (using the easy.py script which scales my data automatically) I get some very nice results (96% accuracy). If I include the fourth variable in Scikit-Learn the accuracy drops to ~78% - but if I exclude it, I get the same results I get in libSVM when excluding that feature. Therefore I am pretty sure it's a problem of missing scaling.

How do I replicate programmatically (i.e. without calling svm-scale) the scaling process of SVM?
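
As a rough illustration of what replicating svm-scale in Python could look like: the sketch below assumes svm-scale's default behaviour of linearly mapping each feature to the range [-1, 1], and uses scikit-learn's `MinMaxScaler` as a stand-in. The toy arrays `X_train` and `X_test` are hypothetical and only mimic the setup described above (three features in [0, 1], one large and highly variable). The scaler is fitted on the training data only and then reused on the test data, which mirrors saving and restoring the scaling parameters.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: three features in [0, 1] plus one large,
# highly variable feature (not the asker's actual data).
X_train = np.array([
    [0.1, 0.5, 0.9, 1200.0],
    [0.4, 0.2, 0.3, 50.0],
    [0.8, 0.7, 0.1, 98000.0],
    [0.0, 1.0, 0.6, 430.0],
])
X_test = np.array([[0.3, 0.6, 0.2, 5000.0]])

# Assumption: svm-scale's default target range is [-1, 1].
scaler = MinMaxScaler(feature_range=(-1, 1))

# Learn per-feature min/max from the training set only...
X_train_scaled = scaler.fit_transform(X_train)
# ...and apply the same linear map to unseen data.
X_test_scaled = scaler.transform(X_test)
```

After this step every training column spans exactly [-1, 1], so the fourth feature no longer dominates distance computations in the kernel.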

Recommended answer

The data will then have zero mean and unit variance.
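
The step that produces zero mean and unit variance is not shown in the text above. One way to get that result in scikit-learn, offered here as a sketch rather than as the answer's exact code, is `StandardScaler`, optionally chained with `SVC` in a pipeline so that the same scaling is applied at fit and predict time. The synthetic data and the labelling rule below are made up purely for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical synthetic data: three features in [0, 1] and one
# large-valued feature; labels come from an arbitrary made-up rule.
rng = np.random.default_rng(0)
X = np.column_stack([rng.random((200, 3)), rng.normal(5000, 2000, 200)])
y = (X[:, 0] + X[:, 3] / 10000 > 1.0).astype(int)

# Standardize each column: subtract its mean, divide by its standard deviation.
X_scaled = StandardScaler().fit_transform(X)

# Chaining the scaler with the classifier keeps the scaling parameters
# learned during fit and reuses them automatically in predict.
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X, y)
predictions = clf.predict(X)
```

Using a pipeline also means that during cross-validation the scaler is refitted on each training fold, so no information from the held-out fold leaks into the scaling.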
