Why feature scaling in SVM?


Problem description

I found that scaling in SVM (Support Vector Machine) problems really improves its performance... I have read this explanation:

"The main advantage of scaling is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges."

Unfortunately this didn't help me... Can somebody provide me with a better explanation? Thank you in advance!

Recommended answer

The true reason behind scaling features in SVM is that this classifier is not invariant to affine transformations. In other words, if you multiply one feature by 1000, the solution given by the SVM will be completely different. It has nearly nothing to do with the underlying optimization techniques (although they are affected by these scale problems, they should still converge to the global optimum).

Consider an example: you have men and women, encoded by their sex and height (two features). Let us assume a very simple case with the following data:

0 -> man, 1 -> woman

╔═════╦════════╗
║ sex ║ height ║
╠═════╬════════╣
║  1  ║  150   ║
╠═════╬════════╣
║  1  ║  160   ║
╠═════╬════════╣
║  1  ║  170   ║
╠═════╬════════╣
║  0  ║  180   ║
╠═════╬════════╣
║  0  ║  190   ║
╠═════╬════════╣
║  0  ║  200   ║
╚═════╩════════╝

And let us do something silly: train it to predict the sex of the person, so we are trying to learn f(x, y) = x (ignoring the second feature).

It is easy to see that for such data the largest-margin classifier will "cut" the plane horizontally somewhere around height 175, so once we get a new sample "1 178" (a woman of 178 cm height), the classifier tells us she is a man.
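
To make this concrete, here is a minimal sketch of the unscaled case (my illustration; the original answer names no library), assuming scikit-learn is available. A very large C approximates the hard-margin, largest-margin classifier:

```python
# Minimal sketch (assumes NumPy and scikit-learn are installed).
import numpy as np
from sklearn.svm import SVC

# Features are (sex, height); the label is the sex itself, i.e. f(x, y) = x.
X = np.array([[1, 150], [1, 160], [1, 170],
              [0, 180], [0, 190], [0, 200]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])  # 1 -> woman, 0 -> man

# A very large C approximates the hard-margin (largest-margin) classifier.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.coef_)                # the weight on height dominates the weight on sex
print(clf.predict([[1, 178]]))  # [0]: the 178 cm woman is classified as a man
```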

However, if we scale everything down to [0, 1], we get something like:

╔═════╦════════╗
║ sex ║ height ║
╠═════╬════════╣
║  1  ║  0.0   ║
╠═════╬════════╣
║  1  ║  0.2   ║
╠═════╬════════╣
║  1  ║  0.4   ║
╠═════╬════════╣
║  0  ║  0.6   ║
╠═════╬════════╣
║  0  ║  0.8   ║
╠═════╬════════╣
║  0  ║  1.0   ║
╚═════╩════════╝

Now the largest-margin classifier "cuts" the plane nearly vertically (as expected), so given the new sample "1 178", which scales to about "1 0.56", we get that she is a woman (correct!).
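
Again as a sketch under the same assumptions (scikit-learn, my illustration): MinMaxScaler maps each column onto [0, 1], so the 0/1 sex column is unchanged and height 178 becomes (178 - 150) / 50 = 0.56:

```python
# Minimal sketch of the scaled variant (assumes NumPy and scikit-learn).
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X = np.array([[1, 150], [1, 160], [1, 170],
              [0, 180], [0, 190], [0, 200]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])  # 1 -> woman, 0 -> man

scaler = MinMaxScaler()             # maps each column onto [0, 1]
X_scaled = scaler.fit_transform(X)  # heights 150..200 -> 0.0..1.0

clf = SVC(kernel="linear", C=1e6).fit(X_scaled, y)

x_new = scaler.transform([[1, 178]])  # -> [[1.0, 0.56]]
print(clf.predict(x_new))             # [1]: correctly classified as a woman
```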

So, in general, scaling ensures that features will not end up being used as the main predictors merely because their numeric values are large.
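
As a usage note (a common scikit-learn pattern, not something the original answer prescribes): in practice the scaler is usually bundled with the SVM in a pipeline, so that new samples are transformed with the statistics learned from the training data rather than their own:

```python
# Common pattern: scaling inside a Pipeline keeps train and test consistent.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X = np.array([[1, 150], [1, 160], [1, 170],
              [0, 180], [0, 190], [0, 200]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])

model = make_pipeline(MinMaxScaler(), SVC(kernel="linear", C=1e6))
model.fit(X, y)                   # the scaler is fitted on the training data
print(model.predict([[1, 178]]))  # [1]: scaling is applied automatically
```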
