Normalizing feature values for SVM

Question

I've been playing with some SVM implementations, and I'm wondering: what is the best way to normalize feature values so that they all fit into one range (from 0 to 1)?

Let's suppose I have 3 features with values in ranges of:

  1. 3 - 5
  2. 0.02 - 0.05
  3. 10 - 15

How do I convert all of those values into the range [0, 1]?

What if the highest value of feature 1 that I encounter during training is 5, but after I begin using my model on much bigger datasets I stumble upon values as high as 7? Then, in the converted range, it would exceed 1...
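
To make this concern concrete, here is a minimal sketch of per-feature min-max scaling, x' = (x - min) / (max - min), which is the usual way to map each feature onto [0, 1]. Note that this is not the method recommended in the answer; it only illustrates the problem described in the question. The function names and the two training rows are hypothetical; the ranges are the ones from the question. Because min and max are learned from the training data, a later value of 7 for feature 1 maps to (7 - 3) / (5 - 3) = 2.0.

```python
# Per-feature min-max scaling: x' = (x - min) / (max - min),
# with min/max learned from the training data only.
# Hypothetical helper names; ranges taken from the question.


def fit_min_max(rows):
    """Return one (min, max) pair per feature, computed from training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def transform_min_max(row, ranges):
    """Scale one sample with the training-time ranges (no clipping)."""
    scaled = []
    for x, (lo, hi) in zip(row, ranges):
        span = hi - lo
        # A constant feature carries no information; map it to 0.0.
        scaled.append((x - lo) / span if span else 0.0)
    return scaled


# Two hypothetical training samples spanning 3-5, 0.02-0.05 and 10-15.
train = [
    [3.0, 0.02, 10.0],
    [5.0, 0.05, 15.0],
]
ranges = fit_min_max(train)

print(transform_min_max([4.0, 0.035, 12.5], ranges))  # each value is about 0.5
print(transform_min_max([7.0, 0.035, 12.5], ranges))  # first value is 2.0, above 1
```

Nothing in the formula itself keeps unseen values inside [0, 1]; it only guarantees that for values within the training range.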

How do I normalize values during training to account for the possibility of "values in the wild" exceeding the highest (or lowest) values the model has "seen" during training? How will the model react to that, and how do I make it work properly when that happens?

Recommended answer

You normalise each sample vector by converting it to a unit vector. This trains the SVM on the relative values of the features rather than on their magnitudes. The normalisation works on vectors with any values (except the all-zero vector, which has no length to divide by).

To convert to a unit vector, divide each value by the vector's Euclidean length, sqrt(x1^2 + x2^2 + ... + xn^2). For example, the vector [4 0.02 12] has a length of sqrt(16 + 0.0004 + 144) ≈ 12.6491. The normalised vector is then [4/12.6491 0.02/12.6491 12/12.6491] ≈ [0.316 0.0016 0.949].
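
The calculation above can be sketched in plain Python as follows, assuming the vector is a list of numbers and "length" means the Euclidean (L2) norm. The function name `to_unit_vector` is hypothetical.

```python
import math


def to_unit_vector(vector):
    """Divide every component by the vector's Euclidean (L2) length."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        raise ValueError("cannot normalise the zero vector")
    return [x / length for x in vector]


sample = [4, 0.02, 12]
print(round(math.sqrt(sum(x * x for x in sample)), 4))  # 12.6491
print([round(x, 4) for x in to_unit_vector(sample)])    # [0.3162, 0.0016, 0.9487]
```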

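If "in the wild" we encounter a vector of [400 2 1200], it will normalise to the same unit vector as above, because it is simply [4 0.02 12] multiplied by 100. The magnitudes of the features are "cancelled out" by the normalisation, and we are left with relative values between 0 and 1 (for non-negative features; a negative component ends up between -1 and 0).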
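
A small check of this scale invariance, assuming the same Euclidean-norm normalisation as above (the helper name is again hypothetical): a "wild" vector that is a positive multiple of a training vector maps to the same unit vector, up to floating-point rounding.

```python
import math


def to_unit_vector(vector):
    """Divide every component by the vector's Euclidean (L2) length."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


training_sample = [4, 0.02, 12]
wild_sample = [400, 2, 1200]  # 100 times larger in every feature

a = to_unit_vector(training_sample)
b = to_unit_vector(wild_sample)

# The common factor of 100 cancels, so the two results agree numerically.
print(all(math.isclose(x, y) for x, y in zip(a, b)))  # True
```

One thing to keep in mind: this rescales each sample as a whole, so it does not bring every feature onto the same scale. The second component stays tiny (about 0.0016) relative to the others.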
