SVM scaling input values

Question

I am using libSVM. Say my feature values are in the following format:

                         instance1 : f11, f12, f13, f14
                         instance2 : f21, f22, f23, f24
                         instance3 : f31, f32, f33, f34
                         instance4 : f41, f42, f43, f44
                         ..............................
                         instanceN : fN1, fN2, fN3, fN4

I think there are two kinds of scaling that can be applied:

  1. Scale each instance vector (row) so that it has zero mean and unit variance:

    ((f11, f12, f13, f14) - mean((f11, f12, f13, f14))) ./ std((f11, f12, f13, f14))

  2. Scale each column (feature) of the above matrix to a range, for example [-1, 1].
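A minimal plain-Python sketch of the two options, assuming a small hypothetical matrix in which the first feature is in the thousands and the other three are small. `scale_rows_zscore` and `scale_columns_minmax` are illustrative helper names, not LIBSVM functions.

```python
import statistics

def scale_rows_zscore(matrix):
    """Option (1): standardize each instance (row) to zero mean, unit variance."""
    scaled = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.pstdev(row)  # population standard deviation of the row
        # A constant row has sd == 0; map it to all zeros instead of dividing by 0.
        scaled.append([0.0 if sd == 0 else (v - mu) / sd for v in row])
    return scaled

def scale_columns_minmax(matrix, lower=-1.0, upper=1.0):
    """Option (2): linearly map each feature (column) onto [lower, upper]."""
    columns = list(zip(*matrix))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    scaled = []
    for row in matrix:
        new_row = []
        for v, lo, hi in zip(row, mins, maxs):
            if hi == lo:
                # Constant feature: no spread to scale, so use the midpoint.
                new_row.append((lower + upper) / 2)
            else:
                new_row.append(lower + (upper - lower) * (v - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled

# Hypothetical data: feature 1 is in the thousands, features 2-4 are small.
X = [
    [1000.0, 0.5, 2.0, 1.0],
    [3000.0, 0.1, 4.0, 3.0],
    [2000.0, 0.9, 3.0, 2.0],
]

print(scale_rows_zscore(X)[0])
print(scale_columns_minmax(X))
```

With this data, option (1) turns the first row into approximately [1.73, -0.58, -0.58, -0.58]: the three small features end up almost identical, so most of their information is lost. Option (2) maps every column onto [-1, 1] independently, so each feature keeps its own spread.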

According to my experiments with the RBF kernel (libSVM), the second scaling (2) improves the results by about 10%. I do not understand why (2) gives better results.

Could anybody explain to me why scaling is applied at all, and why the second option gives better results?

Recommended answer

The standard thing to do is to make each dimension (attribute, i.e. each column in your example) have zero mean and unit variance.
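A sketch of per-column standardization, assuming hypothetical training and test matrices. The mean and standard deviation of each column are computed from the training rows only and then reused for the test rows, which is the usual practice so that both sets are scaled consistently. The helper names are illustrative.

```python
import statistics

def fit_standardizer(train):
    """Compute per-column mean and population standard deviation from training rows."""
    columns = list(zip(*train))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds

def standardize(matrix, means, stds):
    """Apply (x - mean) / std column by column; constant columns map to 0."""
    return [
        [0.0 if sd == 0 else (v - mu) / sd for v, mu, sd in zip(row, means, stds)]
        for row in matrix
    ]

# Hypothetical training and test instances with 4 features each.
train = [
    [1000.0, 0.5, 2.0, 1.0],
    [3000.0, 0.1, 4.0, 3.0],
    [2000.0, 0.9, 3.0, 2.0],
]
test = [[2500.0, 0.3, 3.5, 2.0]]

means, stds = fit_standardizer(train)
train_std = standardize(train, means, stds)
test_std = standardize(test, means, stds)  # reuse the training statistics
print(train_std)
print(test_std)
```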

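Standardizing the columns this way brings every dimension of the SVM input to the same order of magnitude. Scaling each instance instead (your option 1) does not achieve this: a feature with a much larger numeric range than the others still dominates every row. From http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf: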

The main advantage of scaling is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges. Another advantage is to avoid numerical difficulties during the calculation. Because kernel values usually depend on the inner products of feature vectors, e.g. the linear kernel and the polynomial kernel, large attribute values might cause numerical problems. We recommend linearly scaling each attribute to the range [-1,+1] or [0,1].
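To see the dominance effect the guide describes, the sketch below computes the RBF kernel K(x, y) = exp(-gamma * ||x - y||^2) for two hypothetical instances, before and after linearly scaling each feature to [-1, +1]. The per-feature minima and maxima are made-up values standing in for statistics taken from a training set, and gamma is set to 1 / (number of features), which is LIBSVM's default.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def feature_shares(x, y):
    """Fraction of the squared distance ||x - y||^2 contributed by each feature."""
    sq = [(a - b) ** 2 for a, b in zip(x, y)]
    total = sum(sq)
    return [s / total for s in sq]

def scale_to_range(v, lo, hi, lower=-1.0, upper=1.0):
    """Linearly map each feature to [lower, upper]; assumes hi > lo for every feature."""
    return [lower + (upper - lower) * (a - l) / (h - l) for a, l, h in zip(v, lo, hi)]

# Hypothetical per-feature minima/maxima, as if taken from a training set.
col_min = [1000.0, 0.1, 2.0, 1.0]
col_max = [3000.0, 0.9, 4.0, 3.0]

x = [1000.0, 0.5, 2.0, 1.0]
y = [1100.0, 0.9, 4.0, 3.0]
gamma = 1.0 / len(x)  # LIBSVM's default gamma is 1 / (number of features)

x_s = scale_to_range(x, col_min, col_max)
y_s = scale_to_range(y, col_min, col_max)

print("unscaled shares:", feature_shares(x, y))
print("unscaled K:", rbf_kernel(x, y, gamma))
print("scaled shares:", feature_shares(x_s, y_s))
print("scaled K:", rbf_kernel(x_s, y_s, gamma))
```

Unscaled, the first feature accounts for more than 99% of the squared distance and the kernel value underflows to 0, so the other three features have essentially no influence. After scaling, the first feature's share drops to roughly 0.1% and the kernel value is about 0.1, so all features contribute.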
