Feature normalization - advantage of L2 normalization


Problem Description


Features are usually normalized prior to classification.

L1 and L2 normalization are usually used in the literature.

Could anybody comment on the advantages of L2 norm (or L1 norm) compared to L1 norm (or L2 norm)?
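For concreteness, the two normalizations under discussion can be sketched in a few lines of pure Python (the helper names `l1_normalize` and `l2_normalize` are my own, not from the question):

```python
import math

def l1_normalize(x):
    # Scale x so the sum of absolute values equals 1 (unit L1 norm).
    s = sum(abs(v) for v in x)
    return [v / s for v in x]

def l2_normalize(x):
    # Scale x so the Euclidean length equals 1 (unit L2 norm).
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x]

x = [3.0, 4.0]
print(l1_normalize(x))  # components sum to 1
print(l2_normalize(x))  # [0.6, 0.8], Euclidean length 1
```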

Solution

Advantages of L2 over L1 norm

  • As already stated by aleju in the comments, the derivative of the L2 norm is easily computed. Therefore it is also easy to use gradient-based learning methods.
  • L2 regularization optimizes the mean cost (whereas L1 reduces the median), and the mean is often used as a performance measure. This is especially good if you know you don't have any outliers and you want to keep the overall error small.
  • The solution is more likely to be unique. This ties in with the previous point: While the mean is a single value, the median might be located in an interval between two points and is therefore not unique.
  • While L1 regularization can give you a sparse coefficient vector, the non-sparseness of L2 can improve your prediction performance (since you leverage more features instead of simply ignoring them).
  • L2 is invariant under rotation. If you have a dataset consisting of points in a space and you apply a rotation, you still get the same results (i.e. the distances between points remain the same).
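The rotation-invariance point in the last bullet is easy to check numerically: rotating a point leaves its L2 norm unchanged, while its L1 norm changes. A minimal sketch (the helper names `l1`, `l2`, and `rotate` are mine, chosen for illustration):

```python
import math

def l1(x):
    return sum(abs(v) for v in x)

def l2(x):
    return math.sqrt(sum(v * v for v in x))

def rotate(p, theta):
    # Rotate a 2-D point by angle theta (radians) about the origin.
    c, s = math.cos(theta), math.sin(theta)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1]]

p = [1.0, 0.0]
q = rotate(p, math.pi / 4)  # 45-degree rotation
print(l2(p), l2(q))  # both 1.0: L2 is invariant under rotation
print(l1(p), l1(q))  # 1.0 vs about 1.414: L1 is not
```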

Advantages of L1 over L2 norm

  • The L1 norm prefers sparse coefficient vectors. (explanation on Quora) This means the L1 norm performs feature selection, and you can delete all features where the coefficient is 0. A reduction of dimensionality is useful in almost all cases.
  • The L1 norm optimizes the median. Therefore the L1 norm is not sensitive to outliers.
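The mean-vs-median contrast behind both lists can be demonstrated with a small grid search: the constant that minimizes the summed squared error is the mean (pulled toward an outlier), while the constant that minimizes the summed absolute error is the median (unaffected by it). A rough sketch; the data and helper names are illustrative, not from the answer:

```python
data = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier

def sq_cost(c):
    # L2-style cost: sum of squared deviations from c.
    return sum((v - c) ** 2 for v in data)

def abs_cost(c):
    # L1-style cost: sum of absolute deviations from c.
    return sum(abs(v - c) for v in data)

candidates = [c / 10 for c in range(0, 1001)]  # 0.0 .. 100.0 in steps of 0.1
best_sq = min(candidates, key=sq_cost)
best_abs = min(candidates, key=abs_cost)

print(best_sq)   # 22.0: the mean, dragged toward the outlier
print(best_abs)  # 3.0: the median, insensitive to the outlier
```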

More sources:

The same question on Quora

Another one
