Regularized cost function with very large λ

Problem description

Consider the cost function with regularization in machine learning:
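
(The formula was shown as an image in the original post and is not reproduced here. Assuming the usual regularized linear-regression cost that the answer below refers to, it has the form J(θ) = (1/2m) * [ sum((h_θ(x) - y)²) + λ * sum(θ²) ], where the first sum runs over the training examples and the second over the parameters θ_1 … θ_n.)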

Why does the parameter θ tend towards zero when we set the parameter λ to be very large?

Recommended answer

The regularized cost function is penalized by the size of the parameters θ.

The regularization term dominates the cost when λ → +inf.

It is worth noting that when λ is very large, most of the cost comes from the regularization term λ * sum(θ²) rather than the actual data-fit cost sum((h_θ - y)²); in that case, minimizing the cost is mostly a matter of minimizing the regularization term λ * sum(θ²), which pushes θ towards 0 (θ → 0).
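
A small numerical sketch of this effect (my own illustration, not part of the original answer): solving ridge regression in closed form with numpy for increasing values of λ shows the fitted θ shrinking towards zero.

import numpy as np

# Toy data: y ≈ 3*x1 - 2*x2 plus a little noise (hypothetical example).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0]) + 0.1 * rng.normal(size=100)

def ridge_theta(X, y, lam):
    # Closed-form minimizer of sum((X @ theta - y)**2) + lam * sum(theta**2):
    # theta = (X^T X + lam * I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in [0.0, 1.0, 100.0, 1e6]:
    print(lam, ridge_theta(X, y, lam))
# As lam grows, the λ * sum(θ²) term dominates and the fitted θ is pushed towards 0.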

Why minimizing λ * sum(θ²) results in θ → 0

Consider the regularization term λ * sum(θ²): to minimize this term, the only option is to push sum(θ²) → 0 (λ is a positive constant, and the sum term is non-negative).

And since the θ terms are squared (θ² is always non-negative), the only way to do that is to push the θ parameters towards 0. Hence sum(θ²) → 0 implies θ → 0.
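
To make the step fully explicit (an added note, not in the original answer): for each individual parameter, the regularization term contributes λ * θ_j², and d/dθ_j (λ * θ_j²) = 2 * λ * θ_j, which is zero only at θ_j = 0; since 2λ > 0, that point is the unique minimum of the term. So every θ_j is driven to 0 once the regularization term dominates.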

To summarize, when λ is very large:

Minimizing the cost function is mostly about minimizing λ * sum(θ²), which requires minimizing sum(θ²), which requires θ → 0.

Some intuition, to answer the question in the comments:

Think of λ as a parameter that tells how much regularization you want. E.g. at one extreme, if you set λ to 0, your cost function is not regularized at all. If you set λ to a small number, you get only a little regularization.

And vice versa: the more you increase λ, the more you are asking your cost function to be regularized, so the smaller the parameters θ will have to be in order to minimize the regularized cost function.

Why use θ² rather than θ in the regularization sum?

Because the goal is to have small θ (less prone to overfitting). If the regularization term used θ instead of θ² in the sum, you could end up with large θ values that cancel each other out, e.g. θ_1 = 1000000 and θ_2 = -1000001: sum(θ) here is -1, which is small, whereas sum(|θ|) (absolute values) or sum(θ²) (squares) would be very large.

In that case you may end up overfitting because of large θ values that escaped the regularization, since the terms cancel each other out.
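
A quick check of the cancellation example above, using the same hypothetical numbers:

theta = [1000000, -1000001]

print(sum(theta))                    # -1: looks tiny even though the weights are huge
print(sum(abs(t) for t in theta))    # 2000001: an absolute-value penalty still sees them
print(sum(t**2 for t in theta))      # 2000002000001: the squared penalty sees them even more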

