Regularized cost function with very large λ
Problem description
Consider the cost function with regularization in machine learning:

J(θ) = sum((h_θ(x) − y)²) + λ · sum(θ²)

Why does the parameter θ tend toward zero when we set the parameter λ to be very large?
Recommended answer
The regularized cost function is penalized by the size of the parameters θ. The regularization term dominates the cost as λ → +∞.
It is worth noting that when λ is very large, most of the cost comes from the regularization term λ · sum(θ²) rather than from the actual data-fit cost sum((h_θ − y)²). Hence in that case minimizing the cost is mostly about minimizing the regularization term λ · sum(θ²), which is done by pushing θ toward 0 (θ → 0).
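As an illustrative sketch (the toy data and θ values here are assumed for demonstration, not taken from the original question), the following compares the two parts of the regularized cost for a linear hypothesis h_θ(x) = X·θ when λ is very large:

```python
import numpy as np

# Toy data (assumed): 3 examples, 2 parameters.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

def data_cost(theta):
    """Actual (data-fit) cost: sum((h_θ - y)²) with h_θ = X @ θ."""
    return np.sum((X @ theta - y) ** 2)

def reg_cost(theta, lam):
    """Regularization term: λ · sum(θ²)."""
    return lam * np.sum(theta ** 2)

theta = np.array([0.5, 1.2])   # a reasonable fit to the toy data
lam = 1e6                      # very large λ

print(data_cost(theta))        # ≈ 0.11 — small data-fit cost
print(reg_cost(theta, lam))    # ≈ 1.69e6 — the regularization term dominates
```

With λ this large, the optimizer gains almost nothing from improving the fit and almost everything from shrinking θ, which is exactly why θ is driven toward zero.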
Why minimizing λ · sum(θ²) results in θ → 0
Consider the regularization term λ · sum(θ²): to minimize this term, the only option is to push sum(θ²) → 0 (λ is a positive constant, and the sum term is also non-negative).
And since the θ terms are squared (θ² is always non-negative), the only way to do that is to push every θ parameter toward 0. Hence sum(θ²) → 0 implies θ → 0.
To summarize, when λ is very large:

Minimizing the cost function is mostly about minimizing λ · sum(θ²), which requires minimizing sum(θ²), which requires θ → 0.
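This shrinkage can be seen directly with closed-form ridge regression (a hypothetical sketch on synthetic data): as λ grows, the minimizer θ of the regularized cost shrinks toward the zero vector.

```python
import numpy as np

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_theta = np.array([2.0, -3.0, 1.5])
y = X @ true_theta + rng.normal(scale=0.1, size=50)

def ridge(X, y, lam):
    """Closed-form minimizer of sum((Xθ - y)²) + λ·sum(θ²):
    θ = (XᵀX + λI)⁻¹ Xᵀy."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# The norm of the fitted θ shrinks monotonically as λ increases.
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.0, 10.0, 1e3, 1e6)]
print(norms)
```

At λ = 0 the solution is the ordinary least-squares fit (no regularization); at λ = 10⁶ the fitted θ is nearly the zero vector, matching the argument above.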
Some intuition, to answer a question from the comments:
Think of λ as a knob that tells the model how much regularization you want. At one extreme, if you set λ to 0, the cost function is not regularized at all; if you set λ to a small number, you get only a little regularization.
And vice versa: the more you increase λ, the more you are asking the cost function to be regularized, so the smaller the parameters θ will have to be in order to minimize the regularized cost function.
Why use θ² instead of θ in the regularization sum?
Because the goal is to keep every θ small (less prone to overfitting). If the regularization term used θ instead of θ² in the sum, large θ values could cancel each other out: e.g. with θ_1 = 1000000 and θ_2 = -1000001, sum(θ) is -1, which looks small, whereas sum(|θ|) (absolute value) or sum(θ²) (squared) would be a very large value.
In that case you could end up overfitting because of large θ values that escaped regularization by cancelling each other out.
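A quick numeric check of the cancellation example above (plain Python, using only the numbers already given in the text):

```python
theta = [1_000_000, -1_000_001]

sum_plain = sum(theta)                  # -1: the raw sum looks tiny, the penalty fails
sum_abs   = sum(abs(t) for t in theta)  # 2000001: an L1-style penalty sees the true size
sum_sq    = sum(t * t for t in theta)   # ~2e12: an L2-style penalty sees it even more

print(sum_plain, sum_abs, sum_sq)
```

Only the absolute-value and squared sums actually penalize the huge weights; the plain sum is fooled by the sign cancellation.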