Avoiding numerical overflow when calculating the value AND gradient of the Logistic loss function
Problem description
I am currently trying to implement a machine learning algorithm that involves the logistic loss function in MATLAB. Unfortunately, I am having some trouble due to numerical overflow.
In general, for a given input s, the value of the logistic loss function is:
log(1 + exp(s))
and the slope of the logistic loss function is:
exp(s)./(1 + exp(s)) = 1./(1 + exp(-s))
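For reference, the slope follows from the chain rule, and the two forms agree after dividing numerator and denominator by e^s:

```latex
\frac{d}{ds}\log\left(1+e^{s}\right) = \frac{e^{s}}{1+e^{s}} = \frac{1}{1+e^{-s}}
```

The right-hand form is the safer one to evaluate: for large positive s it only needs exp(-s), which tends to 0, and for large negative s IEEE arithmetic gives 1/(1+Inf) = 0 rather than NaN.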
In my algorithm, s = X*beta. Here X is a matrix with N data points and P features per data point (i.e. size(X) = [N,P]), and beta is a vector of P coefficients, one per feature, such that size(beta) = [P 1].
I am specifically interested in calculating the average value and gradient of the logistic loss function for a given value of beta.
The average value of the logistic loss function for a given value of beta is:
L = 1/N * sum(log(1+exp(X*beta)),1)
The average value of its slope, i.e. the gradient of L with respect to beta, is:
dL = 1/N * X' * (exp(X*beta)./(1+exp(X*beta)))
Note that size(dL) = [P 1].
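Written out with x_i^T as the i-th row of X and σ(s) = 1/(1+e^{-s}) applied elementwise (σ is shorthand introduced here, not in the original), the chain rule turns the per-sample slopes into one matrix-vector product, which is why the result has size [P 1]:

```latex
L(\beta) = \frac{1}{N}\sum_{i=1}^{N}\log\left(1+e^{x_i^{\top}\beta}\right),
\qquad
\nabla_{\beta} L = \frac{1}{N}\sum_{i=1}^{N}\sigma\left(x_i^{\top}\beta\right)x_i = \frac{1}{N}X^{\top}\sigma(X\beta)
```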
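My issue is that these expressions keep producing numerical overflow. The problem comes from the fact that exp(s) = Inf when s > 1000 and exp(s) = 0 when s < -1000. (In double precision the thresholds are actually smaller in magnitude: exp(s) overflows to Inf once s exceeds about 709.78 and underflows to 0 once s drops below about -745.)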
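A quick check of where the naive expressions break down in double precision. The input values are chosen only for illustration. Note that at s = -40 the value is already exactly 0, because 1 + exp(-40) rounds to 1; the overflow at s = 710 turns the value into Inf and the slope into Inf/Inf = NaN.

```matlab
% Naive evaluation of the loss and its slope at a few illustrative inputs
s = [-1000; -40; 0; 40; 710; 1000];

naive_value = log(1 + exp(s));          % Inf once exp(s) overflows (s > ~709.78)
naive_slope = exp(s) ./ (1 + exp(s));   % Inf/Inf = NaN at the same points

% Columns: s, value, slope
disp([s, naive_value, naive_slope])
```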
I am looking for a solution such that s can take on any value in floating point arithmetic. Ideally, I would also really appreciate a solution that allows me to evaluate the value and gradient in a vectorized / efficient way.
How about the following approximations:
– For computing L, if s is large and positive, then exp(s) will be much larger than 1:
1 + exp(s) ≅ exp(s)
and consequently
log(1 + exp(s)) ≅ log(exp(s)) = s.
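The error of this approximation can be made explicit with an exact rearrangement (factor e^s out of the argument of the log):

```latex
\log\left(1+e^{s}\right) = s + \log\left(1+e^{-s}\right),
\qquad
0 < \log\left(1+e^{-s}\right) < e^{-s}
```

So replacing log(1 + exp(s)) by s costs less than e^{-s}, which is already below 10^{-17} at s = 40.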
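If s is close to 0, then using the Taylor series of exp(),
exp(s) ≅ 1 + s,
and then the Taylor series of log(),
log(1 + exp(s)) ≅ log(2 + s) ≅ log(2) + s / 2.
If s is large and negative, exp(s) is much smaller than 1, so log(1 + exp(s)) ≅ exp(s), which tends to 0.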
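Expanding log(1 + e^s) directly around s = 0 confirms the first-order result and shows the size of the neglected term. Its first three derivatives at 0 are σ(0) = 1/2, σ(0)(1 - σ(0)) = 1/4, and 0, where σ(s) = 1/(1+e^{-s}):

```latex
\log\left(1+e^{s}\right) = \log 2 + \frac{s}{2} + \frac{s^{2}}{8} + O\left(s^{4}\right)
```

This expansion is therefore only useful while s^2/8 is negligible. It does not apply to large negative s: log(2) + s/2 becomes negative for s < -2 log(2), while log(1 + e^s) is always positive.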
– For computing dL, for large positive s
exp(s) ./ (1 + exp(s)) ≅ 1,
for s close to 0
exp(s) ./ (1 + exp(s)) ≅ 1/2 + s / 4,
and for large negative s
exp(s) ./ (1 + exp(s)) ≅ exp(s) ≅ 0.
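The same check for the slope, using the identity e^s/(1+e^s) = 1/2 + tanh(s/2)/2 and the series tanh(x) = x - x^3/3 + O(x^5):

```latex
\frac{e^{s}}{1+e^{s}} = \frac{1}{2} + \frac{1}{2}\tanh\frac{s}{2} = \frac{1}{2} + \frac{s}{4} - \frac{s^{3}}{48} + O\left(s^{5}\right)
```

As with the value, 1/2 + s/4 is accurate only for s near 0.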
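– The code to compute L could, for example, look like this:
s = X*beta;
l = log(1+exp(s));       % Inf wherever exp(s) overflowed
ind = isinf(l);          % large positive s: log(1+exp(s)) is approximately s
l(ind) = s(ind);
ind = (l == 0);          % very negative s: 1+exp(s) rounded to exactly 1
l(ind) = exp(s(ind));    % there log(1+exp(s)) is approximately exp(s)
L = 1/N * sum(l,1)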
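The answer only shows code for L. Below is a minimal sketch of the gradient in the same spirit, assuming X, beta and N as defined in the question; the small data matrix is hypothetical and only there so the script runs on its own. For comparison, the last lines evaluate both quantities with a different technique that is not part of the original answer: the exact rearrangement log(1 + exp(s)) = max(s,0) + log1p(exp(-|s|)) and its matching form of the slope. Every exp call there receives an argument that is at most 0, so nothing can overflow. The helper names softplus and sigmoid are illustrative.

```matlab
% Hypothetical test data; in practice X (N-by-P) and beta (P-by-1) come from the model
N = 5; P = 3;
X = [ 1    2    0;
     -3    1    4;
      500  200  100;
     -400 -300 -100;
      0    0    0 ];
beta = [1; 1; 1];
s = X*beta;                                  % s = [3; 2; 800; -800; 0]

% Value, using the approximation-based fix from the answer
l = log(1 + exp(s));
ind = isinf(l);   l(ind) = s(ind);           % large positive s
ind = (l == 0);   l(ind) = exp(s(ind));      % very negative s
L = 1/N * sum(l, 1);

% Slope: exp(s)./(1+exp(s)) is Inf/Inf = NaN where exp(s) overflows
p = exp(s) ./ (1 + exp(s));
p(isnan(p)) = 1;                             % large positive s: slope is approximately 1
dL = 1/N * X' * p;                           % size(dL) = [P 1]

% Alternative (not from the answer): exact rearrangements whose exp
% arguments are always <= 0, so neither expression can overflow
softplus = @(t) max(t, 0) + log1p(exp(-abs(t)));
sigmoid  = @(t) exp(min(t, 0)) ./ (1 + exp(-abs(t)));
L_exact  = 1/N * sum(softplus(s), 1);
dL_exact = 1/N * X' * sigmoid(s);
```

Both routes stay finite for every s. The rearranged version needs no post-hoc patching of Inf or NaN entries and no thresholds, at the cost of computing abs, max and min elementwise.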