Avoiding numerical overflow when calculating the value AND gradient of the Logistic loss function


Problem description


I am currently trying to implement a machine learning algorithm that involves the logistic loss function in MATLAB. Unfortunately, I am having some trouble due to numerical overflow.

In general, for a given input s, the value of the logistic loss function is:

 log(1 + exp(s))

and the slope of the logistic loss function is:

 exp(s)./(1 + exp(s)) = 1./(1 + exp(-s))

In my algorithm, the value of s = X*beta. Here X is a matrix with N data points and P features per data point (i.e. size(X)=[N,P]) and beta is a vector of P coefficients for each feature such that size(beta)=[P 1].

I am specifically interested in calculating the average value and gradient of the logistic loss for a given value of beta.

The average value of the logistic loss w.r.t. beta is:

 L = 1/N * sum(log(1+exp(X*beta)),1)

The average slope of the logistic loss w.r.t. beta (i.e. the gradient) is:

 dL = 1/N * X' * (exp(X*beta)./(1+exp(X*beta)))

Note that size(dL) = [P 1].

My issue is that these expressions keep producing numerical overflows. The problem effectively comes from the fact that exp(s)=Inf when s>1000 and exp(s)=0 when s<-1000.

I am looking for a solution such that s can take on any value in floating point arithmetic. Ideally, I would also really appreciate a solution that allows me to evaluate the value and gradient in a vectorized / efficient way.
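To make the failure mode concrete, here is a small sketch. The question is in MATLAB, but IEEE-754 doubles behave the same way everywhere; the only difference is that Python raises OverflowError where MATLAB's exp(1000) silently returns Inf:

```python
import math

def naive_logistic_loss(s):
    # Direct evaluation of log(1 + exp(s)): fails for large s.
    return math.log(1.0 + math.exp(s))

print(naive_logistic_loss(10.0))  # fine, about 10.0000454
# For s = 1000, exp(s) cannot be represented: Python raises OverflowError,
# while MATLAB returns Inf, so the loss comes out as Inf instead of the
# correct value (approximately 1000).
try:
    naive_logistic_loss(1000.0)
    print("no overflow")
except OverflowError:
    print("overflow at s = 1000")
```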

解决方案

How about the following approximations:

– For computing L, if s is large, then exp(s) will be much larger than 1:

1 + exp(s) ≅ exp(s)

and consequently

log(1 + exp(s)) ≅ log(exp(s)) = s.

If s is small, then using the Taylor series of exp()

exp(s) ≅ 1 + s

and using the Taylor series of log()

log(1 + exp(s)) ≅ log(2 + s) ≅ log(2) + s / 2.
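Both regimes can be checked numerically against the well-known stable identity log(1 + exp(s)) = max(s, 0) + log1p(exp(-|s|)), where log1p(x) computes log(1 + x) accurately for small x (math.log1p in Python, log1p in MATLAB). A Python sketch, purely for illustration:

```python
import math

def stable_l(s):
    # Stable evaluation of log(1 + exp(s)) via the identity
    # log(1 + exp(s)) = max(s, 0) + log1p(exp(-|s|)).
    # The exponential argument is always <= 0, so exp never overflows.
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))

# Large s: log(1 + exp(s)) is approximately s
print(stable_l(50.0))                    # about 50.0
# Small s: log(1 + exp(s)) is approximately log(2) + s/2
s = 0.01
print(stable_l(s), math.log(2) + s / 2)  # both about 0.698
```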

– For computing dL, for large s

exp(s) ./ (1 + exp(s)) ≅ 1

and for small s

exp(s) ./ (1 + exp(s)) ≅ 1/2 + s / 4.
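The slope approximations can likewise be checked against a sigmoid computed without overflow. The sketch below (Python, for illustration) branches on the sign of s so that the exponential argument is always non-positive:

```python
import math

def stable_sigmoid(s):
    # exp(s)/(1 + exp(s)) = 1/(1 + exp(-s)), computed so that the
    # argument passed to exp() is never positive (no overflow possible).
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)

print(stable_sigmoid(1000.0))            # 1.0, matching the large-s limit
s = 0.01
print(stable_sigmoid(s), 0.5 + s / 4)    # both about 0.5025
```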

– The code to compute L could, for example, look like this:

s = X*beta;
l = log(1 + exp(s));     % direct evaluation
ind = isinf(l);          % exp(s) overflowed, and log(1+exp(s)) ≅ s there
l(ind) = s(ind);
ind = (l == 0);          % exp(s) underflowed relative to 1 (s < about -37),
l(ind) = exp(s(ind));    % where log(1+exp(s)) ≅ exp(s), itself nearly 0
L = 1/N * sum(l, 1)
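An alternative to patching the Inf and zero entries afterwards is to evaluate the stable identity directly; in MATLAB this would be the already-vectorized one-liner l = max(s,0) + log1p(exp(-abs(s))). A Python sketch of the averaged loss over a plain list of s values:

```python
import math

def average_logistic_loss(s_values):
    # Average of log(1 + exp(s)) over a list of s values, using the stable
    # identity log(1 + exp(s)) = max(s, 0) + log1p(exp(-|s|)).
    # No special-case patching of Inf or zero entries is needed.
    n = len(s_values)
    return sum(max(s, 0.0) + math.log1p(math.exp(-abs(s)))
               for s in s_values) / n

# Extreme inputs pose no problem: the terms are 0, log(2), and 2000.
print(average_logistic_loss([-2000.0, 0.0, 2000.0]))
```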
