Avoiding numerical overflow when calculating the value AND gradient of the Logistic loss function

Views: 823  Published: 2017/12/21 21:50:27  Tags: matlab floating-point numerical-methods logistic-regression numerical-stability


Question

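I am currently trying to implement a machine learning algorithm in MATLAB that involves the logistic loss function. Unfortunately, I am having some trouble due to numerical overflow.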

In general, for a given input s, the value of the logistic function is:

  log(1 + exp(s))

and the slope of the logistic loss function is:

  exp(s)./(1 + exp(s)) = 1./(1 + exp(-s))
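For reference, the slope is the derivative of the loss with respect to s. The two forms above are related by dividing numerator and denominator by exp(s). In the formula below, σ(s) is notation introduced here for the logistic sigmoid:

```latex
\frac{d}{ds}\,\log\!\left(1 + e^{s}\right)
  = \frac{e^{s}}{1 + e^{s}}
  = \frac{1}{1 + e^{-s}}
  = \sigma(s)
```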

In my algorithm, s = X*beta. Here X is a matrix with N data points and P features per data point (i.e. size(X) = [N, P]). beta is a vector of P coefficients, one for each feature, so that size(beta) = [P 1].

I am specifically interested in calculating the average value and gradient of the logistic function for a given value of beta.

The average value of the logistic function w.r.t. a value of beta is:

  L = 1/N * sum(log(1 + exp(X*beta)), 1)

The average value of the slope of the logistic function w.r.t. a value of beta is:

  dL = 1/N * sum((exp(X*beta) ./ (1 + exp(X*beta))) .* X, 1)'

Note that size(dL) = [P 1].
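The gradient formula follows from the chain rule. In the derivation below, x_i^T is the i-th row of X and σ(s) = 1/(1 + exp(-s)) is applied element-wise (notation introduced here). The row-wise sum also collapses to a single matrix-vector product, written X' * σ(X*beta) / N in MATLAB terms:

```latex
\nabla_{\beta} L
  = \frac{1}{N}\sum_{i=1}^{N}
      \frac{\partial}{\partial \beta}\log\!\left(1 + e^{x_i^{\top}\beta}\right)
  = \frac{1}{N}\sum_{i=1}^{N} \sigma\!\left(x_i^{\top}\beta\right) x_i
  = \frac{1}{N}\, X^{\top}\sigma(X\beta) \in \mathbb{R}^{P \times 1}
```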

My issue is that these expressions keep producing numerical overflows. The problem comes from the fact that exp(s) = Inf when s > 1000 and exp(s) = 0 when s < -1000.
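A quick way to see where double precision breaks down is to evaluate the naive expression at a few points. This is a small sketch for MATLAB or Octave, and the sample values of s are arbitrary. In IEEE double precision, exp(s) already overflows to Inf a little above s = 709. At the other end, 1 + exp(s) rounds to exactly 1 once s drops below roughly -37, so the naive log returns 0 long before exp(s) itself underflows.

```matlab
% Evaluate the naive loss log(1 + exp(s)) at a few sample points
s = [-800; -40; 0; 709; 710; 1000];
naive = log(1 + exp(s));

% Columns: s, exp(s), naive value
disp([s, exp(s), naive])
% exp(710) and exp(1000) overflow to Inf, so naive(5:6) is Inf.
% At s = -40, exp(s) is about 4e-18, so 1 + exp(s) == 1 and naive(2) is 0
% even though the true value is about 4e-18.
```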

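I am looking for a solution such that s can take on any value in floating-point arithmetic. Ideally, I would also really appreciate a solution that lets me evaluate the value and gradient in a vectorized, efficient way.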

Solution

How about the following approximations:

- For computing L, if s is large, then exp(s) will be much larger than 1:

  1 + exp(s) ≅ exp(s)

and consequently

  log(1 + exp(s)) ≅ log(exp(s)) = s.
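The large-s approximation can also be written as an exact identity, which shows how small its error is. Factoring exp(s) out of 1 + exp(s) gives:

```latex
\log\!\left(1 + e^{s}\right) = s + \log\!\left(1 + e^{-s}\right),
\qquad
0 < \log\!\left(1 + e^{-s}\right) < e^{-s}
```

So replacing log(1 + exp(s)) by s introduces an absolute error smaller than exp(-s), which is negligible for large positive s.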

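If s is close to zero, then using the Taylor series of exp()

  exp(s) ≅ 1 + s

and using the Taylor series of log()

  log(1 + exp(s)) ≅ log(2 + s) ≅ log(2) + s / 2.

This expansion is only valid near s = 0. For large negative s, exp(s) is tiny and log(1 + exp(s)) ≅ exp(s). Once exp(s) falls below about 1e-16, 1 + exp(s) rounds to exactly 1 and the naive formula returns 0.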
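The sketch below checks the near-zero expansion against the large-negative-s limit exp(s), for MATLAB or Octave. The built-in log1p, which computes log(1 + x) accurately for small x and exists in both, serves as the reference. The sample points are arbitrary.

```matlab
% Compare the near-zero Taylor expansion with the large-negative-s limit
s = [-1e-3; 1e-3; -50];
exact  = log1p(exp(s));     % accurate reference for these inputs
taylor = log(2) + s / 2;    % first-order expansion around s = 0
tail   = exp(s);            % leading term when s is very negative

disp([s, exact, taylor, tail])
% Near zero the Taylor error is about s^2/8 (roughly 1e-7 here), but at
% s = -50 the Taylor value is about -24.3, which is not even positive,
% while exp(s) matches the reference.
```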

- For computing dL, for large s

  exp(s) ./ (1 + exp(s)) ≅ 1

and for s close to zero

  exp(s) ./ (1 + exp(s)) ≅ 1/2 + s / 4.

For large negative s, exp(s) ./ (1 + exp(s)) ≅ exp(s), which tends to 0.
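The limits of the slope matter in practice because the two algebraically equal forms of the slope from the question behave differently in floating point. The sketch below is a small check for MATLAB or Octave, with arbitrary sample points:

```matlab
% Two algebraically equivalent forms of the slope exp(s)/(1 + exp(s))
s = [-1000; -40; 0; 40; 1000];
slope_a = exp(s) ./ (1 + exp(s));   % Inf/Inf = NaN once exp(s) overflows
slope_b = 1 ./ (1 + exp(-s));       % exp(-s) = Inf gives 1/Inf = 0, no NaN

disp([s, slope_a, slope_b])
% slope_a(end) is NaN, while slope_b saturates cleanly at 0 and 1,
% the limits of the slope for large negative and large positive s.
```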

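- The code to compute L could look, for example, like this:

  s = X*beta;
  l = log(1 + exp(s));
  ind = isinf(l);          % exp(s) overflowed, so log(1 + exp(s)) ~= s
  l(ind) = s(ind);
  ind = (l == 0);          % 1 + exp(s) rounded to 1, so log(1 + exp(s)) ~= exp(s)
  l(ind) = exp(s(ind));
  L = 1/N * sum(l, 1)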
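A different, commonly used alternative avoids overflow up front instead of patching the result afterwards. This is not the method in the answer above. It relies on the exact identity log(1 + exp(s)) = max(s, 0) + log1p(exp(-abs(s))), in which exp is only ever called on non-positive arguments. The sketch below assumes MATLAB or Octave (log1p is built into both) and evaluates both the average loss and its gradient in vectorized form. X, beta, N and P play the same roles as in the question. The anonymous-function names softplus and sigmoid and the example data are hypothetical, chosen only for illustration.

```matlab
% Stable elementwise helpers (illustrative names, not a library API).
% Every call to exp below has a non-positive argument, so it cannot overflow.
softplus = @(s) max(s, 0) + log1p(exp(-abs(s)));       % log(1 + exp(s))
sigmoid  = @(s) exp(min(s, 0)) ./ (1 + exp(-abs(s)));  % 1 ./ (1 + exp(-s))

% Hypothetical example data with deliberately extreme values of s = X*beta
X = [ 1000     0    1;
     -1000     0    1;
         0     2   -1;
         0    -2    1;
       0.5   0.5  0.5];
beta = [1; 1; 1];
[N, P] = size(X);

s  = X * beta;                    % N x 1 vector of scores
L  = sum(softplus(s), 1) / N;     % average loss, a scalar
dL = X' * sigmoid(s) / N;         % average gradient, size [P 1]

disp(L)
disp(dL')
```

X' * sigmoid(s) computes the same quantity as the question's sum(... .* X, 1)', but without building an N-by-P intermediate. The max/abs split is the exact form of the answer's large-s approximation, so no case-by-case patching is needed.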

