Numerically stable softmax

Question

Is there a numerically stable way to compute the softmax function below? I am getting values that become NaNs in my neural network code.

np.exp(x) / np.sum(np.exp(x))
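For example, with large enough inputs the naive formula overflows and yields NaNs. A minimal reproduction (the input values here are my own, chosen only to force overflow in 64-bit floats):

import numpy as np

x = np.array([1000.0, 1000.0])          # large logits, as they might come out of a network
naive = np.exp(x) / np.sum(np.exp(x))   # np.exp(1000.0) overflows to inf
print(naive)                            # [nan nan], because inf / inf is nan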

Answer

The softmax exp(x)/sum(exp(x)) is actually numerically well-behaved. It has only positive terms, so we needn't worry about loss of significance, and the denominator is at least as large as the numerator, so the result is guaranteed to fall between 0 and 1.

The only accident that might happen is overflow or underflow in the exponentials. Overflow of a single element of x, or underflow of all of them, will render the output more or less useless.
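To make those failure modes concrete, here is a quick check of the cutoffs, assuming IEEE-754 double precision (NumPy's default; the thresholds are roughly 709.8 for overflow and about -744 for underflow to zero):

import numpy as np

print(np.exp(np.float64(710.0)))    # inf: exp(710) exceeds the float64 maximum (~1.8e308)
print(np.exp(np.float64(-750.0)))   # 0.0: underflows past the smallest subnormal float64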

But it is easy to guard against that by using the identity softmax(x) = softmax(x + c), which holds for any scalar c: subtracting max(x) from x leaves a vector with only non-positive entries, ruling out overflow, and with at least one zero element, ruling out a vanishing denominator (underflow in some, but not all, entries is harmless).
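A minimal NumPy sketch of this trick (the function name softmax and the assumption that x is a 1-D array are mine, not the answer's):

import numpy as np

def softmax(x):
    # Shift by the maximum so every entry is <= 0: exp() cannot overflow,
    # and exp(0) = 1 keeps the denominator away from zero.
    z = x - np.max(x)
    e = np.exp(z)
    return e / np.sum(e)

print(softmax(np.array([1000.0, 1001.0, 1002.0])))  # finite probabilities instead of NaNs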

Footnote: theoretically, catastrophic accidents in the sum are possible, but you'd need a ridiculous number of terms. For example, even using 16-bit floats, which can only resolve about 3 decimal digits (compared to the 15 decimal digits of a "normal" 64-bit float), we'd need between 2^1431 (~6 x 10^431) and 2^1432 summands to get a sum that is off by a factor of two.
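A toy illustration of the loss-of-significance effect the footnote alludes to (my example, not the answer's): float16 carries an 11-bit significand, so integers above 2048 are no longer exactly representable, and a small addend can be rounded away entirely.

import numpy as np

s = np.float16(2048) + np.float16(1)
print(s)  # 2048.0: the added 1 is lost, since 2049 is not representable in float16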
