Implementation of a softmax activation function for neural networks
Problem description
I am using a softmax activation function in the last layer of a neural network, but I have problems with a safe implementation of this function.
A naive implementation would look like this:
Vector y = mlp(x); // output of the neural network without softmax activation function
for(int f = 0; f < y.rows(); f++)
    y(f) = exp(y(f)); // overflows to inf once y(f) exceeds ~709
y /= y.sum();
This does not work very well for more than about 100 hidden nodes, because y will contain NaN in many cases: if y(f) > 709, exp(y(f)) returns inf (709 is roughly log(DBL_MAX) ≈ 709.78, the largest argument for which exp still returns a finite double), and the subsequent inf/inf normalization yields NaN. I came up with this version:
Vector y = mlp(x); // output of the neural network without softmax activation function
for(int f = 0; f < y.rows(); f++)
    y(f) = safeExp(y(f), y.rows());
y /= y.sum();
where safeExp is defined as:
#include <cmath>
#include <limits>

double safeExp(double x, int div)
{
    // largest argument for which std::exp still returns a finite double
    static const double maxX = std::log(std::numeric_limits<double>::max());
    // clamp so that a sum of 'div' such exponentials cannot overflow
    const double max = maxX / (double) div;
    if(x > max)
        x = max;
    return std::exp(x);
}
This function limits the input of exp to log(DBL_MAX)/div, so a sum of div exponentials cannot overflow. In most cases this works, but not in all, and I did not really manage to find out in which cases it fails. When I have 800 hidden neurons in the previous layer it does not work at all.
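The failure mode of the clamp is easy to demonstrate: once two components both exceed the threshold, they are mapped to the same value and their ratio is lost. A standalone sketch (the logit values are made up for illustration):

#include <cmath>
#include <cstdio>
#include <limits>

double safeExp(double x, int div)
{
    static const double maxX = std::log(std::numeric_limits<double>::max());
    const double max = maxX / (double) div;
    if(x > max)
        x = max;
    return std::exp(x);
}

int main()
{
    // Hypothetical logits {1000, 999}: the true softmax is
    // {1/(1+exp(-1)), exp(-1)/(1+exp(-1))} ≈ {0.731, 0.269}.
    const double a = safeExp(1000.0, 2);
    const double b = safeExp(999.0, 2);
    // Both inputs exceed log(DBL_MAX)/2 ≈ 354.9 and are clamped to the
    // same value, so the computed softmax collapses to {0.5, 0.5}.
    std::printf("%.3f %.3f\n", a / (a + b), b / (a + b));
    return 0;
}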
However, even if this worked, it would somehow "distort" the result of the ANN. Can you think of any other way to calculate the correct solution? Are there any C++ libraries or tricks I could use to calculate the exact output of this ANN?
Edit: The solution provided by Itamar Katz is:
Vector y = mlp(x); // output of the neural network without softmax activation function
double ymax = y.maxCoeff(); // maximal component of y (Eigen-style; use a loop for other vector types)
for(int f = 0; f < y.rows(); f++)
    y(f) = exp(y(f) - ymax);
y /= y.sum();
And it really is mathematically the same, since softmax is invariant under shifting every input by the same constant: exp(y(f) - ymax) / sum_j exp(y(j) - ymax) = exp(y(f)) / sum_j exp(y(j)). In practice, however, some small values become 0 because of the limited floating point precision (exp underflows to 0 for arguments below about -745). I wonder why nobody ever writes these implementation details down in textbooks.
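For completeness, the whole stable softmax can be wrapped into one function. A minimal sketch, assuming an Eigen-style Vector that provides rows(), operator(), and maxCoeff() (the function name is mine):

#include <cmath>

// Numerically stable softmax: shifting by the maximum means every
// argument passed to exp is <= 0, so no term can overflow, and the
// largest term is exactly exp(0) = 1.
template <typename Vector>
void stableSoftmax(Vector& y)
{
    const double ymax = y.maxCoeff();
    double sum = 0.0;
    for(int f = 0; f < y.rows(); f++)
    {
        y(f) = std::exp(y(f) - ymax);
        sum += y(f);
    }
    for(int f = 0; f < y.rows(); f++)
        y(f) /= sum;
}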
Recommended answer
First go to log scale, i.e. calculate log(y) instead of y. The log of the numerator is trivial. To calculate the log of the denominator, you can use the following 'trick': http://lingpipe-blog.com/2009/06/25/log-sum-of-exponentials/
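The trick is the log-sum-exp identity: log(sum_j exp(y(j))) = m + log(sum_j exp(y(j) - m)) with m = max_j y(j), so every argument passed to exp is <= 0 and cannot overflow. A sketch in plain C++ (the function name and std::vector interface are illustrative, not from the linked post):

#include <algorithm>
#include <cmath>
#include <vector>

// log(sum_i exp(x[i])), computed without overflow via the
// log-sum-exp identity.
double logSumExp(const std::vector<double>& x)
{
    const double m = *std::max_element(x.begin(), x.end());
    double sum = 0.0;
    for(double xi : x)
        sum += std::exp(xi - m); // argument <= 0, exp cannot overflow
    return m + std::log(sum);
}

The log of softmax component f is then simply y(f) - logSumExp(y); exponentiate only at the very end, if you actually need the probabilities rather than their logs.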