Implementation of a softmax activation function for neural networks

Question

I am using a Softmax activation function in the last layer of a neural network. But I have problems with a safe implementation of this function.

A naive implementation would be this one:

Vector y = mlp(x); // output of the neural network without softmax activation function
for(int f = 0; f < y.rows(); f++)
  y(f) = exp(y(f));
y /= y.sum();

This does not work very well for > 100 hidden nodes, because y will contain NaN in many cases (if y(f) > 709, exp(y(f)) overflows to inf, and the normalization then produces NaN). I came up with this version:

Vector y = mlp(x); // output of the neural network without softmax activation function
for(int f = 0; f < y.rows(); f++)
  y(f) = safeExp(y(f), y.rows());
y /= y.sum();

where safeExp is defined as

#include <cmath>
#include <limits>

double safeExp(double x, int div)
{
  // largest argument for which std::exp does not overflow (about 709.78)
  static const double maxX = std::log(std::numeric_limits<double>::max());
  // clamp each input so that a sum of div such exponentials cannot overflow
  const double max = maxX / (double) div;
  if(x > max)
    x = max;
  return std::exp(x);
}

This function limits the input of exp. In most cases this works, but not in all, and I did not really manage to find out in which cases it fails. When I have 800 hidden neurons in the previous layer it does not work at all.

However, even when this works, I somehow "distort" the result of the ANN. Can you think of any other way to calculate the correct solution? Are there any C++ libraries or tricks that I can use to calculate the exact output of this ANN?

Edit: The solution provided by Itamar Katz is:

Vector y = mlp(x); // output of the neural network without softmax activation function
double ymax = y(0); // find the maximal component of y
for(int f = 1; f < y.rows(); f++)
  if(y(f) > ymax)
    ymax = y(f);
for(int f = 0; f < y.rows(); f++)
  y(f) = exp(y(f) - ymax);
y /= y.sum();

And it really is mathematically the same. In practice, however, some small values become 0 because of the limited floating-point precision. I wonder why nobody ever writes these implementation details down in textbooks.
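
A minimal standalone sketch of that effect (the numbers are just an illustration): after subtracting the maximum, every exponent is <= 0, and in IEEE double precision anything much below about -745 underflows to exactly 0.

#include <cmath>
#include <cstdio>

int main()
{
  std::printf("%g\n", std::exp(-700.0)); // ~9.86e-305, still representable
  std::printf("%g\n", std::exp(-800.0)); // prints 0: the exponential has underflowed
  return 0;
}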

Answer

First go to log scale, i.e. calculate log(y) instead of y. The log of the numerator is trivial. To calculate the log of the denominator, you can use the following 'trick': http://lingpipe-blog.com/2009/06/25/log-sum-of-exponentials/
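
A minimal sketch of that log-sum-exp trick, written against the same Vector interface (y(f), y.rows()) used in the snippets above; the exact vector class is an assumption here:

#include <cmath>

// log(sum_f exp(y(f))) computed stably by factoring out the maximum:
// log(sum_f exp(y(f))) = ymax + log(sum_f exp(y(f) - ymax))
double logSumExp(const Vector& y)
{
  double ymax = y(0);
  for(int f = 1; f < y.rows(); f++)
    if(y(f) > ymax)
      ymax = y(f);

  double sum = 0.0;
  for(int f = 0; f < y.rows(); f++)
    sum += std::exp(y(f) - ymax); // every argument is <= 0, so no overflow

  return ymax + std::log(sum);
}

// Log of the softmax output: log(p(f)) = y(f) - logSumExp(y).
void logSoftmax(Vector& y)
{
  const double lse = logSumExp(y);
  for(int f = 0; f < y.rows(); f++)
    y(f) -= lse;
}

Exponentiating y(f) at the very end reproduces the probabilities of the max-subtraction version; the advantage of staying in log space is that values which would underflow to 0 as probabilities remain distinguishable, for example when they feed into a cross-entropy loss.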
