Implementation of a softmax activation function for neural networks
Problem description
I am using a softmax activation function in the last layer of a neural network, but I have problems with a safe implementation of this function.
A naive implementation would look like this:
Vector y = mlp(x); // output of the neural network without softmax activation function
for(int f = 0; f < y.rows(); f++)
    y(f) = exp(y(f)); // overflows to inf once y(f) exceeds ~709
y /= y.sum();
This does not work very well for more than about 100 hidden nodes, because y will contain NaN in many cases: if y(f) > 709, exp(y(f)) returns inf (709 is roughly log(DBL_MAX) ≈ 709.78, the largest argument for which exp still returns a finite double), and the subsequent inf/inf normalization yields NaN. I came up with this version:
Vector y = mlp(x); // output of the neural network without softmax activation function
for(int f = 0; f < y.rows(); f++)
    y(f) = safeExp(y(f), y.rows());
y /= y.sum();
where safeExp is defined as:
#include <cmath>
#include <limits>

double safeExp(double x, int div)
{
    // largest argument for which std::exp still returns a finite double
    static const double maxX = std::log(std::numeric_limits<double>::max());
    // clamp so that a sum of 'div' such exponentials cannot overflow
    const double max = maxX / (double) div;
    if(x > max)
        x = max;
    return std::exp(x);
}
This function limits the input of exp to log(DBL_MAX)/div, so a sum of div exponentials cannot overflow. In most cases this works, but not in all, and I did not really manage to find out in which cases it fails. When I have 800 hidden neurons in the previous layer it does not work at all.
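The failure mode of the clamp is easy to demonstrate: once two components both exceed the threshold, they are mapped to the same value and their ratio is lost. A standalone sketch (the logit values are made up for illustration):

#include <cmath>
#include <cstdio>
#include <limits>

double safeExp(double x, int div)
{
    static const double maxX = std::log(std::numeric_limits<double>::max());
    const double max = maxX / (double) div;
    if(x > max)
        x = max;
    return std::exp(x);
}

int main()
{
    // Hypothetical logits {1000, 999}: the true softmax is
    // {1/(1+exp(-1)), exp(-1)/(1+exp(-1))} ≈ {0.731, 0.269}.
    const double a = safeExp(1000.0, 2);
    const double b = safeExp(999.0, 2);
    // Both inputs exceed log(DBL_MAX)/2 ≈ 354.9 and are clamped to the
    // same value, so the computed softmax collapses to {0.5, 0.5}.
    std::printf("%.3f %.3f\n", a / (a + b), b / (a + b));
    return 0;
}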
However, even if this worked, it would somehow "distort" the result of the ANN. Can you think of any other way to calculate the correct solution? Are there any C++ libraries or tricks I could use to calculate the exact output of this ANN?
Edit: The solution provided by Itamar Katz is:
Vector y = mlp(x); // output of the neural network without softmax activation function
double ymax = y.maxCoeff(); // maximal component of y (Eigen-style; use a loop for other vector types)
for(int f = 0; f < y.rows(); f++)
    y(f) = exp(y(f) - ymax);
y /= y.sum();
And it really is mathematically the same, since softmax is invariant under shifting every input by the same constant: exp(y(f) - ymax) / sum_j exp(y(j) - ymax) = exp(y(f)) / sum_j exp(y(j)). In practice, however, some small values become 0 because of the limited floating point precision (exp underflows to 0 for arguments below about -745). I wonder why nobody ever writes these implementation details down in textbooks.
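For completeness, the whole stable softmax can be wrapped into one function. A minimal sketch, assuming an Eigen-style Vector that provides rows(), operator(), and maxCoeff() (the function name is mine):

#include <cmath>

// Numerically stable softmax: shifting by the maximum means every
// argument passed to exp is <= 0, so no term can overflow, and the
// largest term is exactly exp(0) = 1.
template <typename Vector>
void stableSoftmax(Vector& y)
{
    const double ymax = y.maxCoeff();
    double sum = 0.0;
    for(int f = 0; f < y.rows(); f++)
    {
        y(f) = std::exp(y(f) - ymax);
        sum += y(f);
    }
    for(int f = 0; f < y.rows(); f++)
        y(f) /= sum;
}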
Recommended answer
First go to log scale, i.e. calculate log(y) instead of y. The log of the numerator is trivial. To calculate the log of the denominator, you can use the following 'trick': http://lingpipe-blog.com/2009/06/25/log-sum-of-exponentials/
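The trick is the log-sum-exp identity: log(sum_j exp(y(j))) = m + log(sum_j exp(y(j) - m)) with m = max_j y(j), so every argument passed to exp is <= 0 and cannot overflow. A sketch in plain C++ (the function name and std::vector interface are illustrative, not from the linked post):

#include <algorithm>
#include <cmath>
#include <vector>

// log(sum_i exp(x[i])), computed without overflow via the
// log-sum-exp identity.
double logSumExp(const std::vector<double>& x)
{
    const double m = *std::max_element(x.begin(), x.end());
    double sum = 0.0;
    for(double xi : x)
        sum += std::exp(xi - m); // argument <= 0, exp cannot overflow
    return m + std::log(sum);
}

The log of softmax component f is then simply y(f) - logSumExp(y); exponentiate only at the very end, if you actually need the probabilities rather than their logs.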