Derivative of activation function and use in backpropagation


Problem Description

I am reading this document, and they stated that the weight adjustment formula is this:

new weight = old weight + learning rate * delta * df(e)/de * input

The df(e)/de part is the derivative of the activation function, which is usually a sigmoid function like tanh.
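
For concreteness, here is what df(e)/de looks like for the two usual choices (a minimal sketch; the function names are my own, not from the linked document):

    import math

    # The activation functions mentioned above, with their derivatives df(e)/de.
    # (A minimal sketch; the function names are illustrative.)

    def sigmoid(e):
        # logistic sigmoid: f(e) = 1 / (1 + exp(-e))
        return 1.0 / (1.0 + math.exp(-e))

    def dsigmoid(e):
        # f'(e) = f(e) * (1 - f(e))
        fe = sigmoid(e)
        return fe * (1.0 - fe)

    def dtanh(e):
        # for f(e) = tanh(e), f'(e) = 1 - tanh(e)^2
        return 1.0 - math.tanh(e) ** 2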

  • What exactly is this for?
  • Why do we multiply by it at all?
  • Why isn't learning rate * delta * input alone enough?

This question came after this one and is closely related to it: Why must a nonlinear activation function be used in a backpropagation neural network?

Recommended Answer

Training a neural network just refers to finding values for every cell in the weight matrices (of which there are two for a NN having one hidden layer) such that the squared differences between the observed and predicted data are minimized. In practice, the individual weights comprising the two weight matrices are adjusted with each iteration (their initial values are often set to random values). This is also called the online model, as opposed to the batch one, in which weights are adjusted only after accumulating the errors over many patterns.
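
A rough sketch of the difference, fitting a single weight by least squares (the data and names here are illustrative, not from this answer):

    # Minimal contrast of online vs. batch updating for a single weight w,
    # fitting y = w * x by least squares. Data and names are illustrative.
    data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]   # (input, target) pairs, roughly y = 2x
    lr = 0.05

    # Online: w is nudged immediately after every pattern.
    w = 0.0
    for _ in range(100):
        for x, t in data:
            error = t - w * x
            w += lr * error * x        # per-pattern update

    # Batch: the updates are summed over all patterns, applied once per pass.
    w_batch = 0.0
    for _ in range(100):
        step = sum((t - w_batch * x) * x for x, t in data)
        w_batch += lr * step           # one update per full pass

    print(w, w_batch)                  # both settle near 2.0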

But how should the weights be adjusted--i.e., in which direction (+/-)? And by how much?

That's where the derivative comes in. A large value for the derivative will result in a large adjustment to the corresponding weight. This makes sense because if the derivative is large, that means you are far from a minimum. Put another way, weights are adjusted at each iteration in the direction of steepest descent (highest value of the derivative) on the cost function's surface defined by the total error (observed versus predicted).

After the error on each pattern is computed (subtracting the actual value of the response variable or output vector from the value predicted by the NN during that iteration), each weight in the weight matrices is adjusted in proportion to the calculated error gradient.
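
Tying this back to the formula in the question, a minimal sketch for a single sigmoid unit (the initial weights, inputs, and target value are made up for illustration):

    import math

    # The update from the question, new_w = old_w + lr * delta * f'(e) * input,
    # applied to a single sigmoid unit. Initial weights, inputs, and target
    # are made up for illustration.

    def sigmoid(e):
        return 1.0 / (1.0 + math.exp(-e))

    lr = 0.5
    weights = [0.1, -0.2]     # often initialized randomly in practice
    x = [1.0, 0.5]            # one input pattern
    target = 1.0              # observed value for this pattern

    for _ in range(200):
        e = sum(w * xi for w, xi in zip(weights, x))    # weighted input sum
        output = sigmoid(e)
        delta = target - output                         # observed minus predicted
        dfe = output * (1.0 - output)                   # f'(e) for the sigmoid
        # each weight moves in proportion to the error gradient
        weights = [w + lr * delta * dfe * xi for w, xi in zip(weights, x)]

    print(weights)   # output = sigmoid(e) has crept toward the target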

Because the error calculation begins at the end of the NN (i.e., at the output layer by subtracting observed from predicted) and proceeds to the front, it is called backprop.
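
A sketch of that back-to-front order for a net with one hidden layer (all numbers and names here are invented for illustration):

    # Back-to-front order for one hidden layer: the output delta is computed
    # first, then carried backward through the output weights to the hidden
    # units. All numbers and names are invented for illustration.

    def dsigmoid_from_output(o):
        # logistic-sigmoid derivative expressed through the unit's output
        return o * (1.0 - o)

    hidden_out = [0.6, 0.3]   # hidden-layer activations from the forward pass
    output = 0.7              # the network's prediction
    target = 1.0              # the observed value
    w_out = [0.4, -0.1]       # hidden -> output weights

    # Step 1: the error starts at the output layer (observed vs. predicted).
    delta_out = (target - output) * dsigmoid_from_output(output)

    # Step 2: that error is pushed backward through w_out to the hidden layer.
    delta_hidden = [w * delta_out * dsigmoid_from_output(h)
                    for w, h in zip(w_out, hidden_out)]

    print(delta_out, delta_hidden)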

More generally, the derivative (or gradient for multivariable problems) is used by the optimization technique (for backprop, conjugate gradient is probably the most common) to locate minima of the objective (aka loss) function.

Here's how it works:

At a minimum of a curve, the line tangent to it has a slope of 0; in other words, the first derivative is 0 at that point.

So if you are walking around a 3D surface defined by the objective function and you walk to a point where the slope = 0, then you are at the bottom--you have found a minimum (whether global or local) of the function.

But the first derivative tells you more than that. It also tells you whether you are going in the right direction to reach the function's minimum.

It's easy to see why this is so if you think about what happens to the slope of the tangent line as the point on the curve/surface moves down toward the function's minimum.

The slope (and hence the value of the function's derivative at that point) gradually decreases. In other words, to minimize a function, follow the derivative--i.e., if its value is decreasing, then you are moving in the correct direction.
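
The same idea on a one-variable toy objective (my own example, not from the answer): the sign of the derivative picks the downhill direction, and its shrinking magnitude tells you the minimum is near:

    # Gradient descent on f(x) = (x - 3)^2, whose derivative is f'(x) = 2(x - 3).
    # A toy instance of "follow the derivative"; the minimum sits at x = 3.
    x = 0.0
    lr = 0.1
    for _ in range(25):
        slope = 2.0 * (x - 3.0)   # derivative at the current point
        x -= lr * slope           # step against the slope, scaled by its size
        # |slope| shrinks as x nears 3, so the steps shrink too

    print(x)   # close to 3.0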
