How does the back-propagation algorithm deal with non-differentiable activation functions?


Question

While digging through the topic of neural networks and how to train them efficiently, I came across the method of using very simple activation functions, such as the rectified linear unit (ReLU), instead of the classic smooth sigmoids. The ReLU function is not differentiable at the origin, so according to my understanding the backpropagation algorithm (BPA) is not suitable for training a neural network with ReLUs, since the chain rule of multivariable calculus refers to smooth functions only. However, none of the papers about using ReLUs that I have read address this issue. ReLUs seem to be very effective and seem to be used virtually everywhere without causing any unexpected behavior. Can somebody explain to me why ReLUs can be trained at all via the backpropagation algorithm?

Answer

To understand how backpropagation is even possible with functions like ReLU, you need to understand which property of the derivative makes the backpropagation algorithm work so well. That property is the first-order approximation:

f(x) ≈ f(x0) + f'(x0)(x - x0)

If you treat x0 as the current value of your parameter, then, knowing the value of the cost function and its derivative, you can tell how the cost function will behave when you change that parameter a little bit. This is the most crucial thing in backpropagation.
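For concreteness, here is a tiny numeric check of that property on a throwaway scalar cost (the cost, the data and the names below are my own, purely for illustration): knowing only the value and the derivative at the current parameter, we can predict the cost after a small parameter change.

```python
# Illustrative scalar cost C(w) = (w * x - y)^2 for fixed toy data x, y.
x, y = 2.0, 3.0
C  = lambda w: (w * x - y) ** 2
dC = lambda w: 2.0 * (w * x - y) * x       # analytic derivative of C

w0, h = 0.5, 1e-4                          # current parameter value and a small change
exact     = C(w0 + h)                      # cost actually obtained after the change
predicted = C(w0) + dC(w0) * h             # predicted from the value and derivative alone
print(exact, predicted)                    # the two agree to ~1e-8: this is what backprop exploits
```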

Because backpropagation is built on computing exactly this kind of approximation of the cost function, every function in the chain, in particular the activation function, needs to satisfy the property stated above. It is easy to check that ReLU satisfies it everywhere except in a small neighbourhood of 0, and this is the only problem with ReLU: we cannot use the property when we are right at 0.
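A minimal NumPy sketch of that check for ReLU specifically (the function names are mine): the approximation is exact at points away from 0 and breaks down only at the kink.

```python
import numpy as np

relu = lambda v: np.maximum(0.0, v)
relu_grad = lambda v: float(v > 0)   # ReLU derivative; the value at exactly 0 is a choice (here 0)

h = 1e-4                             # small step
for x0 in (1.5, -1.5, 0.0):
    exact  = relu(x0 + h)
    approx = relu(x0) + relu_grad(x0) * h
    print(x0, exact, approx)
# At x0 = +/-1.5 the two values coincide, so the property holds.
# At x0 = 0 the prediction is off by the full step h: the property is unusable only at the kink.
```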

To overcome that, you can simply define the value of the ReLU derivative at 0 to be either 0 or 1 (either choice is a valid subgradient). On the other hand, most researchers do not treat this as a serious problem, simply because hitting exactly 0 during ReLU computations is relatively rare.
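In code this choice becomes nothing more than one number plugged into the chain-rule step. A minimal sketch (assuming NumPy; relu_backward and value_at_zero are names I made up for illustration):

```python
import numpy as np

def relu_backward(grad_out, x, value_at_zero=0.0):
    # One chain-rule step for ReLU: multiply the incoming gradient by f'(x),
    # where f'(x) = 0 for x < 0, 1 for x > 0, and value_at_zero at x == 0.
    d = np.where(x > 0, 1.0, 0.0)
    d = np.where(x == 0, value_at_zero, d)
    return grad_out * d

x = np.array([-2.0, 0.0, 3.0])          # pre-activations, one of them exactly at the kink
g = np.ones_like(x)                     # pretend this is the gradient arriving from above
print(relu_backward(g, x, value_at_zero=0.0))   # [0. 0. 1.]
print(relu_backward(g, x, value_at_zero=1.0))   # [0. 1. 1.]
```

Either convention changes the gradient only at exact zeros of the pre-activation, which, with floating-point inputs, are hit very rarely.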

So, strictly speaking, from a purely mathematical point of view using ReLU with the backpropagation algorithm is not fully justified. In practice, however, the odd behaviour around 0 usually makes no difference at all.
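To back up the "in practice" part, here is a self-contained NumPy sketch (toy data, architecture and hyperparameters are all my own choices) that trains a one-hidden-layer ReLU network with plain backpropagation, taking f'(0) := 0; the loss drops without any trouble from the kink.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression: fit y = |x| with a 1-hidden-layer ReLU network trained by
# plain backpropagation, using f'(0) := 0 for the ReLU derivative.
X = rng.uniform(-1.0, 1.0, size=(256, 1))
y = np.abs(X)

W1 = rng.normal(0.0, 0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, size=(16, 1)); b2 = np.zeros(1)
lr = 0.2

for step in range(2000):
    # forward pass
    z = X @ W1 + b1                      # pre-activations
    h = np.maximum(0.0, z)               # ReLU
    pred = h @ W2 + b2
    loss = np.mean((pred - y) ** 2)

    # backward pass (chain rule; (z > 0) is the ReLU derivative, 0 at z == 0)
    d_pred = 2.0 * (pred - y) / len(X)
    dW2 = h.T @ d_pred;  db2 = d_pred.sum(axis=0)
    d_h = d_pred @ W2.T
    d_z = d_h * (z > 0)
    dW1 = X.T @ d_z;     db1 = d_z.sum(axis=0)

    # gradient step
    W1 -= lr * dW1;  b1 -= lr * db1
    W2 -= lr * dW2;  b2 -= lr * db2

print(f"final MSE: {loss:.5f}")          # much lower than at the start of training
```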
