What is the intuition of using tanh in LSTM
Question
In an LSTM network (Understanding LSTMs), why do the input gate and output gate use tanh? What is the intuition behind this? Is it just a nonlinear transformation? If so, can I change both to another activation function (e.g. ReLU)?
Answer
Sigmoid specifically is used as the gating function for the three gates (input, output, forget) in an LSTM, since it outputs a value between 0 and 1: it can either let no flow or complete flow of information through the gates. On the other hand, to overcome the vanishing gradient problem, we need a function whose second derivative can sustain over a long range before going to zero. Tanh is a good function with the above property.
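To make this concrete, here is a minimal NumPy sketch of a single LSTM cell step (the parameter layout and names here are my own illustration, not from the original answer): sigmoid squashes the gate activations into (0, 1), while tanh bounds the candidate values and the cell output in (-1, 1).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W: (4n, d), U: (4n, n), b: (4n,) -- stacked parameters for i, f, o, g
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates squashed into (0, 1)
    g = np.tanh(g)                                # candidate values bounded in (-1, 1)
    c = f * c_prev + i * g                        # new cell state
    h = o * np.tanh(c)                            # output, again bounded by tanh
    return h, c

Swapping np.tanh(g) for a ReLU here is syntactically trivial, but since ReLU is unbounded the cell state can then grow without bound across time steps.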
A good neuron unit should be bounded, easily differentiable, monotonic (good for convex optimization) and easy to handle. If you consider these qualities, then I believe you can use ReLU in place of the tanh function, since they are very good alternatives to each other. But before choosing an activation function, you must know what the advantages and disadvantages of your choice over others are. I will briefly describe some of the activation functions and their advantages.
Sigmoid
Mathematical expression: sigmoid(z) = 1 / (1 + exp(-z))
First-order derivative: sigmoid'(z) = exp(-z) / (1 + exp(-z))^2 = sigmoid(z) * (1 - sigmoid(z))
Advantages:
(1) Sigmoid function has all the fundamental properties of a good activation function.
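A minimal NumPy check of the expression and its derivative (the helper names are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # same as exp(-z) / (1 + exp(-z))^2

z = 0.5
numeric = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6  # central difference
print(numeric, sigmoid_prime(z))  # both ~= 0.2350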
Tanh
Mathematical expression: tanh(z) = [exp(z) - exp(-z)] / [exp(z) + exp(-z)]
First-order derivative: tanh'(z) = 1 - ([exp(z) - exp(-z)] / [exp(z) + exp(-z)])^2 = 1 - tanh^2(z)
Advantages:
(1) Often found to converge faster in practice
(2) Gradient computation is less expensive
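The cheaper gradient falls out of the identity tanh'(z) = 1 - tanh^2(z): once the forward value is known, the derivative costs one multiply and one subtract. A small NumPy sketch:

import numpy as np

def tanh_prime(z):
    t = np.tanh(z)      # reuse the forward value
    return 1.0 - t * t  # one multiply, one subtract

z = 0.5
numeric = (np.tanh(z + 1e-6) - np.tanh(z - 1e-6)) / 2e-6  # central difference
print(numeric, tanh_prime(z))  # both ~= 0.7864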
Hard Tanh
Mathematical expression: hardtanh(z) = -1 if z < -1; z if -1 <= z <= 1; 1 if z > 1
First-order derivative: hardtanh'(z) = 1 if -1 <= z <= 1; 0 otherwise
Advantages:
(1) Computationally cheaper than Tanh
(2) Saturates for magnitudes of z greater than 1
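A minimal NumPy sketch of hard tanh; np.clip implements the piecewise definition directly:

import numpy as np

def hardtanh(z):
    return np.clip(z, -1.0, 1.0)  # -1 below -1, identity in between, 1 above 1

def hardtanh_prime(z):
    return np.where(np.abs(z) <= 1.0, 1.0, 0.0)

z = np.array([-2.0, 0.3, 2.0])
print(hardtanh(z))        # [-1.   0.3  1. ]
print(hardtanh_prime(z))  # [0. 1. 0.]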
ReLU
Mathematical expression: relu(z) = max(z, 0)
First-order derivative: relu'(z) = 1 if z > 0; 0 otherwise
Advantages:
(1) Does not saturate even for large values of z
(2) Found much success in computer vision applications
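The same kind of sketch for ReLU; note that the derivative stays 1 for arbitrarily large z, which is the non-saturation advantage above:

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_prime(z):
    return np.where(z > 0, 1.0, 0.0)  # stays 1 no matter how large z is

z = np.array([-1.5, 0.0, 1000.0])
print(relu(z))        # [   0.    0. 1000.]
print(relu_prime(z))  # [0. 0. 1.]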
Leaky ReLU
Mathematical expression: leaky(z) = max(z, k * z) where 0 < k < 1
First-order derivative: leaky'(z) = 1 if z > 0; k otherwise
Advantages:
(1) Allows propagation of error for non-positive z which ReLU doesn't
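And the leaky variant as a sketch; k = 0.01 here is an assumed default, though any 0 < k < 1 fits the definition above:

import numpy as np

def leaky_relu(z, k=0.01):
    return np.maximum(z, k * z)  # for z < 0 this picks k*z, since 0 < k < 1

def leaky_relu_prime(z, k=0.01):
    return np.where(z > 0, 1.0, k)  # nonzero slope lets error propagate for z <= 0

z = np.array([-2.0, 3.0])
print(leaky_relu(z))        # [-0.02  3.  ]
print(leaky_relu_prime(z))  # [0.01 1.  ]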
This paper explains some interesting activation functions. You may consider reading it.