What is the intuition of using tanh in LSTM


Problem description

In an LSTM network (Understanding LSTMs), why do the input gate and output gate use tanh? What is the intuition behind this? Is it just a nonlinear transformation? If so, can I change both to another activation function (e.g. ReLU)?

Recommended answer

Sigmoid specifically is used as the gating function for the three gates (input, output, forget) in an LSTM, because it outputs a value between 0 and 1, so it can either block the flow of information through a gate entirely or let it pass completely. To overcome the vanishing gradient problem, on the other hand, we need a function whose second derivative can sustain a long range before going to zero. Tanh is a good function with this property.
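
To make these roles concrete, below is a minimal NumPy sketch of one LSTM time step; the weight names (W, U, b) are my own illustration, not from the original answer. The sigmoids squash each gate into (0, 1), while tanh bounds the candidate cell state and the exposed hidden state to (-1, 1).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b are dicts of weight matrices / biases
    for the forget (f), input (i), output (o) and candidate (g) parts."""
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # forget gate, in (0, 1)
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # input gate, in (0, 1)
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # output gate, in (0, 1)
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # candidate cell state, in (-1, 1)
    c = f * c_prev + i * g          # gates scale how much old/new information flows
    h = o * np.tanh(c)              # tanh again bounds the exposed hidden state
    return h, c
```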

A good neuron unit should be bounded, easily differentiable, monotonic (good for convex optimization) and easy to handle. If you consider these qualities, then I believe you can use ReLU in place of the tanh function, since they are very good alternatives to each other. But before choosing an activation function, you must know what the advantages and disadvantages of your choice are over the others. I will briefly describe some of the activation functions and their advantages.
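
If you want to try the swap in practice, Keras exposes both choices as arguments on its LSTM layer (activation for the tanh part, recurrent_activation for the gates). A minimal sketch, assuming TensorFlow/Keras is installed:

```python
import tensorflow as tf

# Default LSTM: tanh for the cell/candidate transformation, sigmoid for the gates.
lstm_tanh = tf.keras.layers.LSTM(64)

# Same layer with ReLU substituted for tanh; the gates remain sigmoid.
lstm_relu = tf.keras.layers.LSTM(64, activation="relu",
                                 recurrent_activation="sigmoid")
```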

Sigmoid

Mathematical expression: sigmoid(z) = 1 / (1 + exp(-z))

First derivative: sigmoid'(z) = exp(-z) / (1 + exp(-z))^2 = sigmoid(z) * (1 - sigmoid(z))

Advantages:

(1) The sigmoid function has all the fundamental properties of a good activation function.
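
A small NumPy sketch of the expression and its derivative; the product form in the comment follows from the quotient above by algebra:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # exp(-z) / (1 + exp(-z))^2, which simplifies to sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)
```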

Tanh

Mathematical expression: tanh(z) = [exp(z) - exp(-z)] / [exp(z) + exp(-z)]

First derivative: tanh'(z) = 1 - ([exp(z) - exp(-z)] / [exp(z) + exp(-z)])^2 = 1 - tanh^2(z)

Advantages:

(1) Often found to converge faster in practice
(2) Gradient computation is less expensive
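
A matching NumPy sketch of the expression and its derivative:

```python
import numpy as np

def tanh(z):
    # Same as np.tanh(z); written out to mirror the expression above.
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2
```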

Hard Tanh

Mathematical expression: hardtanh(z) = -1 if z < -1; z if -1 <= z <= 1; 1 if z > 1

First derivative: hardtanh'(z) = 1 if -1 <= z <= 1; 0 otherwise

Advantages:

(1) Computationally cheaper than Tanh
(2) Saturates for magnitudes of z greater than 1
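
A NumPy sketch of the piecewise definition, using np.clip:

```python
import numpy as np

def hardtanh(z):
    # -1 for z < -1, z for -1 <= z <= 1, 1 for z > 1
    return np.clip(z, -1.0, 1.0)

def hardtanh_prime(z):
    return np.where((z >= -1.0) & (z <= 1.0), 1.0, 0.0)
```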

ReLU

Mathematical expression: relu(z) = max(z, 0)

First derivative: relu'(z) = 1 if z > 0; 0 otherwise

Advantages:

(1) Does not saturate even for large values of z
(2) Found much success in computer vision applications
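
A NumPy sketch:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_prime(z):
    return np.where(z > 0, 1.0, 0.0)
```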

Leaky ReLU

Mathematical expression: leaky(z) = max(z, k * z), where 0 < k < 1

First derivative: leaky'(z) = 1 if z > 0; k otherwise

Advantages:

(1) Allows propagation of error for non-positive z which ReLU doesn't
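
A NumPy sketch; the slope k is a free parameter, and 0.01 here is just an illustrative default, not taken from the original answer:

```python
import numpy as np

def leaky_relu(z, k=0.01):
    # max(z, k*z): identity for z > 0, slope k for z <= 0
    return np.maximum(z, k * z)

def leaky_relu_prime(z, k=0.01):
    return np.where(z > 0, 1.0, k)
```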

This paper explains some interesting activation functions. You may consider reading it.
