Does applying a Dropout Layer after the Embedding Layer have the same effect as applying the dropout through the LSTM dropout parameter?


Problem Description

I am a little confused about the different ways of applying dropout to my Sequential model in Keras.

My model is the following:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=64, output_dim=64, input_length=498))
model.add(LSTM(units=100, dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Suppose I additionally add a Dropout layer after the Embedding layer, in the following way:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dropout, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=64, output_dim=64, input_length=498))
model.add(Dropout(0.25))
model.add(LSTM(units=100, dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Will this make any difference, given that I also specify in the LSTM parameters that dropout should be 0.5, or am I getting this wrong?

Solution

When you add a Dropout layer you are adding dropout to the output of the previous layer only; in your case, you are adding dropout to the output of your Embedding layer.
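
To make this concrete, here is a minimal sketch (using a random tensor as a stand-in for the Embedding output, with shapes matching the model above) showing that a standalone Dropout layer acts only on the single tensor handed to it:

import numpy as np
from tensorflow.keras.layers import Dropout

# Stand-in for the Embedding output: (batch=1, timesteps=498, features=64).
emb_out = np.random.randn(1, 498, 64).astype("float32")

# In training mode, Dropout(0.25) zeroes roughly 25% of these activations
# and rescales the survivors by 1/0.75 so the expected value is unchanged.
dropped = Dropout(0.25)(emb_out, training=True)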

An LSTM cell is more complex than a single-layer neural network; when you specify dropout in the LSTM cell, you are actually applying dropout to 4 different sub-network operations inside the LSTM cell.

Below is a visualization of an LSTM cell from Colah's blog on LSTMs (the best visualization of LSTMs/RNNs out there, http://colah.github.io/posts/2015-08-Understanding-LSTMs/). The yellow boxes represent the 4 fully connected network operations (each with its own weights) that occur under the hood of the LSTM. They are neatly wrapped up in the LSTM cell wrapper, though they are not really so hard to code by hand.

When you specify dropout=0.5 in the LSTM cell, what you are doing under the hood is applying dropout to each of these 4 neural network operations. This is effectively like applying a rate-0.5 dropout 4 times, once at the input of each of the 4 yellow blocks you see in the diagram, within the internals of the LSTM cell.
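
As a rough NumPy sketch of what that means for a single timestep (shapes match the model above: 64-dimensional input, 100 units; note that Keras actually draws one mask per gate and reuses it across all timesteps, which this one-step sketch glosses over):

import numpy as np

rng = np.random.default_rng(0)

def dropout_mask(shape, rate):
    # Inverted dropout: zero units with probability `rate`, then rescale
    # the survivors by 1/(1 - rate) so the expected activation is unchanged.
    keep = (rng.random(shape) >= rate).astype(np.float32)
    return keep / (1.0 - rate)

# Illustrative sizes: input dimension 64, 100 LSTM units.
x_t = rng.standard_normal(64).astype(np.float32)      # input at step t
h_prev = rng.standard_normal(100).astype(np.float32)  # previous hidden state

# One weight pair per gate: input (i), forget (f), cell candidate (g), output (o).
W = {g: rng.standard_normal((100, 64)).astype(np.float32) for g in "ifgo"}
U = {g: rng.standard_normal((100, 100)).astype(np.float32) for g in "ifgo"}

# dropout=0.5 masks the *input* of each of the 4 gate operations;
# recurrent_dropout=0.5 does the same to each gate's recurrent input.
preact = {}
for g in "ifgo":
    x_drop = x_t * dropout_mask(x_t.shape, 0.5)
    h_drop = h_prev * dropout_mask(h_prev.shape, 0.5)
    preact[g] = W[g] @ x_drop + U[g] @ h_drop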

I hope this short discussion makes it clearer how the dropout applied inside the LSTM wrapper, which effectively acts on 4 sub-networks within the LSTM, differs from the dropout you applied once in the sequence after your Embedding layer. And to answer your question directly: yes, these two dropout definitions are very much different.

Notice, as a further example to help elucidate the point: if you were to define a simple 5-layer fully connected neural network, you would need to define dropout after each layer, not once. model.add(Dropout(0.25)) is not some kind of global setting; it adds a dropout operation to a pipeline of operations. If you have 5 layers, you need to add 5 dropout operations, as in the sketch below.
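
A minimal sketch of that 5-layer case (layer sizes and the input shape are made up for illustration):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Hypothetical fully connected network with 5 hidden layers: each Dropout
# line is a separate operation in the pipeline, acting only on the layer
# just before it, so it must be repeated after every layer it should affect.
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(64,)))
model.add(Dropout(0.25))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(1, activation='sigmoid'))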
