How to use return_sequences option and TimeDistributed layer in Keras?

Problem description

I have a dialog corpus like the one below, and I want to implement an LSTM model that predicts a system action. The system action is described as a bit vector, and each user input is encoded as a word embedding, which is also a bit vector.

t1: user: "Do you know an apple?", system: "no"(action=2)
t2: user: "xxxxxx", system: "yyyy" (action=0)
t3: user: "aaaaaa", system: "bbbb" (action=5)

So what I want to realize is the "many-to-many (2)" model. When my model receives a user input, it must output a system action. But I don't understand the return_sequences option and the TimeDistributed layer after an LSTM. To realize "many-to-many (2)", are return_sequences=True and a TimeDistributed layer after the LSTM required? I would appreciate a fuller description of them.

return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.

TimeDistributed: This wrapper allows to apply a layer to every temporal slice of an input.

Updated 2017/03/13 17:40

I think I now understand the return_sequences option, but I am still not sure about TimeDistributed. If I add a TimeDistributed layer after the LSTM, is the model the same as my "many-to-many (2)" below? In that case, I think a Dense layer is applied to each output.
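For concreteness, here is a minimal sketch of the architecture I am asking about, with return_sequences=True and a TimeDistributed Dense on top. The dimensions (10 turns per dialog, 64-dimensional embeddings, 6 possible actions) are placeholder values, not from my actual data:

```python
# Hypothetical dimensions for illustration only.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed

n_turns, emb_size, n_actions = 10, 64, 6   # placeholder values

model = Sequential([
    Input(shape=(n_turns, emb_size)),                         # one embedded user input per turn
    LSTM(32, return_sequences=True),                          # one output vector per turn
    TimeDistributed(Dense(n_actions, activation="sigmoid")),  # one action bit vector per turn
])
model.compile(loss="binary_crossentropy", optimizer="adam")
model.summary()   # output shape: (None, n_turns, n_actions)
```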

Answer

The LSTM layer and the TimeDistributed wrapper are two different ways to get the "many-to-many" relationship that you want.

  1. The LSTM will eat the words of your sentence one by one; via return_sequences you can choose to output something (the state) at each step (after each word is processed) or only after the last word has been eaten. So with return_sequences=True the output will be a sequence of the same length, and with return_sequences=False the output will be just one vector (see the shape check after this list).
  2. TimeDistributed. This wrapper lets you apply one layer (say Dense, for example) to every element of your sequence independently. That layer has exactly the same weights for every element; the same layer is applied to each word, and it will, of course, return the sequence of words processed independently.
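As a quick way to see the difference described in point 1, here is a small shape check; the batch size, sequence length and layer sizes are arbitrary illustrative values:

```python
import numpy as np
from tensorflow.keras.layers import LSTM

x = np.random.random((2, 5, 8)).astype("float32")  # (batch, time steps, features)

seq_out = LSTM(4, return_sequences=True)(x)    # one output per time step
last_out = LSTM(4, return_sequences=False)(x)  # only the output after the last step

print(seq_out.shape)   # (2, 5, 4) -> a sequence of the same length
print(last_out.shape)  # (2, 4)    -> just one vector per sample
```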

As you can see, the difference between the two is that the LSTM propagates information through the sequence: it will eat one word, update its state and return it or not, then go on with the next word while still carrying information from the previous ones. In TimeDistributed, by contrast, the words are processed in the same way on their own, as if they were in silos, and the same layer is applied to every one of them.

So you don't have to use an LSTM and TimeDistributed in a row; you can do whatever you want, just keep in mind what each of them does.

I hope it's clearer?

The TimeDistributed wrapper, in your case, applies a Dense layer to every element output by the LSTM.

Let's take an example:

You have a sequence of n_words words that are embedded in emb_size dimensions. So your input is a 2D tensor of shape (n_words, emb_size).

First you apply an LSTM with output dimension lstm_output and return_sequences=True. The output will still be a sequence, so it will be a 2D tensor of shape (n_words, lstm_output). So you have n_words vectors of length lstm_output.

Now you apply a TimeDistributed Dense layer with, say, 3 output dimensions as the parameter of the Dense, i.e. TimeDistributed(Dense(3)). This will apply Dense(3) n_words times, to every vector of size lstm_output in your sequence independently... they will all become vectors of length 3. Your output will still be a sequence, so a 2D tensor, now of shape (n_words, 3).
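A minimal sketch of this example in code, with placeholder values for n_words, emb_size and lstm_output (Keras adds a batch dimension in front, so the shapes show up as (None, n_words, ...) in the summary):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed

n_words, emb_size, lstm_output = 7, 50, 16   # placeholder values

model = Sequential([
    Input(shape=(n_words, emb_size)),          # input: (n_words, emb_size)
    LSTM(lstm_output, return_sequences=True),  # -> (n_words, lstm_output)
    TimeDistributed(Dense(3)),                 # same Dense(3) applied to each step -> (n_words, 3)
])
model.summary()
```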

Is it clearer? :-)
