How LSTM handles variable-length sequences

Problem description

I am doing deep learning in Python, as follows:

from keras.models import Model
from keras import layers
from keras import Input

text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

# Our text input is a variable-length sequence of integers.
# Note that we can optionally name our inputs!
text_input = Input(shape=(None,), dtype='int32', name='text')

# Which we embed into a sequence of vectors of size 64
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)

# Which we encoded in a single vector via a LSTM
encoded_text = layers.LSTM(32)(embedded_text)

# Same process (with different layer instances) for the question
question_input = Input(shape=(None,), dtype='int32', name='question')
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)

# We then concatenate the encoded question and encoded text
concatenated = layers.concatenate([encoded_text, encoded_question], axis=-1)

# And we add a softmax classifier on top
answer = layers.Dense(answer_vocabulary_size, activation='softmax')(concatenated)

# At model instantiation, we specify the two inputs and the output:
model = Model([text_input, question_input], answer)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['acc'])

As you can see, this model's input carries no shape information from the raw data, so after the Embedding layer the input to the LSTM (i.e. the output of the Embedding) is a variable-length sequence.

So I want to know:

  • In this model, how does Keras determine the number of lstm_unit in the LSTM layer?
  • How does it handle variable-length sequences?

Additional information: to explain what lstm_unit is (I don't know what to call it, so I'm just showing an image of it):

Recommended answer

The provided recurrent layers inherit from a base implementation keras.layers.Recurrent, which includes the option return_sequences, which defaults to False. What this means is that by default, recurrent layers will consume variable-length inputs and ultimately produce only the layer's output at the final sequential step.
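To make this concrete, here is a minimal sketch (not from the original answer; the vocabulary size, embedding size, and unit counts are arbitrary) showing how return_sequences changes the output shape for a variable-length input:

from keras import backend as K
from keras import layers, Input

x = Input(shape=(None,), dtype='int32')               # variable-length sequence of token ids
e = layers.Embedding(10000, 64)(x)                    # shape (batch, None, 64)

last_step = layers.LSTM(32)(e)                        # default: return_sequences=False
all_steps = layers.LSTM(32, return_sequences=True)(e)

print(K.int_shape(last_step))   # (None, 32)       -- one fixed-size vector per example
print(K.int_shape(all_steps))   # (None, None, 32) -- still variable-length in time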

As a result, there is no problem using None to specify a variable-length input sequence dimension.

However, if you wanted the layer to return the full sequence of outputs, i.e. the tensor of outputs for each step of the input sequence, then you'd have to further deal with the variable size of that output.

You could do this by having the next layer further accept a variable-sized input, and punt on the problem until later on in your network when eventually you either must calculate a loss function from some variable-length thing, or else calculate some fixed-length representation before continuing on to later layers, depending on your model.
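A rough illustration of those two routes, continuing from the question's embedded_text (the extra LSTM and the GlobalMaxPooling1D are just one possible choice, not the answer's prescription):

from keras import layers

# Continuing from the question's embedded_text, which has shape (batch, None, 64)
seq = layers.LSTM(32, return_sequences=True)(embedded_text)     # (batch, None, 32), still variable-length

# Option 1: punt -- the next recurrent layer also consumes variable-length input
deeper = layers.LSTM(32, return_sequences=True)(seq)

# Option 2: collapse to a fixed-length representation before later Dense layers
fixed = layers.GlobalMaxPooling1D()(seq)                        # (batch, 32)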

Or you could do it by requiring fixed-length sequences, possibly by padding the end of the sequences with special sentinel values that merely indicate an empty sequence item, purely to pad out the length.
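A small sketch of that padding idea, assuming made-up token ids and an arbitrary maxlen of 4, using pad_sequences plus mask_zero:

from keras import layers
from keras.preprocessing.sequence import pad_sequences

raw = [[34, 27, 5], [12, 8], [3, 9, 41, 7]]               # ragged batch of token ids

padded = pad_sequences(raw, maxlen=4, padding='post', value=0)
# [[34 27  5  0]
#  [12  8  0  0]
#  [ 3  9 41  7]]

# With mask_zero=True the Embedding layer marks the padded positions so that
# downstream layers such as LSTM can skip them
embedding = layers.Embedding(10000, 64, mask_zero=True)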

Separately, the Embedding layer is a very special layer that is built to handle variable-length inputs as well. The output shape will have a different embedding vector for each token of the input sequence, so the shape will be (batch size, sequence length, embedding dimension). Since the next layer is an LSTM, this is no problem ... it will happily consume variable-length sequences as well.
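As a quick end-to-end check (a hypothetical model with arbitrary sizes), the same Embedding -> LSTM stack accepts batches of different sequence lengths and always yields a fixed-size encoding:

import numpy as np
from keras import layers, Input
from keras.models import Model

inp = Input(shape=(None,), dtype='int32')
enc = layers.LSTM(32)(layers.Embedding(10000, 64)(inp))
m = Model(inp, enc)

# Batches with different sequence lengths both produce a fixed-size encoding
print(m.predict(np.random.randint(1, 10000, size=(2, 5))).shape)    # (2, 32)
print(m.predict(np.random.randint(1, 10000, size=(2, 12))).shape)   # (2, 32)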

But as it is mentioned in the documentation on Embedding:

input_length: Length of input sequences, when it is constant.
      This argument is required if you are going to connect
      `Flatten` then `Dense` layers upstream
      (without it, the shape of the dense outputs cannot be computed).

If you want to go directly from Embedding to a non-variable-length representation, then you must supply the fixed sequence length as part of the layer.
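For instance, a sketch of that fixed-length route (max_len = 100 is an arbitrary choice): supplying input_length lets Flatten and Dense compute their output shapes:

from keras import layers, Input

max_len = 100                                                   # arbitrary fixed length
inp = Input(shape=(max_len,), dtype='int32')
emb = layers.Embedding(10000, 64, input_length=max_len)(inp)    # (batch, 100, 64)
flat = layers.Flatten()(emb)                                    # (batch, 6400), computable only because max_len is known
out = layers.Dense(500, activation='softmax')(flat)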

Finally, note that when you express the dimensionality of the LSTM layer, such as LSTM(32), you are describing the dimensionality of the output space of that layer.

# example sequence of input, e.g. batch size is 1.
[
 [34], 
 [27], 
 ...
] 
--> # feed into embedding layer

[
  [64-d representation of token 34 ...],
  [64-d representation of token 27 ...],
  ...
] 
--> # feed into LSTM layer

[32-d output vector of the final sequence step of LSTM]

In order to avoid the inefficiency of a batch size of 1, one tactic is to sort your input training data by the sequence length of each example, and then group into batches based on common sequence length, such as with a custom Keras DataGenerator.

This has the advantage of allowing large batch sizes, especially if your model may need something like batch normalization or involves GPU-intensive training, and even just for the benefit of a less noisy estimate of the gradient for batch updates. But it still lets you work on an input training data set that has different sequence lengths for different examples.

More importantly though, it also has the big advantage that you do not have to manage any padding to ensure common sequence lengths in the input.
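A rough sketch of that bucketing tactic using keras.utils.Sequence (the class name and the strict group-by-length policy here are illustrative, not the answer's exact recipe):

import numpy as np
from keras.utils import Sequence

class LengthBucketedGenerator(Sequence):
    """Group training examples so that each batch shares a single sequence length."""

    def __init__(self, sequences, labels, batch_size=32):
        # Bucket examples by their length, then cut each bucket into batches
        buckets = {}
        for seq, label in zip(sequences, labels):
            buckets.setdefault(len(seq), []).append((seq, label))
        self.batches = []
        for same_length in buckets.values():
            for i in range(0, len(same_length), batch_size):
                chunk = same_length[i:i + batch_size]
                x = np.array([seq for seq, _ in chunk])
                y = np.array([label for _, label in chunk])
                self.batches.append((x, y))

    def __len__(self):
        return len(self.batches)

    def __getitem__(self, idx):
        return self.batches[idx]

An instance of this can then be passed to model.fit_generator (or model.fit in newer Keras versions); every batch the LSTM sees then has one consistent length, so no padding is needed.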
