Difference between two sequence-to-sequence models in Keras (with and without RepeatVector)
Question
I am trying to understand the difference between the model described here, namely the following one:
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model
inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
and the sequence-to-sequence model described here (the second description).
What is the difference? The first one has the RepeatVector while the second does not. Does the first model not take the decoder's hidden state as the initial state for the prediction?
Is there a paper describing the first and the second one?
Recommended answer
In the model using RepeatVector, they're not using any kind of fancy prediction, nor dealing with states. They're letting the model do everything internally, and the RepeatVector is used to transform a (batch, latent_dim) vector (which is not a sequence) into a (batch, timesteps, latent_dim) tensor (which is now a proper sequence).
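The shape change described above can be illustrated without building a model. The sketch below uses plain NumPy to mimic what RepeatVector(timesteps) does to the encoder output; the sizes and the `encoded` array are made-up values for illustration, not taken from the original models.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the original post).
batch, timesteps, latent_dim = 2, 4, 3

# Stand-in for the encoder output: one latent vector per sample, no time axis.
encoded = np.arange(batch * latent_dim, dtype="float32").reshape(batch, latent_dim)

# Add a time axis and copy the same vector along it,
# which is the effect of RepeatVector(timesteps).
repeated = np.repeat(encoded[:, np.newaxis, :], timesteps, axis=1)

print(encoded.shape)   # (2, 3)
print(repeated.shape)  # (2, 4, 3)
```

Every time step of `repeated` holds the same latent vector, so the decoder LSTM receives an identical input at each step and has to produce the sequence from its own internal recurrence.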
Now, in the other model, without RepeatVector, the secret lies in this additional function:
import numpy as np

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.
    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)
        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char
        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True
        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.
        # Update states
        states_value = [h, c]
    return decoded_sentence
This runs a "loop" based on a stop_condition, creating the time steps one by one. (The advantage of this is that sentences need not have a fixed length.)
It also explicitly takes the states generated in each step and feeds them into the next one (in order to keep the proper connection between each individual step).
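To isolate the control flow from the Keras details, here is a toy sketch of the same pattern: a loop that feeds the previous token and state back in and stops on a stop token or a maximum length. The function `toy_decoder_step` is a hypothetical stand-in for decoder_model.predict, and its arithmetic is invented purely so the example runs; it is not how an LSTM computes anything.

```python
def toy_decoder_step(token, state):
    """Hypothetical stand-in for one decoder step: returns (next_token, new_state)."""
    new_state = state - 1  # the state changes at every step
    next_token = 0 if new_state <= 0 else token + 1  # token 0 acts as the stop token
    return next_token, new_state

def greedy_decode(initial_state, start_token=1, stop_token=0, max_len=10):
    # The initial state plays the role of the encoder's output states.
    token, state = start_token, initial_state
    output = []
    while len(output) < max_len:
        # Feed the previous token and the carried-over state back in.
        token, state = toy_decoder_step(token, state)
        if token == stop_token:
            break
        output.append(token)
    return output

print(greedy_decode(3))               # [2, 3]
print(greedy_decode(5))               # [2, 3, 4, 5]
print(greedy_decode(100, max_len=3))  # [2, 3, 4]
```

Different initial states give outputs of different lengths, which is the property the RepeatVector model lacks: there, the number of time steps is fixed when the model is built.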
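- Model 1: creates the length by repeating the latent vector
- Model 2: creates the length by looping over new steps until a stop condition is reached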