How to get only the last output of a sequence model in Keras?
Question
I trained a many-to-many sequence model in Keras with return_sequences=True and a TimeDistributed wrapper on the last Dense layer:
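from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

# vocab_size: size of the vocabulary, defined elsewhere
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=50))
model.add(LSTM(100, return_sequences=True))  # one hidden state per timestep
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# train...
model.save_weights("weights.h5")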
So during training the loss is calculated over all hidden states (at every timestep). But for inference I only need the output at the last timestep. So I load the weights into a many-to-one sequence model for inference, without the TimeDistributed wrapper, and I set return_sequences=False to get only the last output of the LSTM layer:
inference_model = Sequential()
inference_model.add(Embedding(input_dim=vocab_size, output_dim=50))
inference_model.add(LSTM(100, return_sequences=False))
inference_model.add(Dense(vocab_size, activation='softmax'))
inference_model.load_weights("weights.h5")
When I test my inference model on a sequence of length 20, I expect to get a prediction with shape (vocab_size), but inference_model.predict(...) still returns predictions for every timestep: a tensor of shape (20, vocab_size).
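To illustrate what return_sequences changes, here is a minimal NumPy sketch. It uses a plain tanh RNN as a stand-in for the LSTM (an assumption made for brevity: an LSTM has more gates but handles the time axis the same way), and the function name simple_rnn and all sizes are hypothetical. With return_sequences=True the layer returns the hidden state at every timestep; with False it returns only the last one, which equals the final slice of the full sequence. The weight shapes depend on neither the sequence length nor return_sequences.

```python
import numpy as np

def simple_rnn(inputs, W_x, W_h, b, return_sequences=True):
    """Hypothetical plain tanh RNN (not an LSTM) over inputs of shape (batch, timesteps, features)."""
    batch, timesteps, _ = inputs.shape
    h = np.zeros((batch, W_h.shape[0]))  # initial hidden state, shape (batch, units)
    states = []
    for t in range(timesteps):
        # the same weights are reused at every timestep
        h = np.tanh(inputs[:, t, :] @ W_x + h @ W_h + b)
        states.append(h)
    all_states = np.stack(states, axis=1)  # (batch, timesteps, units)
    # many-to-many returns every state; many-to-one keeps only the last
    return all_states if return_sequences else all_states[:, -1]

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 20, 50))  # 2 sequences, 20 timesteps, 50 features (illustrative)
W_x = rng.normal(size=(50, 100)) * 0.1
W_h = rng.normal(size=(100, 100)) * 0.1
b = np.zeros(100)

seq = simple_rnn(x, W_x, W_h, b, return_sequences=True)
last = simple_rnn(x, W_x, W_h, b, return_sequences=False)
print(seq.shape)                      # (2, 20, 100)
print(last.shape)                     # (2, 100)
print(np.allclose(seq[:, -1], last))  # True
```

Note that the batch axis is always the first axis of the output, so a many-to-one model is expected to return (batch, vocab_size) rather than a bare (vocab_size).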
Answer
If, for whatever reason, you need only the last timestep during inference, you can build a new model that applies the trained model to the input and returns the last timestep as its output, using the Lambda layer:
from keras.models import Model
from keras.layers import Input, Lambda
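# Placeholder: the shape of one input sample without the batch axis,
# e.g. (None,) for variable-length sequences of token ids
inp = Input(shape=put_the_input_shape_here)
x = model(inp)  # apply the trained model on the input
# x has shape (batch, timesteps, vocab_size); keep only the last timestep
out = Lambda(lambda t: t[:, -1])(x)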
inference_model = Model(inp, out)
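What the Lambda layer does can be checked with a NumPy array standing in for the model's output tensor. This is a sketch with made-up sizes (batch, timesteps and vocab_size here are illustrative, not taken from the question): slicing with [:, -1] keeps the batch axis and selects the last timestep.

```python
import numpy as np

batch, timesteps, vocab_size = 3, 20, 10  # illustrative sizes only
# Stand-in for the output of the trained many-to-many model:
# one softmax distribution over the vocabulary per timestep.
logits = np.random.default_rng(1).normal(size=(batch, timesteps, vocab_size))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Same slice as Lambda(lambda t: t[:, -1]): keep the batch axis, take the last timestep.
last_step = probs[:, -1]
print(probs.shape)      # (3, 20, 10)
print(last_step.shape)  # (3, 10)
```

The leading ':' matters: probs[-1] would instead select the last sample of the batch, with all of its timesteps.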
Side note: As already stated in this answer, TimeDistributed(Dense(...)) and Dense(...) are equivalent, since the Dense layer is applied on the last dimension of its input tensor. That is why you get the same output shape.
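The equivalence can be seen with plain matrix arithmetic. In this NumPy sketch (sizes and variable names are illustrative assumptions; no Keras code is involved), a single matmul over the last axis of a 3D tensor, which is what Dense does, gives the same result as applying the same kernel to each timestep separately, which is what TimeDistributed(Dense(...)) does.

```python
import numpy as np

rng = np.random.default_rng(2)
batch, timesteps, units, vocab_size = 2, 20, 100, 10  # illustrative sizes
h = rng.normal(size=(batch, timesteps, units))  # stand-in for LSTM output with return_sequences=True
W = rng.normal(size=(units, vocab_size))        # kernel shape depends only on units and vocab_size
b = rng.normal(size=vocab_size)

# Dense-style: one matmul that contracts the last axis of the 3D input.
dense_out = h @ W + b  # (batch, timesteps, vocab_size)

# TimeDistributed-style: apply the same kernel to each timestep, then stack along time.
td_out = np.stack([h[:, t, :] @ W + b for t in range(timesteps)], axis=1)

print(dense_out.shape)                 # (2, 20, 10)
print(np.allclose(dense_out, td_out))  # True
```

Because the kernel W has the same shape in both cases, the weights of one formulation fit the other.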