Restore keras seq2seq model

Problem Description

I'm working with the keras seq2seq example here: https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py

I would like to persist the vocabulary and decoder so that I can load it again later, and apply it to new sequences.
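
Persisting the vocabulary itself is straightforward: the lookup tables can be serialized next to the saved weights. A minimal sketch, assuming the input_token_index / target_token_index dicts and the max sequence lengths computed in the example script (the s2s_vocab.pkl filename is just for illustration):

import pickle

# Save the trained model, and alongside it the character lookup tables
# and sequence lengths the example script builds while reading the data.
model.save('s2s.h5')
with open('s2s_vocab.pkl', 'wb') as f:
    pickle.dump({
        'input_token_index': input_token_index,
        'target_token_index': target_token_index,
        'max_encoder_seq_length': max_encoder_seq_length,
        'max_decoder_seq_length': max_decoder_seq_length,
    }, f)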

While the code calls model.save(), this is insufficient because I can see the decoding setup referencing a number of other variables which are deep pointers into the trained model:

# encoder_inputs, encoder_states, decoder_inputs, decoder_lstm,
# decoder_dense and latent_dim are all variables from the training script.
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

I would like to translate this code to determine encoder_inputs, encoder_states, latent_dim, decoder_inputs from a model loaded from disk. It's ok to assume I know the model architecture in advance. Is there a straightforward way to do this?

Update: I have made some progress using the decoder construction code and pulling out the layer inputs/outputs as needed.

from keras.models import Model
from keras.layers import Input

encoder_inputs = model.input[0]   # input_1
decoder_inputs = model.input[1]   # input_2
encoder_outputs, state_h_enc, state_c_enc = model.layers[2].output  # lstm_1
_, state_h_dec, state_c_dec = model.layers[3].output  # lstm_2
decoder_outputs = model.layers[4].output  # dense_1

encoder_states = [state_h_enc, state_c_enc]
encoder_model = Model(encoder_inputs, encoder_states)

latent_dim = 256  # TODO: infer this from the model; should match the lstm_1 output size (see the sketch below)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_states = [state_h_dec, state_c_dec]

decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
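
As an aside, the TODO above can be handled by reading the hidden size off the loaded model instead of hard-coding it; a one-liner, assuming the same layer indexing:

# Infer the hidden size from the encoder LSTM (lstm_1) rather than hard-coding 256.
latent_dim = model.layers[2].units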

However, when I try to construct the decoder model, I encounter this error:

RuntimeError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(?, ?, 96), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []

As a test I tried Model(decoder_inputs, decoder_outputs) with the same result. It's not clear to me what is disconnected from the graph, since these layers are loaded from the model.

Answer

Ok, I solved this problem and the decoder is producing reasonable results. In my code above I missed a couple of details in the decoder step, specifically that it calls the LSTM and Dense layers in order to wire them up. In addition, the new decoder inputs need unique names so they don't collide with input_1 and input_2 (this detail smells like a keras bug).

from keras.models import Model, load_model
from keras.layers import Input

model = load_model('s2s.h5')        # the model saved by the training script
latent_dim = model.layers[2].units  # hidden size of the LSTMs (256 in the example)

encoder_inputs = model.input[0]   # input_1
encoder_outputs, state_h_enc, state_c_enc = model.layers[2].output  # lstm_1
encoder_states = [state_h_enc, state_c_enc]
encoder_model = Model(encoder_inputs, encoder_states)

decoder_inputs = model.input[1]   # input_2
decoder_state_input_h = Input(shape=(latent_dim,), name='input_3')
decoder_state_input_c = Input(shape=(latent_dim,), name='input_4')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_lstm = model.layers[3]    # lstm_2
decoder_outputs, state_h_dec, state_c_dec = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h_dec, state_c_dec]
decoder_dense = model.layers[4]   # dense_1
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
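
With both models rebuilt, inference on new sequences works the same way as in the original script. Roughly the decode_sequence loop from lstm_seq2seq.py, assuming num_decoder_tokens, target_token_index, reverse_target_char_index and max_decoder_seq_length were restored along with the vocabulary:

import numpy as np

def decode_sequence(input_seq):
    # Encode the input sequence into the initial LSTM state vectors.
    states_value = encoder_model.predict(input_seq)

    # Seed the decoder with the start character ('\t' in the example).
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index['\t']] = 1.

    decoded_sentence = ''
    while True:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Greedily sample the most likely next character.
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Stop on the end character ('\n') or at the length limit.
        if sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length:
            break

        # Feed the sampled character and the updated states back in.
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.
        states_value = [h, c]

    return decoded_sentence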

A big drawback of this code is that it assumes the full architecture is known in advance. I would like to eventually be able to load an architecture-agnostic decoder.
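
One possible direction, sketched here for this particular two-LSTM layout rather than for arbitrary architectures, is to look the layers up by type instead of by position:

from keras.layers import LSTM, Dense

# Find layers by type; assumes exactly two LSTMs (encoder first,
# decoder second) and a single Dense output layer.
encoder_lstm, decoder_lstm = [l for l in model.layers if isinstance(l, LSTM)]
decoder_dense = next(l for l in model.layers if isinstance(l, Dense))
latent_dim = decoder_lstm.units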
