Keras - "Convert" a trained many-to-many model to a one-to-many model (generator)


Problem description

I'm trying to understand RNNs (not one specific kind) using Reber Grammar inputs (not embedded for now). You can find the Jupyter notebook at this link (please disregard the markdown cells, because I failed on the first version with output and it's not up to date :)).

For every timestep, I provide the input and expected output for the training (so it's a many-to-many model).

  • Input/output are "OneHotEncoded" (based on the string "BTSXPVE"), so for example:

  • B is [1, 0, 0, 0, 0, 0, 0]
  • V is [0, 0, 0, 0, 0, 1, 0]

  • For the timesteps, I have strings of unknown length (not encoded here to make it clearer), for example:

    • BPVVE
    • BPVPXVPXVPXVVE

    so I decided to pad them to 20 timesteps.

    • For the batches, I generated 2048 encoded strings for training and 256 strings for testing.

    So my input tensor is (2048, 20, 7). My output tensor is also (2048, 20, 7), because for every timestep I would like to get the prediction (a sketch of this encoding follows below).
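
    As a minimal sketch of how such tensors can be built (this is my assumption of the encoding; train_strings stands in for a hypothetical list of 2048 generated Reber strings):

    import numpy as np
    from keras.preprocessing import sequence

    letters = "BTSXPVE"

    def one_hot(s):
        # one (len(s), 7) matrix of one-hot rows for a single Reber string
        m = np.zeros((len(s), len(letters)))
        for t, ch in enumerate(s):
            m[t, letters.index(ch)] = 1.
        return m

    # train_strings: hypothetical list of 2048 generated Reber strings
    X_train = sequence.pad_sequences([one_hot(s) for s in train_strings],
                                     maxlen=20)  # -> (2048, 20, 7)
    # y_train would be the same encoding shifted by one timestep (next-letter targets)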

    So I trained 3 many-to-many models (SimpleRNN, GRU and LSTM) with code like the following:

    from keras.models import Sequential
    from keras.layers import LSTM

    model = Sequential()
    # return_sequences=True -> one 7-dimensional output per timestep (many-to-many)
    model.add(LSTM(units=7, input_shape=(maxlen, 7), return_sequences=True))
    model.compile(loss='mse',
                  optimizer='Nadam',
                  metrics=['mean_squared_error'])

    history = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                        epochs=1500, batch_size=1024)
    

    As expected, for every timestep I get the probability of a specific value, for example (after a bit of cleanup):

    After B, the model predicts [0, 0.622, 0, 0, 0.401, 0, 0] (60% of having a T, or 40% of having a P).
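
    To read such an output vector back as letter probabilities, a small decoding helper like this one (my own sketch, not from the original post) can be used:

    import numpy as np

    letters = "BTSXPVE"

    pred = model.predict(X_train[:1])[0]  # shape (20, 7): one distribution per timestep
    for t, row in enumerate(pred):
        # keep only the letters with non-negligible probability
        print(t, {letters[i]: round(float(p), 3) for i, p in enumerate(row) if p > 0.05})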

    This is correct based on the Reber grammar graph used to generate a word.

    Now, I would like to use this model to generate strings (so a one-to-many model), but I have no idea how to keep the model and use it as a generator.

    I thought of inputting only B (padded to 20 timesteps), getting the result, concatenating B with the best index of the output, padding that to 20 timesteps, feeding the new input to the NN, and so on. But I'm pretty sure this is not the way it should be done :s

    Moreover, I tried to input 'B' and 'T' to check what the probability of the output would be (it should be S or X), but I got:

    import numpy as np
    from keras.preprocessing import sequence

    X = np.array([[[1,0,0,0,0,0,0], [0,1,0,0,0,0,0]]])  # [[[B, T]]]
    X = sequence.pad_sequences(X, maxlen=20)
    print(model.predict(X)[0])
    

    [0, 0.106, 0.587, 0.1, 0, 0.171, 0.007]

    What I understand is that it predicts T (10%), S (60%), X (10%), V (18%), but after BT I should get a much higher percentage on X and nearly none on V/T (because a V or T coming right after a T is only possible after B/P). It's as if my model didn't take the n-1 previous timesteps into account. So maybe my model is wrong :(

    Thanks a lot for your support,

    Solution

    You can remake this model as a stateful=True model. Make it work with timesteps=1 (or None for variable length).

    Recreating the model:

    newModel = Sequential()

    # stateful=True keeps the LSTM state between predict() calls;
    # batch_input_shape=(1, 1, 7): one sequence at a time, one timestep, 7 features
    newModel.add(LSTM(units=7, stateful=True, batch_input_shape=(1,1,7), return_sequences=True))
    

    Getting the weights from the other model:

    newModel.set_weights(model.get_weights())
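
    This direct transfer works because both models contain exactly the same layers with the same weight shapes; stateful=True and batch_input_shape only change how data is fed in, not the parameters.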
    

    Using the model in predictions:

    Now, with this model, you must input only one step at a time. And you must be careful to call reset_states() every time you're going to input a new sequence:

    So, suppose we've got the starting letter B.

    import numpy as np

    letters = "BTSXPVE"

    # one-hot for the starting letter B, shape (1, 1, 7) = (batch, timesteps, features)
    startingLetter = np.zeros((1, 1, 7))
    startingLetter[0, 0, letters.index('B')] = 1.

    # we are starting a new "sentence", so, let's reset states:
    newModel.reset_states()

    # now the prediction loop: feed each chosen letter back in as the next input
    generated = 'B'
    nextLetter = startingLetter
    while generated[-1] != 'E' and len(generated) < 20:  # guard against runaway generation
        probs = np.clip(newModel.predict(nextLetter)[0, 0], 0, None)  # (7,) scores per letter
        index = np.random.choice(7, p=probs / probs.sum())            # choose one from the probabilities
        generated += letters[index]
        nextLetter = np.zeros((1, 1, 7))
        nextLetter[0, 0, index] = 1.
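
    Sampling with np.random.choice is one way (my choice here, not prescribed by the original answer) to implement the "choose one from the probabilities" step: it can produce a different valid string on each run, whereas an argmax would always regenerate the same one. The clipping and renormalization are only there because an mse-trained output is not guaranteed to be a proper probability distribution.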


    About the quality of the results.... maybe your model is just too tiny for that.

    You can try more layers, for instance:

    model = Sequential()

    # stacked LSTMs: wider layers first, narrowing down to the 7 output features
    model.add(LSTM(units=50, input_shape=(maxlen, 7), return_sequences=True))
    model.add(LSTM(units=30, return_sequences=True))
    model.add(LSTM(units=7, return_sequences=True))
    

    This choice was arbitrary. I don't know if it's good enough, or more than enough, for your data.
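
    (If you do train a bigger model, keep in mind that the stateful copy used for generation must mirror the same layer stack, otherwise set_weights will fail.)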
