Keras sequence models - how to generate data during test/generation?

Question

Is there a way to use the already trained RNN (SimpleRNN or LSTM) model to generate new sequences in Keras?

I'm trying to modify an exercise from the Coursera Deep Learning Specialization - Sequence Models course, where you train an RNN to generate dinosaur names. In the exercise you build the RNN using only numpy, but I want to use Keras.

One of the problems is different lengths of the sequences (dino names), so I used padding and set sequence length to the max size appearing in the dataset (I padded with 0, which is also the code for '\n').

My question is how to generate the actual sequence once training is done? In the numpy version of the exercise you take the softmax output of the previous cell and use it as a distribution to sample a new input for the next cell. But is there a way to connect the output of the previous cell as the input of the next cell in Keras, during testing/generation time?

Also, some additional side questions:

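  • Since I use padding, I suspect the accuracy is too optimistic. Is there a way to tell Keras not to include the padding values in its accuracy calculation?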

Am I even doing this right? Is there a better way to use Keras with sequences of different lengths?

You can see my (WIP) code here.

Recommended answer

Inferring from a model that has been trained on a sequence

This is a pretty common thing to do with RNN models, and in Keras the best way (at least from what I know) is to create two different models:

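  • One model for training (it takes sequences rather than individual items)
  • Another model for prediction (it takes a single element rather than a sequence)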

So let's see an example. Suppose you have the following model.

from tensorflow.keras import models, layers

n_chars = 26
timesteps = 10
inp = layers.Input(shape=(timesteps,  n_chars))
lstm = layers.LSTM(100, return_sequences=True)
out1 = lstm(inp)
dense = layers.Dense(n_chars, activation='softmax')
out2 = layers.TimeDistributed(dense)(out1)
model = models.Model(inp, out2)
model.summary()

Now to infer from this model, you create another model which looks like the one below.

inp_infer = layers.Input(shape=(1, n_chars))
# Inputs to feed LSTM states back in
h_inp_infer = layers.Input(shape=(100,))
c_inp_infer = layers.Input(shape=(100,))
# We need return_state=True so we are creating a new layer
lstm_infer = layers.LSTM(100, return_state=True, return_sequences=True)
out1_infer, h, c  = lstm_infer(inp_infer, initial_state=[h_inp_infer, c_inp_infer])
out2_infer = layers.TimeDistributed(dense)(out1_infer)

# Our model takes the previous states as inputs and spits out new states as outputs
model_infer = models.Model([inp_infer, h_inp_infer, c_inp_infer], [out2_infer, h, c])

# We are setting the weights from the trained model
lstm_infer.set_weights(lstm.get_weights())
model_infer.summary()

So what's different? We have defined a new input layer that accepts an input with only one timestep (in other words, just a single item). The model then produces an output that also has a single timestep (technically we don't need the TimeDistributed layer here, but I've kept it for consistency). Other than that, we take the previous LSTM states as inputs and produce the new states as outputs. More specifically, the inference model has the following signature.

  • Input: [(None, 1, n_chars), (None, 100), (None, 100)] list of tensors
  • Output: [(None, 1, n_chars), (None, 100), (None, 100)] list of tensors

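Note that I'm either copying the weights of the new layers from the trained model or reusing the existing layers from the training model. It would be a pretty useless model if you didn't reuse the trained layers and weights.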

Now we can write inference logic.

import numpy as np
x = np.random.randint(0,2,size=(1, 1, n_chars))
h = np.zeros(shape=(1, 100))
c = np.zeros(shape=(1, 100))
seq_len = 10
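for _ in range(seq_len):
  print(x)
  # Predict the next-character distribution and get the updated LSTM states
  y_pred, h, c = model_infer.predict([x, h, c])
  # Drop the timestep axis of the prediction: (1, 1, n_chars) -> (1, n_chars)
  y_pred = y_pred[:,0,:]
  # One-hot encode the most likely character
  y_onehot = np.zeros(shape=(x.shape[0],n_chars))
  y_onehot[np.arange(x.shape[0]),np.argmax(y_pred,axis=1)] = 1.0
  # Restore the timestep axis so it can be fed back in as the next input
  x = np.expand_dims(y_onehot, axis=1)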

This part starts with an initial x, h, c. It gets the prediction y_pred, h, c, converts y_pred into the next input in the lines that follow, and assigns the results back to x, h, c. You keep going for as many iterations as you like.
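The loop above always picks the most likely character with argmax. The question instead describes sampling the next input from the softmax distribution, as in the numpy exercise. Here is a minimal sketch of that step, assuming y_pred has shape (batch, n_chars) with rows that sum to roughly 1. The helper name sample_next_onehot and the demo values are made up for illustration and are not part of the original answer.

```python
import numpy as np

def sample_next_onehot(y_pred, rng):
    """Sample one character index per row of y_pred (hypothetical helper).

    y_pred: array of shape (batch, n_chars), each row a softmax output.
    Returns the sampled indices and a one-hot array of shape
    (batch, 1, n_chars) that can be fed back in as the next input x.
    """
    batch, n_chars = y_pred.shape
    # Renormalise in float64 so rows that sum to 0.99999... are accepted by rng.choice
    probs = y_pred.astype(np.float64)
    probs /= probs.sum(axis=1, keepdims=True)
    # Draw one index per row according to that row's distribution
    idx = np.array([rng.choice(n_chars, p=row) for row in probs])
    onehot = np.zeros((batch, 1, n_chars))
    onehot[np.arange(batch), 0, idx] = 1.0
    return idx, onehot

rng = np.random.default_rng(0)
# Demo distribution over 3 characters (made-up numbers)
demo_probs = np.array([[0.1, 0.7, 0.2]], dtype=np.float32)
idx, x_next = sample_next_onehot(demo_probs, rng)
```

Inside the loop, this would replace the two y_onehot lines and the expand_dims call: x_next already has the (batch, 1, n_chars) shape that model_infer expects.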

As for the padding side question, Keras does offer a Masking layer which can be used for this purpose. And the second answer in this question seems to be what you're looking for.
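To see why padding makes accuracy look too optimistic, here is a small numpy sketch that compares accuracy over all timesteps with accuracy over the real (unpadded) timesteps only. It is not how Keras computes masked metrics internally. The function name masked_accuracy and the toy arrays are assumptions made for illustration. The mask is given explicitly rather than derived from the value 0, because in the question 0 is also the code for '\n'.

```python
import numpy as np

def masked_accuracy(y_true_idx, y_pred_probs, mask):
    """Accuracy over the positions where mask is True (hypothetical helper).

    y_true_idx:   (batch, timesteps) integer character indices
    y_pred_probs: (batch, timesteps, n_chars) predicted distributions
    mask:         (batch, timesteps) booleans, False marks padding
    """
    pred_idx = np.argmax(y_pred_probs, axis=-1)
    # Count a position as correct only if it is a real (unpadded) timestep
    correct = (pred_idx == y_true_idx) & mask
    return correct.sum() / mask.sum()

# Toy data: one sequence of real length 2, padded with 0 up to 4 timesteps
y_true = np.array([[1, 2, 0, 0]])
mask = np.array([[True, True, False, False]])
y_pred = np.zeros((1, 4, 3))
y_pred[0, 0, 1] = 1.0  # real timestep, predicted correctly
y_pred[0, 1, 0] = 1.0  # real timestep, predicted wrongly (true index is 2)
y_pred[0, 2, 0] = 1.0  # padding, trivially "correct"
y_pred[0, 3, 0] = 1.0  # padding, trivially "correct"

naive = (np.argmax(y_pred, axis=-1) == y_true).mean()  # counts padding too
masked = masked_accuracy(y_true, y_pred, mask)          # real timesteps only
print(naive, masked)
```

In this toy case the naive accuracy is 0.75 while the masked accuracy is 0.5, because the two padded positions are easy to predict and inflate the score.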
