Use LSTM tutorial code to predict next word in a sentence?

Question

I've been trying to understand the sample code with https://www.tensorflow.org/tutorials/recurrent which you can find at https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py

(Using tensorflow 1.3.0.)

I've summarized (what I think are) the key parts, for my question, below:

size = 200
vocab_size = 10000
layers = 2
# input_.input_data is a 2D tensor [batch_size, num_steps] of
# word ids, from 1 to 10000

cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(size) for _ in range(layers)])

embedding = tf.get_variable(
    "embedding", [vocab_size, size], dtype=tf.float32)
inputs = tf.nn.embedding_lookup(embedding, input_.input_data)

inputs = tf.unstack(inputs, num=num_steps, axis=1)
outputs, state = tf.contrib.rnn.static_rnn(
    cell, inputs, initial_state=self._initial_state)

output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size])
softmax_w = tf.get_variable(
    "softmax_w", [size, vocab_size], dtype=data_type())
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())
logits = tf.matmul(output, softmax_w) + softmax_b

# Then calculate loss, do gradient descent, etc.

My biggest question is how do I use the produced model to actually generate a next word suggestion, given the first few words of a sentence? Concretely, I imagine the flow is like this, but I cannot get my head around what the code for the commented lines would be:

prefix = ["What", "is", "your"]
state = #Zeroes
# Call static_rnn(cell) once for each word in prefix to initialize state
# Use final output to set a string, next_word
print(next_word)

My sub-questions are:

  • Why use a random (uninitialized, untrained) word-embedding?
  • Why use softmax?
  • Does the hidden layer have to match the dimension of the input (i.e. the dimension of the word2vec embeddings)?
  • How/Can I bring in a pre-trained word2vec model, instead of that uninitialized one?

(I'm asking them all as one question, as I suspect they are all connected, and connected to some gap in my understanding.)

What I was expecting to see here was loading an existing word2vec set of word embeddings (e.g. using gensim's KeyedVectors.load_word2vec_format()), convert each word in the input corpus to that representation when loading in each sentence, and then afterwards the LSTM would spit out a vector of the same dimension, and we would try and find the most similar word (e.g. using gensim's similar_by_vector(y, topn=1)).
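
For reference, a minimal sketch of those gensim calls (hedged: the file path and the word "open" are placeholders for illustration, not part of the tutorial):

from gensim.models import KeyedVectors

# Load pre-trained word2vec vectors (text format; path is a placeholder).
wv = KeyedVectors.load_word2vec_format("vectors.txt", binary=False)

# Look up the vector for a word.
vec = wv["open"]

# Given an output vector of the same dimension, find the closest word.
# This scans the whole vocabulary, which is the "relatively slow" call
# discussed below.
print(wv.similar_by_vector(vec, topn=1))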

Is using softmax saving us from the relatively slow similar_by_vector(y, topn=1) call?

BTW, for the pre-existing word2vec part of my question Using pre-trained word2vec with LSTM for word generation is similar. However the answers there, currently, are not what I'm looking for. What I'm hoping for is a plain English explanation that switches the light on for me, and plugs whatever the gap in my understanding is.  Use pre-trained word2vec in lstm language model? is another similar question.

UPDATE: Predicting next word using the language model tensorflow example and Predicting the next word using the LSTM ptb model tensorflow example are similar questions. However, neither shows the code to actually take the first few words of a sentence, and print out its prediction of the next word. I tried pasting in code from the 2nd question, and from https://stackoverflow.com/a/39282697/841830 (which comes with a github branch), but cannot get either to run without errors. I think they may be for an earlier version of TensorFlow?

ANOTHER UPDATE: Yet another question asking basically the same thing: Predicting Next Word of LSTM Model from Tensorflow Example It links to Predicting next word using the language model tensorflow example (and, again, the answers there are not quite what I am looking for).

In case it still isn't clear, what I am trying to write is a high-level function called getNextWord(model, sentencePrefix), where model is a previously built LSTM that I've loaded from disk, and sentencePrefix is a string, such as "Open the", and it might return "pod". I then might call it with "Open the pod" and it would return "bay", and so on.

An example (with a character RNN, and using mxnet) is the sample() function shown near the end of https://github.com/zackchase/mxnet-the-straight-dope/blob/master/chapter05_recurrent-neural-networks/simple-rnn.ipynb. You can call sample() during training, but you can also call it after training, and with any sentence you want.

Answer

My biggest question is how do I use the produced model to actually generate a next word suggestion, given the first few words of a sentence?

I.e. I'm trying to write a function with the signature: getNextWord(model, sentencePrefix)

Before I explain my answer, first a remark about your suggestion to # Call static_rnn(cell) once for each word in prefix to initialize state: keep in mind that static_rnn does not return a value like a numpy array, but a tensor. You can evaluate a tensor to a value when it is run (1) in a session (a session keeps the state of your computational graph, including the values of your model parameters), and (2) with the input that is necessary to calculate the tensor value. Input can be supplied using input readers (the approach in the tutorial), or using placeholders (what I will use below).
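
As a tiny illustration of that point (a toy example, not part of the tutorial code): a tensor only becomes a concrete value when a session runs it with its placeholder inputs fed in.

import tensorflow as tf

x = tf.placeholder(dtype=tf.int32, shape=[None])  # input placeholder
y = x * 2                                         # a tensor, not a value yet

with tf.Session() as sess:
    # y is only evaluated to a concrete numpy array here, once the session
    # runs it with a value fed for x.
    print(sess.run(y, feed_dict={x: [1, 2, 3]}))  # -> [2 4 6]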

Now follows the actual answer: The model in the tutorial was designed to read input data from a file. The answer of @user3080953 already showed how to work with your own text file, but as I understand it you need more control over how the data is fed to the model. To do this you will need to define your own placeholders and feed the data to these placeholders when calling session.run().

In the code below I subclassed PTBModel and made it responsible for explicitly feeding data to the model. I introduced a special PTBInteractiveInput that has an interface similar to PTBInput so you can reuse the functionality in PTBModel. To train your model you still need PTBModel.

import numpy as np
import tensorflow as tf

# PTBModel (and the rest of the tutorial code) is assumed to be importable
# from the tutorial's ptb_word_lm.py.

class PTBInteractiveInput(object):
  def __init__(self, config):
    self.batch_size = 1
    self.num_steps = config.num_steps
    self.input_data = tf.placeholder(dtype=tf.int32, shape=[self.batch_size, self.num_steps])
    self.sequence_len = tf.placeholder(dtype=tf.int32, shape=[])
    self.targets = tf.placeholder(dtype=tf.int32, shape=[self.batch_size, self.num_steps])

class InteractivePTBModel(PTBModel):

  def __init__(self, config):
    input = PTBInteractiveInput(config)
    PTBModel.__init__(self, is_training=False, config=config, input_=input)
    # logits has shape [batch_size, num_steps, vocab_size]; select the time
    # step of the last prefix word, giving a [batch_size, vocab_size] tensor.
    output = self.logits[:, self._input.sequence_len - 1, :]
    self.top_word_id = tf.argmax(output, axis=1)  # argmax over the vocabulary

  def get_next(self, session, prefix):
    prefix_array, sequence_len = self._preprocess(prefix)
    feeds = {
      self._input.sequence_len: sequence_len,
      self._input.input_data: prefix_array,
    }
    fetches = [self.top_word_id]
    result = session.run(fetches, feeds)
    return self._postprocess(result)

  def _preprocess(self, prefix):
    num_steps = self._input.num_steps
    seq_len = len(prefix)
    if seq_len > num_steps:
      raise ValueError("Prefix too large for model.")
    prefix_ids = self._prefix_to_ids(prefix)
    num_items_to_pad = num_steps - seq_len
    prefix_ids.extend([0] * num_items_to_pad)
    prefix_array = np.array([prefix_ids], dtype=np.int32)  # word ids are ints; the input placeholder is tf.int32
    return prefix_array, seq_len

  def _prefix_to_ids(self, prefix):
    # should convert your prefix to a list of ids
    pass

  def _postprocess(self, result):
    # convert ids back to strings
    pass

In the __init__ function of PTBModel you need to add this line:

self.logits = logits
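
With those pieces in place, usage could look roughly like the sketch below. This is hedged: config must be the same configuration the checkpoint was trained with, "model.ckpt" is a placeholder path, the variable scope must match the training graph (as in ptb_word_lm.py), and _prefix_to_ids/_postprocess still need to be filled in for your vocabulary.

with tf.Session() as session:
    # Build the interactive model; its variables must share names with the
    # trained model, e.g. by constructing it under the same variable scope.
    model = InteractivePTBModel(config)
    saver = tf.train.Saver()
    saver.restore(session, "model.ckpt")  # placeholder checkpoint path

    next_word = model.get_next(session, ["Open", "the"])
    print(next_word)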

Why use a random (uninitialized, untrained) word-embedding?

First note that, although the embeddings are random in the beginning, they will be trained together with the rest of the network. The embeddings you obtain after training will have similar properties to the embeddings you obtain with word2vec models, e.g., the ability to answer analogy questions with vector operations (king - man + woman = queen, etc.). In tasks where you have a considerable amount of training data, like language modelling (which does not need annotated training data) or neural machine translation, it is more common to train embeddings from scratch.

Why use softmax?

Softmax is a function that normalizes a vector of similarity scores (the logits) into a probability distribution. You need a probability distribution to train your model with cross-entropy loss and to be able to sample from the model. Note that if you are only interested in the most likely word of a trained model, you don't need the softmax and you can use the logits directly.
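
To make that concrete, a small toy sketch (the numbers are made up; it just shows that taking the argmax of the logits gives the same word id as taking the argmax of the softmax probabilities):

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, 1.0]])  # one step, vocabulary of size 3
probs = tf.nn.softmax(logits)            # needed for training and sampling
best_id = tf.argmax(logits, axis=1)      # enough if you only want the top word

with tf.Session() as sess:
    print(sess.run(probs))    # approximately [[0.63 0.14 0.23]]
    print(sess.run(best_id))  # [0], the same as the argmax of probs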

Does the hidden layer have to match the dimension of the input (i.e. the dimension of the word2vec embeddings)?

No, in principle it can be any value. However, using a hidden state with a lower dimension than your embedding dimension does not make much sense.
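
For instance, a toy sketch (the sizes and variable names here are mine, not the tutorial's): the LSTM's input dimension comes from the embedding, while the hidden size only has to match the softmax weights.

import tensorflow as tf

vocab_size = 10000
embed_size = 200
hidden_size = 650  # a hidden state larger than the embedding dimension is fine

embedding = tf.get_variable("embedding", [vocab_size, embed_size], dtype=tf.float32)
cell = tf.contrib.rnn.BasicLSTMCell(hidden_size)  # input dim (embed_size) need not equal hidden_size
softmax_w = tf.get_variable("softmax_w", [hidden_size, vocab_size], dtype=tf.float32)
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=tf.float32)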

How/Can I bring in a pre-trained word2vec model, instead of that uninitialized one?

Here is a self-contained example of initializing an embedding with a given numpy array. If you want the embedding to remain fixed/constant during training, set trainable to False.

import tensorflow as tf
import numpy as np
vocab_size = 10000
size = 200
trainable = True  # set to False to keep the embedding fixed during training
embedding_matrix = np.zeros([vocab_size, size]) # replace this with code to load your pretrained embedding
embedding = tf.get_variable("embedding",
                            initializer=tf.constant_initializer(embedding_matrix),
                            shape=[vocab_size, size],
                            dtype=tf.float32,
                            trainable=trainable)
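
If the pre-trained vectors come from gensim, one way to build that embedding_matrix is sketched below, building on the snippet above. This is an assumption-laden sketch: it presumes you have the word-to-id mapping used for the language model (e.g. the vocabulary built in the tutorial's reader.py), the vector dimension matches size, and "vectors.txt" is a placeholder path; words missing from the word2vec vocabulary simply stay zero here.

import numpy as np
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format("vectors.txt", binary=False)  # placeholder path

embedding_matrix = np.zeros([vocab_size, size])
for word, word_id in word_to_id.items():  # word_to_id: your corpus vocabulary mapping (assumed)
    if word in wv:
        embedding_matrix[word_id] = wv[word]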
