使用LSTM ptb模型tensorflow示例预测下一个单词 [英] Predicting the next word using the LSTM ptb model tensorflow example

查看：464 发布时间：2020/5/4 6:19:50 python tensorflow lstm

本文介绍了使用LSTM ptb模型tensorflow示例预测下一个单词的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试使用tensorflow LSTM模型进行下一个单词的预测.

I am trying to use the tensorflow LSTM model to make next word predictions.

如此相关问题所述(没有可接受的答案)示例包含用于提取下一个单词概率的伪代码:

As described in this related question (which has no accepted answer) the example contains pseudocode to extract next word probabilities:

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])

loss = 0.0
for current_batch_of_words in words_in_dataset:
  # The value of state is updated after processing each batch of words.
  output, state = lstm(current_batch_of_words, state)

  # The LSTM output can be used to make next word predictions
  logits = tf.matmul(output, softmax_w) + softmax_b
  probabilities = tf.nn.softmax(logits)
  loss += loss_function(probabilities, target_words)

我对如何解释概率向量感到困惑.我在

I am confused about how to interpret the probabilities vector. I modified the __init__ function of the PTBModel in ptb_word_lm.py to store the probabilities and logits:

class PTBModel(object):
  """The PTB model."""

  def __init__(self, is_training, config):
    # General definition of LSTM (unrolled)
    # identical to tensorflow example ...     
    # omitted for brevity ...


    # computing the logits (also from example code)
    logits = tf.nn.xw_plus_b(output,
                             tf.get_variable("softmax_w", [size, vocab_size]),
                             tf.get_variable("softmax_b", [vocab_size]))
    loss = seq2seq.sequence_loss_by_example([logits],
                                            [tf.reshape(self._targets, [-1])],
                                            [tf.ones([batch_size * num_steps])],
                                            vocab_size)
    self._cost = cost = tf.reduce_sum(loss) / batch_size
    self._final_state = states[-1]

    # my addition: storing the probabilities and logits
    self.probabilities = tf.nn.softmax(logits)
    self.logits = logits

    # more model definition ...

然后在run_epoch函数中打印有关它们的一些信息:

Then printed some info about them in the run_epoch function:

def run_epoch(session, m, data, eval_op, verbose=True):
  """Runs the model on the given data."""
  # first part of function unchanged from example

  for step, (x, y) in enumerate(reader.ptb_iterator(data, m.batch_size,
                                                    m.num_steps)):
    # evaluate proobability and logit tensors too:
    cost, state, probs, logits, _ = session.run([m.cost, m.final_state, m.probabilities, m.logits, eval_op],
                                 {m.input_data: x,
                                  m.targets: y,
                                  m.initial_state: state})
    costs += cost
    iters += m.num_steps

    if verbose and step % (epoch_size // 10) == 10:
      print("%.3f perplexity: %.3f speed: %.0f wps, n_iters: %s" %
            (step * 1.0 / epoch_size, np.exp(costs / iters),
             iters * m.batch_size / (time.time() - start_time), iters))
      chosen_word = np.argmax(probs, 1)
      print("Probabilities shape: %s, Logits shape: %s" % 
            (probs.shape, logits.shape) )
      print(chosen_word)
      print("Batch size: %s, Num steps: %s" % (m.batch_size, m.num_steps))

  return np.exp(costs / iters)

这将产生如下输出:

0.000 perplexity: 741.577 speed: 230 wps, n_iters: 220
(20, 10000) (20, 10000)
[ 14   1   6 589   1   5   0  87   6   5   3   5   2   2   2   2   6   2  6   1]
Batch size: 1, Num steps: 20

我期望probs向量是一个概率数组，词汇表中的每个单词都有一个(例如，形状为(1, vocab_size))，这意味着我可以使用np.argmax(probs, 1)来获得预测的单词，如另一个问题.

I was expecting the probs vector to be an array of probabilities, with one for each word in the vocabulary (eg with shape (1, vocab_size)), meaning that I could get the predicted word using np.argmax(probs, 1) as suggested in the other question.

但是，向量的第一维实际上等于展开的LSTM中的步数(如果使用小的配置设置，则为20)，我不确定该怎么做.要访问预测的单词，我是否只需要使用最后一个值(因为这是最后一步的输出)?还是我还缺少其他东西?

However, the first dimension of the vector is actually equal to the number of steps in the unrolled LSTM (20 if the small config settings are used), which I'm not sure what to do with. To access to the predicted word, do I just need to use the last value (because it's the output of the final step)? Or is there something else that I'm missing?

通过查看

I tried to understand how the predictions are made and evaluated by looking at the implementation of seq2seq.sequence_loss_by_example, which must perform this evaluation, but this ends up calling gen_nn_ops._sparse_softmax_cross_entropy_with_logits, which doesn't seem to be included in the github repo, so I'm not sure where else to look.

我对tensorflow和LSTM都很陌生，因此我们将不胜感激！

I'm quite new to both tensorflow and LSTMs, so any help is appreciated!

使用LSTM ptb模型tensorflow示例预测下一个单词 [英] Predicting the next word using the LSTM ptb model tensorflow example

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

使用LSTM ptb模型tensorflow示例预测下一个单词 [英] Predicting the next word using the LSTM ptb model tensorflow example

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭