Predicting the next word using the LSTM ptb model tensorflow example

Question

I am trying to use the tensorflow LSTM model to make next word predictions.

As described in this related question (which has no accepted answer) the example contains pseudocode to extract next word probabilities:

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])

loss = 0.0
for current_batch_of_words in words_in_dataset:
  # The value of state is updated after processing each batch of words.
  output, state = lstm(current_batch_of_words, state)

  # The LSTM output can be used to make next word predictions
  logits = tf.matmul(output, softmax_w) + softmax_b
  probabilities = tf.nn.softmax(logits)
  loss += loss_function(probabilities, target_words)

I am confused about how to interpret the probabilities vector. I modified the __init__ function of the PTBModel in ptb_word_lm.py to store the probabilities and logits:

class PTBModel(object):
  """The PTB model."""

  def __init__(self, is_training, config):
    # General definition of LSTM (unrolled)
    # identical to tensorflow example ...     
    # omitted for brevity ...


    # computing the logits (also from example code)
    logits = tf.nn.xw_plus_b(output,
                             tf.get_variable("softmax_w", [size, vocab_size]),
                             tf.get_variable("softmax_b", [vocab_size]))
    loss = seq2seq.sequence_loss_by_example([logits],
                                            [tf.reshape(self._targets, [-1])],
                                            [tf.ones([batch_size * num_steps])],
                                            vocab_size)
    self._cost = cost = tf.reduce_sum(loss) / batch_size
    self._final_state = states[-1]

    # my addition: storing the probabilities and logits
    self.probabilities = tf.nn.softmax(logits)
    self.logits = logits

    # more model definition ...

Then printed some info about them in the run_epoch function:

def run_epoch(session, m, data, eval_op, verbose=True):
  """Runs the model on the given data."""
  # first part of function unchanged from example

  for step, (x, y) in enumerate(reader.ptb_iterator(data, m.batch_size,
                                                    m.num_steps)):
    # evaluate probability and logit tensors too:
    cost, state, probs, logits, _ = session.run([m.cost, m.final_state, m.probabilities, m.logits, eval_op],
                                 {m.input_data: x,
                                  m.targets: y,
                                  m.initial_state: state})
    costs += cost
    iters += m.num_steps

    if verbose and step % (epoch_size // 10) == 10:
      print("%.3f perplexity: %.3f speed: %.0f wps, n_iters: %s" %
            (step * 1.0 / epoch_size, np.exp(costs / iters),
             iters * m.batch_size / (time.time() - start_time), iters))
      chosen_word = np.argmax(probs, 1)
      print("Probabilities shape: %s, Logits shape: %s" % 
            (probs.shape, logits.shape) )
      print(chosen_word)
      print("Batch size: %s, Num steps: %s" % (m.batch_size, m.num_steps))

  return np.exp(costs / iters)

This produces output like the following:

0.000 perplexity: 741.577 speed: 230 wps, n_iters: 220
(20, 10000) (20, 10000)
[ 14   1   6 589   1   5   0  87   6   5   3   5   2   2   2   2   6   2  6   1]
Batch size: 1, Num steps: 20

I was expecting the probs vector to be an array of probabilities, with one entry for each word in the vocabulary (e.g. with shape (1, vocab_size)), meaning that I could get the predicted word using np.argmax(probs, 1) as suggested in the other question.

However, the first dimension of the vector is actually equal to the number of steps in the unrolled LSTM (20 if the small config settings are used), and I'm not sure what to do with it. To access the predicted word, do I just need to use the last value (because it's the output of the final step)? Or is there something else that I'm missing?

I tried to understand how the predictions are made and evaluated by looking at the implementation of seq2seq.sequence_loss_by_example, which must perform this evaluation, but this ends up calling gen_nn_ops._sparse_softmax_cross_entropy_with_logits, which doesn't seem to be included in the github repo, so I'm not sure where else to look.

I'm quite new to both tensorflow and LSTMs, so any help is appreciated!

Recommended answer

The output tensor contains the concatenation of the LSTM cell outputs for each timestep (see its definition here). Therefore you can find the prediction for the next word by taking chosen_word[-1] (or chosen_word[sequence_length - 1] if the sequence has been padded to match the unrolled LSTM).
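
A minimal NumPy sketch of that indexing, with random numbers standing in for the real model output. The sizes mirror the small config with a batch size of 1. The sequence_length value and the batch-major row layout in the second half are assumptions made for illustration, not something stated in the example code.

```python
import numpy as np

num_steps, vocab_size = 20, 10000
rng = np.random.default_rng(0)

# Stand-in for the `probs` array returned by session.run: one row per
# unrolled timestep, one column per vocabulary word (batch_size == 1).
logits = rng.normal(size=(num_steps, vocab_size))
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# One predicted word id per timestep, as in the question.
chosen_word = np.argmax(probs, 1)

# Prediction for the word that follows the whole input window.
next_word = chosen_word[-1]

# If only the first `sequence_length` inputs are real and the rest is
# padding (hypothetical value here), read the last real step instead.
sequence_length = 7
next_word_padded = chosen_word[sequence_length - 1]

# Assumption: with batch_size > 1 the rows are laid out batch-major
# (row = b * num_steps + t), so reshape before taking the last step.
batch_size = 4
flat = rng.random(size=(batch_size * num_steps, vocab_size))
last_step = flat.reshape(batch_size, num_steps, vocab_size)[:, -1, :]
next_words = np.argmax(last_step, axis=1)  # shape (batch_size,)
```

The earlier entries of chosen_word are still meaningful: entry t is the model's guess for the word at position t + 1 given the inputs up to t, which is what the loss compares against the targets.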

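The tf.nn.sparse_softmax_cross_entropy_with_logits() op is documented in the public API under a different name. For technical reasons, it calls a generated wrapper function that doesn't appear in the GitHub repository. The implementation of the op is in C++, here.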
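
As a rough illustration of what that op computes (not the TensorFlow implementation itself), the loss at each position is the negative log of the softmax probability assigned to the target word. The NumPy function below is a hypothetical stand-in with made-up inputs. It also shows how the perplexity printed by run_epoch follows from the averaged loss.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, targets):
    """Hypothetical NumPy stand-in: -log softmax(logits)[target] per row."""
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of each row's target word id.
    return -log_probs[np.arange(len(targets)), targets]

# Made-up logits for 3 timesteps over a 4-word vocabulary.
logits = np.array([[2.0, 0.5, 0.1, 0.0],
                   [0.0, 0.0, 0.0, 0.0],
                   [0.1, 3.0, 0.2, 0.3]])
targets = np.array([0, 2, 1])

loss = sparse_softmax_cross_entropy(logits, targets)
# Perplexity is the exponential of the mean per-word loss.
perplexity = np.exp(loss.mean())
```

Because the targets are integer word ids rather than one-hot vectors, the function only needs to index one log-probability per row, which is the "sparse" part of the op's name.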
