Predicting next word using the language model tensorflow example


Question

The TensorFlow tutorial on language modelling allows you to compute the probability of sentences:

probabilities = tf.nn.softmax(logits)

In the comments below, it also mentions a way of predicting the next word instead of probabilities, but does not specify how this can be done. So how can this example be used to output a word instead of a probability?

# Pseudocode from the tutorial: softmax_w, softmax_b, loss_function,
# words_in_dataset and target_words are defined elsewhere in the example.
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])

loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions.
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities = tf.nn.softmax(logits)
    loss += loss_function(probabilities, target_words)

Answer

Your output is a TensorFlow list, and you can get its arg max (the most probable predicted class) with a TensorFlow function. That list is normally the one containing the next word's probabilities.

At "Evaluate the Model" on this page, your output list is y in the following example:

First we'll figure out where we predicted the correct label. tf.argmax is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the true label. We can use tf.equal to check if our prediction matches the truth.

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
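Applied to the language model above, a minimal sketch of that idea could look like the following (id_to_word, the inverse of the vocabulary mapping built by the tutorial's reader, and the session usage are assumptions, not part of the tutorial code):

predicted_ids = tf.argmax(probabilities, 1)  # most probable word id for each example

# After running the graph in a session:
# ids = session.run(predicted_ids, feed_dict=...)
# next_words = [id_to_word[i] for i in ids]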

A different approach is to work with pre-vectorized (embedded/encoded) words. You could vectorize your words (therefore embed them) with word2vec to accelerate learning; you might want to take a look at this. Each word can be represented as a point in, say, a 300-dimensional space of meaning, and at the output of the network you can automatically find the "N words" closest to the predicted point in that space. In that case the argmax way of proceeding no longer works; you could instead compare by cosine similarity against the words you truly want to compare to, though I am not sure whether this could cause numerical instabilities. In that case y will not represent words as features, but word embeddings with a dimensionality of, say, 100 to 2000 depending on the model. You could Google something like "man woman queen word addition word2vec" for more on the subject of embeddings.
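As a hedged sketch of that nearest-neighbour idea (embedding_matrix, predicted_vec and id_to_word are assumed names, not part of the tutorial), the cosine similarity against every word vector can be computed directly with NumPy:

import numpy as np

def closest_words(predicted_vec, embedding_matrix, id_to_word, n=5):
    # Cosine similarity between the predicted point and every word embedding.
    norms = np.linalg.norm(embedding_matrix, axis=1) * np.linalg.norm(predicted_vec)
    sims = embedding_matrix.dot(predicted_vec) / np.maximum(norms, 1e-8)
    best = np.argsort(-sims)[:n]  # indices of the n most similar words
    return [(id_to_word[i], sims[i]) for i in best]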

Note: when I talk about word2vec here, it is about using an external pre-trained word2vec model to help your training, so that it only takes pre-embedded inputs and produces embedding outputs. The words corresponding to those outputs can be figured out again with word2vec, to find the corresponding top similar predicted words.
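With a pre-trained model, that reverse lookup can be delegated to the word2vec tooling itself. For example, a sketch using gensim (the model file name is a common placeholder, and predicted_vec is assumed to be the embedding your network produced):

from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
print(w2v.similar_by_vector(predicted_vec, topn=5))  # top words closest to the prediction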

Notice that the approach I suggest is not exact, since it only tells you whether we predicted EXACTLY the word we wanted to predict. For a softer evaluation, it would be possible to use ROUGE or BLEU metrics to evaluate your model, in case you use sentences or something longer than a single word.
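For instance, a minimal sketch of sentence-level BLEU with NLTK (the tokenized sentences are made-up examples; smoothing is added because short hypotheses can otherwise score zero):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [['the', 'cat', 'sat', 'on', 'the', 'mat']]  # list of reference token lists
hypothesis = ['the', 'cat', 'is', 'on', 'the', 'mat']    # predicted token list
score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(score)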
