How to calculate perplexity of RNN in tensorflow


Question

I am running a word-level RNN.

How do I calculate the perplexity of the RNN?

Following is the code in training that shows training loss and other things in each epoch:

for e in range(model.epoch_pointer.eval(), args.num_epochs):
    sess.run(tf.assign(model.lr, args.learning_rate * (args.decay_rate ** e)))
    data_loader.reset_batch_pointer()
    state = sess.run(model.initial_state)
    speed = 0
    if args.init_from is None:
        assign_op = model.batch_pointer.assign(0)
        sess.run(assign_op)
        assign_op = model.epoch_pointer.assign(e)
        sess.run(assign_op)
    if args.init_from is not None:
        data_loader.pointer = model.batch_pointer.eval()
        args.init_from = None
    for b in range(data_loader.pointer, data_loader.num_batches):
        start = time.time()
        x, y = data_loader.next_batch()
        feed = {model.input_data: x, model.targets: y, model.initial_state: state,
                model.batch_time: speed}
        summary, train_loss, state, _, _ = sess.run([merged, model.cost, model.final_state,
                                                     model.train_op, model.inc_batch_pointer_op], feed)
        train_writer.add_summary(summary, e * data_loader.num_batches + b)
        speed = time.time() - start
        if (e * data_loader.num_batches + b) % args.batch_size == 0:
            print("{}/{} (epoch {}), train_loss = {:.3f}, time/batch = {:.3f}"
                  .format(e * data_loader.num_batches + b,
                          args.num_epochs * data_loader.num_batches,
                          e, train_loss, speed))
        if (e * data_loader.num_batches + b) % args.save_every == 0 \
                or (e == args.num_epochs - 1 and b == data_loader.num_batches - 1):  # save for the last result
            checkpoint_path = os.path.join(args.save_dir, 'model.ckpt')
            saver.save(sess, checkpoint_path, global_step=e * data_loader.num_batches + b)
            print("model saved to {}".format(checkpoint_path))
train_writer.close()

Answer

The project you are referencing uses sequence_to_sequence_loss_by_example, which returns the cross-entropy loss. So for calculating the training perplexity, you just need to exponentiate the loss like explained here.

train_perplexity = tf.exp(train_loss)
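If you want to see the number while training, a minimal sketch is to exponentiate the scalar loss fetched by sess.run with NumPy instead of adding another node to the graph (the concrete value below is only illustrative; in the loop above it would be the train_loss returned by sess.run):

import numpy as np

# train_loss is the average cross-entropy in nats fetched by sess.run in the loop above;
# an illustrative value is used here so the snippet runs on its own.
train_loss = 3.9
train_perplexity = np.exp(train_loss)                   # perplexity = e^(cross-entropy in nats)
print("perplexity = {:.2f}".format(train_perplexity))   # ~49.40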

We have to use e instead of 2 as a base, because TensorFlow measures the cross-entropy loss with the natural logarithm (TF Documentation). Thank you, @Matthias Arro and @Colin Skow for the hint.

The cross-entropy of two probability distributions P and Q tells us the average number of bits we need to encode events drawn from P when we use a coding scheme optimized for Q. So P is the true distribution, which we usually don't know. We want to find a Q as close to P as possible, so that we can develop a nice coding scheme with as few bits per event as possible.
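To make this concrete, here is a small sketch with made-up distributions P and Q over a three-token vocabulary (the numbers are purely illustrative):

import numpy as np

P = np.array([0.7, 0.2, 0.1])              # true distribution (usually unknown)
Q = np.array([0.5, 0.3, 0.2])              # the model's estimate of P
cross_entropy = -np.sum(P * np.log(Q))     # ~0.89 nats, because of the natural log
entropy = -np.sum(P * np.log(P))           # ~0.80 nats, the lower bound reached when Q == P
print(cross_entropy, entropy)              # cross-entropy is always >= entropy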

I shouldn't say bits, because we can only use bits as a measure if we use base 2 in the calculation of the cross-entropy. But TensorFlow uses the natural logarithm, so instead let's measure the cross-entropy in nats.
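The choice of base only changes the unit, not the perplexity, as long as the exponentiation matches the logarithm used for the loss. A quick check with an arbitrary value:

import numpy as np

ce_nats = 5.0                      # some cross-entropy measured with the natural log
ce_bits = ce_nats / np.log(2)      # the same quantity in bits (~7.21)
print(np.exp(ce_nats))             # perplexity from nats: ~148.4
print(2 ** ce_bits)                # perplexity from bits: ~148.4, identical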

So let's say we have a bad language model that says every token (character / word) in the vocabulary is equally probable to be the next one. For a vocabulary of 1000 tokens, this model will have a cross-entropy of log(1000) = 6.9 nats. When predicting the next token, it has to choose uniformly between 1000 tokens at each step.
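A quick numeric check of that figure for a hypothetical 1000-token vocabulary:

import numpy as np

vocab_size = 1000
cross_entropy = -np.log(1.0 / vocab_size)   # log(1000) ≈ 6.91 nats
perplexity = np.exp(cross_entropy)          # 1000.0: as uncertain as a uniform 1000-way choice
print(cross_entropy, perplexity)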

A better language model will determine a probability distribution Q that is closer to P. Thus, the cross-entropy is lower - we might get a cross-entropy of 3.9 nats. If we now want to measure the perplexity, we simply exponentiate the cross-entropy:

exp(3.9) = 49.4

So, on the samples for which we calculated the loss, the good model was about as perplexed as if it had to choose uniformly and independently among roughly 50 tokens.
