Calculate perplexity of word2vec model
Question
I trained a Gensim word2vec model on 500K sentences (around 60K words), and I want to calculate its perplexity.
- What is the best way to do this?
- For 60K words, how can I check whether the amount of data is appropriate?

Thanks
Recommended answer
If you want to calculate the perplexity, you first have to retrieve the loss. On the gensim.models.word2vec.Word2Vec constructor, pass the compute_loss=True parameter; this way, gensim will store the loss for you while training. Once trained, you can call the get_latest_training_loss() method to retrieve the loss.
Since the loss is the cross-entropy loss of the skip-gram model, 2 to the power of the loss will give you the perplexity (2**loss).
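A sketch of that conversion, with one caveat: gensim's reported loss is summed over all training examples, so exponentiating the raw total would overflow. The normalization by word count below is an assumption not stated in the answer, and the numbers are hypothetical, for illustration only:

```python
import math

# Hypothetical values for illustration only.
total_loss = 1_200_000.0  # e.g. value from model.get_latest_training_loss()
num_words = 500_000       # number of word occurrences trained on

# Per-word cross-entropy (assuming the loss is in bits, i.e. base 2).
avg_loss = total_loss / num_words

# Perplexity = 2 to the power of the per-word cross-entropy.
perplexity = 2 ** avg_loss
print(perplexity)
```

If the loss were measured in nats (natural log) instead, you would use math.exp(avg_loss) rather than 2 ** avg_loss.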