How are word vectors co-trained with paragraph vectors in doc2vec DBOW?
Question
I don't understand how word vectors are involved at all in the training process with gensim's doc2vec in DBOW mode (dm=0). I know that it's disabled by default with dbow_words=0. But what happens when we set dbow_words to 1?
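For concreteness, here is a minimal sketch of the configuration in question, using gensim's Doc2Vec API; the toy corpus and hyperparameter values are placeholders of my own:

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Toy corpus, just to make the flags concrete (contents are placeholders).
    corpus = [TaggedDocument(words="the quick brown fox".split(), tags=["doc0"]),
              TaggedDocument(words="the lazy dog sleeps".split(), tags=["doc1"])]

    # dm=0 selects PV-DBOW; dbow_words=1 is the flag in question, which
    # additionally trains skip-gram word vectors during the same run.
    model = Doc2Vec(corpus, dm=0, dbow_words=1,
                    vector_size=100, window=5, min_count=1, epochs=20)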
In my understanding of DBOW, the context words are predicted directly from the paragraph vectors. So the only parameters of the model are the N p-dimensional paragraph vectors plus the parameters of the classifier.
But multiple sources hint that it is possible in DBOW mode to co-train word and doc vectors. For instance:
- section 5 of An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
- this SO answer: How to use Gensim doc2vec with pre-trained word vectors?
So, how is this done? Any clarification would be much appreciated!
Note: for DM, the paragraph vectors are averaged/concatenated with the word vectors to predict the target words. In that case, it's clear that word vectors are trained simultaneously with document vectors. And there are N*p + M*q + classifier parameters (where M is the vocab size and q is the word-vector dimensionality).
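For what it's worth, a sketch of reading those counts off a trained gensim model (assuming gensim 4.x internals; gensim uses a single vector_size, so p equals q there, and with the default negative sampling the classifier weights sit in model.syn1neg):

    # Assumes the `model` trained above; attribute names are gensim 4.x internals.
    N, p = model.dv.vectors.shape    # N paragraph vectors of dimension p
    M, q = model.wv.vectors.shape    # M word vectors of dimension q (q == p in gensim)
    total = N * p + M * q + model.syn1neg.size   # classifier = negative-sampling weights
    print(N, p, M, q, total)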
Answer

If you set dbow_words=1, then skip-gram word-vector training is added to the training loop, interleaved with the normal PV-DBOW training.
So, for a given target word in a text, first the candidate doc-vector is used (alone) to try to predict that word, with backpropagation adjustments then occurring to the model & doc-vector. Then, each of a number of surrounding words is used, one at a time in skip-gram fashion, to try to predict that same target word – with the follow-up adjustments made.

Then, the next target word in the text gets the same PV-DBOW-plus-skip-gram treatment, and so on.
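Here is a conceptual sketch of that update order in plain Python (my own toy softmax code, not gensim's actual optimized negative-sampling implementation; all names here are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    docs = [("doc0", "the quick brown fox".split()),
            ("doc1", "the lazy dog sleeps".split())]
    vocab = sorted({w for _, ws in docs for w in ws})
    w2i = {w: i for i, w in enumerate(vocab)}

    dim, window, lr, dbow_words = 8, 2, 0.025, True
    doc_vecs = {tag: rng.normal(0, 0.1, dim) for tag, _ in docs}
    word_vecs = rng.normal(0, 0.1, (len(vocab), dim))  # input word vectors
    out = np.zeros((len(vocab), dim))                  # shared output classifier

    def train_pair(vec, target):
        """One predict-and-backprop step: `vec` alone predicts word `target`."""
        scores = out @ vec
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        probs[target] -= 1.0                       # softmax cross-entropy gradient
        d_vec = out.T @ probs                      # gradient w.r.t. the input vector
        out[...] -= lr * np.outer(probs, vec)      # update the classifier
        vec -= lr * d_vec                          # update the input vector in place

    for tag, words in docs:
        for pos, target in enumerate(words):
            t = w2i[target]
            train_pair(doc_vecs[tag], t)           # PV-DBOW: doc-vector predicts word
            if dbow_words:                         # interleaved skip-gram steps
                for j in range(max(0, pos - window),
                               min(len(words), pos + window + 1)):
                    if j != pos:
                        train_pair(word_vecs[w2i[words[j]]], t)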
As some logical consequences of this:

- training takes longer than plain PV-DBOW, by about a factor equal to the window parameter
- word-vectors overall wind up getting more total training attention than doc-vectors, again by a factor equal to the window parameter
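One quick way to see the co-training at work, assuming a model trained with dm=0, dbow_words=1 as in the question's snippet (gensim 4.x attribute names): both lookup tables end up holding trained vectors.

    print(model.dv["doc0"][:5])                  # a trained paragraph vector
    print(model.wv["fox"][:5])                   # a co-trained word vector
    print(model.wv.most_similar("fox", topn=3))  # word-vector neighborhoods exist too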