How are word vectors co-trained with paragraph vectors in doc2vec DBOW?


Problem Description

I don't understand how word vectors are involved at all in the training process with gensim's doc2vec in DBOW mode (dm=0). I know that it's disabled by default with dbow_words=0. But what happens when we set dbow_words to 1?

In my understanding of DBOW, the context words are predicted directly from the paragraph vectors. So the only parameters of the model are the N p-dimensional paragraph vectors plus the parameters of the classifier.

But multiple sources hint that it is possible in DBOW mode to co-train word and doc vectors. For instance:

So, how is this done? Any clarification would be much appreciated!

Note: for DM, the paragraph vectors are averaged/concatenated with the word vectors to predict the target words. In that case, it's clear that word vectors are trained simultaneously with document vectors, and there are N*p + M*q + classifier parameters (where M is the vocab size and q the word-vector dimensionality).
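For intuition, here is a back-of-the-envelope count with assumed toy sizes (the numbers, and the negative-sampling form of the "classifier", are illustrative assumptions, not from the question):

```python
# Illustrative parameter counts; all sizes below are made-up toy values.
N = 10_000        # number of documents (paragraph vectors)
M = 50_000        # vocabulary size
p = q = 100       # paragraph- and word-vector dimensionality (a single vector_size in gensim)

# "classifier" here assumes negative sampling: one output vector per vocab word.
classifier = M * q                       # 5,000,000

pure_dbow = N * p + classifier           # doc-vectors + classifier              = 6,000,000
pv_dm     = N * p + M * q + classifier   # doc- and word-vectors + classifier    = 11,000,000
print(pure_dbow, pv_dm)
```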

Solution

If you set dbow_words=1, then skip-gram word-vector training is added to the training loop, interleaved with the normal PV-DBOW training.
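In gensim terms, that is the difference between these two configurations (a minimal sketch; the corpus and every other hyperparameter value below are placeholders, not recommendations):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Tiny placeholder corpus, just so the snippet runs.
corpus = [TaggedDocument(words=["some", "example", "words"], tags=[i]) for i in range(10)]

# Plain PV-DBOW: only doc-vectors (plus the output layer) get trained;
# word-vectors are allocated but left in their random initial state.
plain = Doc2Vec(corpus, dm=0, dbow_words=0, vector_size=100, min_count=1, epochs=10)

# PV-DBOW with interleaved skip-gram word-vector training.
combined = Doc2Vec(corpus, dm=0, dbow_words=1, window=5, vector_size=100, min_count=1, epochs=10)
```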

So, for a given target word in a text, first the candidate doc-vector is used (alone) to try to predict that word, with backpropagation adjustments then occurring to the model & doc-vector. Then, a bunch of the surrounding words are each used, one at a time in skip-gram fashion, to try to predict that same target word – with the follow-up adjustments made.

Then, the next target word in the text gets the same PV-DBOW plus skip-gram treatment, and so on, and so on.
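A rough sketch of that interleaving in plain Python (conceptual only, not gensim's actual optimized Cython code; train_pair is a hypothetical stand-in for one prediction attempt plus its backpropagation adjustment):

```python
def train_epoch(tagged_docs, doc_vectors, word_vectors, window, train_pair):
    # tagged_docs: iterable of (doc_id, list_of_words) pairs.
    # train_pair(input_vector, target_word): one forward prediction plus the
    # backprop update to the output layer and to input_vector (hypothetical helper).
    for doc_id, words in tagged_docs:
        for pos, target in enumerate(words):
            # PV-DBOW step: the doc-vector alone tries to predict the target word.
            train_pair(doc_vectors[doc_id], target)

            # Skip-gram steps: each surrounding word, one at a time,
            # tries to predict that same target word.
            lo, hi = max(0, pos - window), min(len(words), pos + window + 1)
            for ctx in range(lo, hi):
                if ctx != pos:
                    train_pair(word_vectors[words[ctx]], target)
```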

As some logical consequences of this:

  • training takes longer than plain PV-DBOW - by about a factor equal to the window parameter (see the quick timing check after this list)

  • word-vectors overall wind up getting more total training attention than doc-vectors, again by a factor equal to the window parameter
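One way to see the first consequence directly is to time the same toy data under both settings (the corpus and hyperparameters below are arbitrary placeholders):

```python
import time
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [TaggedDocument(words=["just", "some", "repeated", "toy", "words"] * 40, tags=[i])
          for i in range(500)]

for dbow_words in (0, 1):
    model = Doc2Vec(dm=0, dbow_words=dbow_words, vector_size=50,
                    window=5, min_count=1, epochs=10, workers=1)
    model.build_vocab(corpus)
    start = time.time()
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    print(f"dbow_words={dbow_words}: {time.time() - start:.1f}s")
```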
