How are word vectors co-trained with paragraph vectors in doc2vec DBOW?
Question
I don't understand how word vectors are involved at all in the training process with gensim's doc2vec in DBOW mode (dm=0). I know that it's disabled by default with dbow_words=0. But what happens when we set dbow_words to 1?
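For concreteness, here is a minimal sketch of the configuration in question, using gensim's Doc2Vec API; the toy corpus and hyperparameter values are placeholders of my own:

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Toy corpus, just to make the flags concrete (contents are placeholders).
    corpus = [TaggedDocument(words="the quick brown fox".split(), tags=["doc0"]),
              TaggedDocument(words="the lazy dog sleeps".split(), tags=["doc1"])]

    # dm=0 selects PV-DBOW; dbow_words=1 is the flag in question, which
    # additionally trains skip-gram word vectors during the same run.
    model = Doc2Vec(corpus, dm=0, dbow_words=1,
                    vector_size=100, window=5, min_count=1, epochs=20)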
In my understanding of DBOW, the context words are predicted directly from the paragraph vectors. So the only parameters of the model are the N p-dimensional paragraph vectors plus the parameters of the classifier.
But multiple sources hint that it is possible in DBOW mode to co-train word and doc vectors. For instance:
- section 5 of An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
- this SO answer: How to use Gensim doc2vec with pre-trained word vectors?
So, how is this done? Any clarification would be much appreciated!
Note: for DM, the paragraph vectors are averaged/concatenated with the word vectors to predict the target words. In that case, it's clear that word vectors are trained simultaneously with document vectors. And there are N*p + M*q + classifier parameters (where M is the vocab size and q is the word-vector dimensionality).
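For what it's worth, a sketch of reading those counts off a trained gensim model (assuming gensim 4.x internals; gensim uses a single vector_size, so p equals q there, and with the default negative sampling the classifier weights sit in model.syn1neg):

    # Assumes the `model` trained above; attribute names are gensim 4.x internals.
    N, p = model.dv.vectors.shape    # N paragraph vectors of dimension p
    M, q = model.wv.vectors.shape    # M word vectors of dimension q (q == p in gensim)
    total = N * p + M * q + model.syn1neg.size   # classifier = negative-sampling weights
    print(N, p, M, q, total)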
Answer

If you set dbow_words=1, then skip-gram word-vector training is added to the training loop, interleaved with the normal PV-DBOW training.
So, for a given target word in a text, first the candidate doc-vector is used (alone) to try to predict that word, with backpropagation adjustments then occurring to the model & doc-vector. Then, each of a number of surrounding words is used, one at a time in skip-gram fashion, to try to predict that same target word – with the follow-up adjustments made.

Then, the next target word in the text gets the same PV-DBOW-plus-skip-gram treatment, and so on.
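Here is a conceptual sketch of that update order in plain Python (my own toy softmax code, not gensim's actual optimized negative-sampling implementation; all names here are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    docs = [("doc0", "the quick brown fox".split()),
            ("doc1", "the lazy dog sleeps".split())]
    vocab = sorted({w for _, ws in docs for w in ws})
    w2i = {w: i for i, w in enumerate(vocab)}

    dim, window, lr, dbow_words = 8, 2, 0.025, True
    doc_vecs = {tag: rng.normal(0, 0.1, dim) for tag, _ in docs}
    word_vecs = rng.normal(0, 0.1, (len(vocab), dim))  # input word vectors
    out = np.zeros((len(vocab), dim))                  # shared output classifier

    def train_pair(vec, target):
        """One predict-and-backprop step: `vec` alone predicts word `target`."""
        scores = out @ vec
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        probs[target] -= 1.0                       # softmax cross-entropy gradient
        d_vec = out.T @ probs                      # gradient w.r.t. the input vector
        out[...] -= lr * np.outer(probs, vec)      # update the classifier
        vec -= lr * d_vec                          # update the input vector in place

    for tag, words in docs:
        for pos, target in enumerate(words):
            t = w2i[target]
            train_pair(doc_vecs[tag], t)           # PV-DBOW: doc-vector predicts word
            if dbow_words:                         # interleaved skip-gram steps
                for j in range(max(0, pos - window),
                               min(len(words), pos + window + 1)):
                    if j != pos:
                        train_pair(word_vecs[w2i[words[j]]], t)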
As some logical consequences of this:

- training takes longer than plain PV-DBOW, by about a factor equal to the window parameter
- word-vectors overall wind up getting more total training attention than doc-vectors, again by a factor equal to the window parameter
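One quick way to see the co-training at work, assuming a model trained with dm=0, dbow_words=1 as in the question's snippet (gensim 4.x attribute names): both lookup tables end up holding trained vectors.

    print(model.dv["doc0"][:5])                  # a trained paragraph vector
    print(model.wv["fox"][:5])                   # a co-trained word vector
    print(model.wv.most_similar("fox", topn=3))  # word-vector neighborhoods exist too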