How does gensim calculate doc2vec paragraph vectors
Question
I am reading this paper, http://cs.stanford.edu/~quocle/paragraph_vector.pdf, which states:
"The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. In the experiments, we use concatenation as the method to combine the vectors."
How does concatenation or averaging work?
Example (if paragraph 1 contains word1 and word2):
word1 vector =[0.1,0.2,0.3]
word2 vector =[0.4,0.5,0.6]
Concatenation method:
does the paragraph vector = [0.1+0.4,0.2+0.5,0.3+0.6]?
Average method:
does the paragraph vector = [(0.1+0.4)/2,(0.2+0.5)/2,(0.3+0.6)/2]?
Also, from this image:
it is stated:
The paragraph token can be thought of as another word. It acts as a memory that remembers what is missing from the current context – or the topic of the paragraph. For this reason, we often call this model the Distributed Memory Model of Paragraph Vectors (PV-DM).
Is the paragraph token equal to the paragraph vector, which is equal to "on"?
Recommended answer
How does concatenation or averaging work?
You got it right for the average. The concatenation is: [0.1,0.2,0.3,0.4,0.5,0.6].
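To make the difference concrete, here is a minimal sketch in plain Python using the toy vectors from the question. The helper names `average` and `concatenate` are made up for illustration and are not part of gensim; the point is only that averaging keeps the dimensionality while concatenation joins the vectors end to end.

```python
# Toy vectors from the question; the values are illustrative, not learned.
word1 = [0.1, 0.2, 0.3]
word2 = [0.4, 0.5, 0.6]


def average(vectors):
    """Element-wise mean: the result has the same length as each input."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def concatenate(vectors):
    """Join vectors end to end: the result length is the sum of input lengths."""
    return [x for vec in vectors for x in vec]


averaged = average([word1, word2])
concatenated = concatenate([word1, word2])

print(len(averaged))   # 3
print(concatenated)    # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

Note that element-wise addition without dividing (the guess in the question for concatenation) would be a sum, which is a third option distinct from both of these.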
Is the paragraph token equal to the paragraph vector which is equal to on?
The "paragraph token" is mapped to a vector that is called "paragraph vector". It is different from the token "on", and different from the word vector that the token "on" is mapped to.
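The distinction can be sketched as two separate lookup tables, one for words and one for paragraphs. This is only an illustration of the idea in the quoted passage, not gensim's actual implementation: the table contents, the `paragraph_1` key, the `combine` helper, and the assumption that the context is "the cat sat" with "on" as the word to predict are all hypothetical.

```python
# Hypothetical lookup tables; in a trained model these values are learned.
word_vectors = {
    "the": [0.1, 0.2, 0.3],
    "cat": [0.4, 0.5, 0.6],
    "sat": [0.7, 0.8, 0.9],
    "on": [0.2, 0.1, 0.0],  # the target word has its own word vector
}
paragraph_vectors = {
    "paragraph_1": [0.9, 0.8, 0.7],  # a separate vector for the paragraph token
}


def combine(paragraph_id, context_words, mode="concat"):
    """Build the PV-DM input from the paragraph vector plus context word vectors."""
    vectors = [paragraph_vectors[paragraph_id]]
    vectors += [word_vectors[w] for w in context_words]
    if mode == "concat":
        return [x for vec in vectors for x in vec]
    if mode == "mean":
        return [sum(c) / len(vectors) for c in zip(*vectors)]
    raise ValueError("mode must be 'concat' or 'mean'")


# The paragraph vector is an input alongside the context words;
# "on" is what the model would try to predict, not an input here.
features = combine("paragraph_1", ["the", "cat", "sat"])
print(len(features))  # 12
```

With `mode="mean"` the same call returns a 3-dimensional vector instead. In gensim, the `dm_mean` and `dm_concat` parameters of `Doc2Vec` choose between these ways of combining the vectors.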