How does gensim calculate doc2vec paragraph vectors


Question

I am reading this paper: http://cs.stanford.edu/~quocle/paragraph_vector.pdf

and it states:

"The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. In the experiments, we use concatenation as the method to combine the vectors."

How does concatenation or averaging work?

Example (if paragraph 1 contains word1 and word2):

word1 vector =[0.1,0.2,0.3]
word2 vector =[0.4,0.5,0.6]

concat method 
does paragraph vector = [0.1+0.4,0.2+0.5,0.3+0.6] ?

Average method 
does paragraph vector = [(0.1+0.4)/2,(0.2+0.5)/2,(0.3+0.6)/2] ?

Also, from the figure in the paper, it is said:

The paragraph token can be thought of as another word. It acts as a memory that remembers what is missing from the current context – or the topic of the paragraph. For this reason, we often call this model the Distributed Memory Model of Paragraph Vectors (PV-DM).

Is the paragraph token equal to the paragraph vector, which is equal to "on"?

Recommended answer

How does concatenation or averaging work?

You got it right for the average. The concatenation is: [0.1,0.2,0.3,0.4,0.5,0.6].
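To make the two operations concrete, here is a minimal sketch in plain Python using the toy vectors from the question. The helper names `concatenate` and `average` are made up for illustration and are not gensim functions; in the paper the vectors being combined are the paragraph vector and the context word vectors, but the arithmetic is the same.

```python
# Toy vectors taken from the question.
word1 = [0.1, 0.2, 0.3]
word2 = [0.4, 0.5, 0.6]


def concatenate(vectors):
    """Join the vectors end to end; the result length is the sum of the input lengths."""
    return [value for vector in vectors for value in vector]


def average(vectors):
    """Element-wise mean; the result has the same length as each input vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


concatenated = concatenate([word1, word2])  # six values
averaged = average([word1, word2])          # three values

print(concatenated)
print(averaged)
```

Concatenation grows the input to the prediction layer with every extra vector, while averaging keeps it at the original dimensionality. In gensim's `Doc2Vec`, the `dm_concat` and `dm_mean` parameters appear to select between these behaviours for the PV-DM mode; check the documentation of your installed version for the exact semantics.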

Is the paragraph token equal to the paragraph vector which is equal to on?

The "paragraph token" is mapped to a vector that is called "paragraph vector". It is different from the token "on", and different from the word vector that the token "on" is mapped to.
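A small sketch may help separate the three things. The lookup tables, the context words "the cat sat", and every number below are hypothetical and only illustrate the idea; this is not gensim's internal data layout.

```python
# Hypothetical lookup tables; all values are made up for illustration.
paragraph_vectors = {"paragraph_1": [0.7, 0.8, 0.9]}
word_vectors = {
    "the": [0.1, 0.2, 0.3],
    "cat": [0.4, 0.5, 0.6],
    "sat": [0.2, 0.1, 0.0],
    "on": [0.9, 0.3, 0.5],
}

paragraph_token = "paragraph_1"   # a token, like a word, identifying the paragraph
context = ["the", "cat", "sat"]   # assumed context window
target = "on"                     # the word the model is trained to predict

# The paragraph token maps to its own vector, stored separately from word vectors.
paragraph_vector = paragraph_vectors[paragraph_token]

# Under concatenation, the model input is the paragraph vector
# followed by the context word vectors.
model_input = list(paragraph_vector)
for word in context:
    model_input.extend(word_vectors[word])

print(len(model_input))                          # 4 vectors of size 3
print(paragraph_vector == word_vectors[target])  # distinct vectors
```

Here "on" is only the prediction target: its word vector is not part of the input, and the paragraph vector is a separate learned entry that happens to be combined with the context words.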
