How to interpret doc2vec results on previously seen data?


Problem description

I use gensim 4.0.1 and train doc2vec:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Three tiny 2-word documents, each tagged with its list index.
sentences = [['hello', 'world'], ['james', 'bond'], ['adam', 'smith']]
documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(sentences)]

# min_count=0 keeps every word, even ones that appear only once.
model = Doc2Vec(documents, vector_size=5, window=5, min_count=0, workers=4)

documents
[TaggedDocument(words=['hello', 'world'], tags=[0]),
 TaggedDocument(words=['james', 'bond'], tags=[1]),
 TaggedDocument(words=['adam', 'smith'], tags=[2])]

model.dv[0], model.dv[1], model.dv[2]
(array([-0.10461631, -0.11958256, -0.1976151 ,  0.1710569 ,  0.0713223 ], dtype=float32),
 array([ 0.00526548, -0.19761242, -0.10334401, -0.19437183,  0.04021204], dtype=float32),
 array([ 0.05662392,  0.09290017, -0.08597242, -0.06293383, -0.06159503], dtype=float32))

I expect to get a match on TaggedDocument #1:

seen = ['james','bond']

Surprisingly, that known text (james bond) produces a completely "unseen" vector:

new_vector = model.infer_vector(seen)
new_vector
array([-0.07762126,  0.03976333, -0.02985927,  0.07899596, -0.03556045], dtype=float32)

The most_similar() results do not point to the expected Tag=1. Moreover, all 3 scores are quite weak, implying completely unseen data.

model.dv.most_similar_cosmul(positive=[new_vector]) 
[(0, 0.5322251915931702), (2, 0.4972134530544281), (1, 0.46321794390678406)]
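
One detail worth checking: infer_vector() is itself a small stochastic training loop, so on a model this tiny its output also shifts from call to call. A minimal sketch, using infer_vector()'s documented epochs parameter:

# Sketch: repeated inference on the same tokens. On a model this small the
# resulting vectors typically differ noticeably between calls; extra epochs
# reduce, but do not remove, the run-to-run jitter.
for _ in range(3):
    print(model.infer_vector(seen, epochs=50))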

What is wrong here, any ideas?

Answer

Five dimensions is still too many for a toy-sized dataset of just 6 total words, 6 unique words, and three 2-word texts.

None of the Word2Vec/Doc2Vec/FastText-type algorithms work well on tiny amounts of contrived data. They learn their patterns only from many, subtly contrasting usages of words in varied contexts.

Their real strengths only emerge with vectors that are 50, 100, or hundreds of dimensions wide, and training that many dimensions requires a unique vocabulary of (at least) many thousands of words, ideally tens or hundreds of thousands, with many usage examples of each. (For a variant like Doc2Vec, you'd similarly want many thousands of varied documents.)

You'll see improved correlations with expected results when using sufficient training data.
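
As a hedged illustration, here is a minimal sketch using the small Lee corpus bundled with gensim's test data (roughly 300 short news documents: still modest, but far richer than three 2-word texts). After training, re-inferring a training document's own text should rank its tag at or near the top:

from gensim.test.utils import datapath
from gensim.models.doc2vec import Doc2Vec, TaggedLineDocument

# Lee background corpus: ~300 short news documents shipped with gensim.
# TaggedLineDocument treats each line as one document, tagged by line number.
corpus = list(TaggedLineDocument(datapath('lee_background.cor')))

model = Doc2Vec(corpus, vector_size=50, min_count=2, epochs=40, workers=4)

# Self-similarity sanity check: re-infer a training document's words and
# see whether its own tag ranks first among the learned document vectors.
doc_id = 0
inferred = model.infer_vector(corpus[doc_id].words, epochs=40)
print(model.dv.most_similar([inferred], topn=3))

This is the same self-assessment used in gensim's Doc2Vec tutorial; with adequate data and training epochs, most (though not necessarily all) documents rank their own tag first.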
