Issues in doc2vec tags in Gensim
Question
I am using gensim doc2vec as below.
from gensim.models import doc2vec
from collections import namedtuple
import re

my_d = {'recipe__001__1': 'recipe 1 details should come here',
        'recipe__001__2': 'Ingredients of recipe 2 need to be added'}

docs = []
analyzedDocument = namedtuple('AnalyzedDocument', 'words tags')
for key, value in my_d.items():
    value = re.sub("[^a-zA-Z]", " ", value)
    words = value.lower().split()
    tags = key
    docs.append(analyzedDocument(words, tags))

model = doc2vec.Doc2Vec(docs, size=300, window=10, dm=1, negative=5, hs=0, min_count=1, workers=4, iter=20)
However, when I check model.docvecs.offset2doctag, I get ['r', 'e', 'c', 'i', 'p', '_', '0', '1', '2'] as the output. The expected output is 'recipe__001__1' and 'recipe__001__2'.
When I use len(model.docvecs.doctag_syn0), I get 9 as the output, but the expected value is 2 because my test dictionary contains only 2 recipes.
Why does this happen?
Recommended answer
Try changing this line:
tags = key
to
tags = [key]
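The reason is that tags is expected to be a list of tags. A Python string is itself iterable, so when a bare string is passed, each character ends up being treated as a separate tag. The sketch below reproduces that effect in plain Python without importing gensim. The helper collect_tags is hypothetical and only imitates gathering unique tags in first-seen order; it is not gensim's actual implementation.

```python
from collections import namedtuple

AnalyzedDocument = namedtuple('AnalyzedDocument', 'words tags')


def collect_tags(documents):
    """Hypothetical helper (not gensim code): gather unique tags in
    first-seen order by iterating over each document's `tags`."""
    seen = []
    for doc in documents:
        for tag in doc.tags:  # iterating over a str yields single characters
            if tag not in seen:
                seen.append(tag)
    return seen


my_d = {'recipe__001__1': 'recipe 1 details should come here',
        'recipe__001__2': 'Ingredients of recipe 2 need to be added'}

# Buggy version: tags is a bare string.
buggy_docs = [AnalyzedDocument(text.lower().split(), key)
              for key, text in my_d.items()]

# Fixed version: tags is a one-element list.
fixed_docs = [AnalyzedDocument(text.lower().split(), [key])
              for key, text in my_d.items()]

buggy_tags = collect_tags(buggy_docs)
fixed_tags = collect_tags(fixed_docs)

print(buggy_tags)  # nine single-character "tags"
print(fixed_tags)  # the two recipe keys
```

With the bare string, the nine distinct characters of the two keys become nine tags, which matches the length of 9 reported in the question. With the one-element list, only the two recipe keys remain.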