Update gensim word2vec model


Question

I have a word2vec model in gensim trained on 98892 documents. For any given sentence that is not present in the sentences array (i.e. the set on which I trained the model), I need to update the model with that sentence so that querying it next time returns some results. I am doing it like this:

new_sentence = ['moscow', 'weather', 'cold']
model.train(new_sentence)

and it prints the following to the log:

2014-03-01 16:46:58,061 : INFO : training model with 1 workers on 98892 vocabulary and 100 features
2014-03-01 16:46:58,211 : INFO : reached the end of input; waiting to finish 1 outstanding jobs
2014-03-01 16:46:58,235 : INFO : training on 10 words took 0.1s, 174 words/s

Now, when I query for the most similar words using the same new_sentence (as model.most_similar(positive=new_sentence)), it raises an error:

Traceback (most recent call last):
  File "<pyshell#220>", line 1, in <module>
    model.most_similar(positive=['moscow', 'weather', 'cold'])
  File "/Library/Python/2.7/site-packages/gensim/models/word2vec.py", line 405, in most_similar
    raise KeyError("word '%s' not in vocabulary" % word)
KeyError: "word 'cold' not in vocabulary"

This indicates that the word 'cold' is not part of the vocabulary on which I trained the model (am I right?).

So the question is: how do I update the model so that it gives all the possible similarities for the given new sentence?

Recommended answer

  1. train() expects a sequence of sentences on input, not one sentence.

  2. train() only updates weights for existing feature vectors based on existing vocabulary. You cannot add new vocabulary (=new feature vectors) using train().
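Both points can be illustrated with a small, library-free sketch. It is a toy stand-in, not gensim code: the helper `count_trained_words` and the two-word `vocab` are hypothetical names made up for this example. It assumes that training walks over an iterable of token lists and skips any token missing from a fixed vocabulary.

```python
# Toy illustration only: this mimics the two behaviours described above,
# it does not call gensim or reproduce its implementation.

def count_trained_words(sentences, vocab):
    """Count the tokens that training would actually use.

    `sentences` is expected to be an iterable of token lists. Tokens that
    are not in `vocab` are skipped, and `vocab` is never extended.
    """
    return sum(
        1
        for sentence in sentences
        for token in sentence
        if token in vocab
    )


# Hypothetical vocabulary fixed at initial training time; 'cold' is missing.
vocab = {"moscow", "weather"}
new_sentence = ["moscow", "weather", "cold"]

# Wrong shape: a bare token list is read as three "sentences", and iterating
# each string yields single characters, none of which are vocabulary words.
wrong = count_trained_words(new_sentence, vocab)

# Right shape: wrap the sentence in a list, giving one sentence of three tokens.
right = count_trained_words([new_sentence], vocab)

print(wrong, right)        # only the correctly shaped input reaches known words
print("cold" in vocab)     # the vocabulary is unchanged, so 'cold' is still unknown
```

With the real model, the first fix is therefore to pass `[new_sentence]` rather than `new_sentence`. That alone does not make 'cold' queryable, because the vocabulary stays fixed. Later gensim releases added an `update=True` option to `build_vocab()` for extending the vocabulary before calling `train()`; whether it is available depends on the installed version, so check its documentation first.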
