gensim word2vec - array dimensions in updating with online word embedding


Question

Updating word vectors on the fly with Word2Vec in gensim 0.13.4.1 does not work.

model.build_vocab(sentences, update=False)

works fine; however,

model.build_vocab(sentences, update=True)

does not.

I am following this website and trying to emulate what they have done, so at some point I use the following script:

import gensim

model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("./text8/text8")
model.build_vocab(sentences, keep_raw_vocab=False, trim_rule=None, progress_per=10000, update=False)
model.train(sentences)

However, while this runs with update=False, using update=True gives me the following traceback:

Traceback (most recent call last):
  File "word2vecAttempt.py", line 34, in <module>
    model.build_vocab(sentences, progress_per=10000, update=True)
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 535, in build_vocab
    self.finalize_vocab(update=update)  # build tables & arrays
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 708, in finalize_vocab
    self.update_weights()
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 1070, in update_weights
    self.wv.syn0 = vstack([self.wv.syn0, newsyn0])
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/numpy/core/shape_base.py", line 230, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

Recommended answer

I was able to reproduce your error. I think you are calling build_vocab with update=True on a model that has not been trained yet. You should only pass update=True once the model has been pre-trained.

This works:

import gensim

model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("text8")
model.build_vocab(sentences, update=False)
model.train(sentences)

model.build_vocab(sentences, update=True)
model.train(sentences)

But this fails:

import gensim

model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("text8")
model.build_vocab(sentences, update=True)
model.train(sentences)

ValueError: all the input array dimensions except for the concatenation axis must match exactly
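
The message comes from NumPy's vstack, which the traceback shows being called on the existing weights and the rows for the new words. Here is a minimal sketch of that mismatch, assuming the untrained model's weight array is still empty (an assumption about gensim's internals that I have not checked against its source). The word counts and the helper name stack_shape are made up for illustration:

```python
import numpy as np

vector_size = 100  # Word2Vec's default dimensionality in the question's setup

# Assumption: an untrained model holds no weights yet, modelled as an empty array.
empty_weights = np.array([])
new_rows = np.zeros((3, vector_size))  # rows for three hypothetical new words


def stack_shape(existing, new):
    """Return the shape of the stacked array, or None if vstack rejects the inputs."""
    try:
        return np.vstack([existing, new]).shape
    except ValueError:
        # Raised when the column counts of the two arrays differ.
        return None


untrained_result = stack_shape(empty_weights, new_rows)

# After an initial build/train pass the existing weights already have 100 columns.
trained_weights = np.zeros((5, vector_size))  # five hypothetical known words
trained_result = stack_shape(trained_weights, new_rows)

print(untrained_result, trained_result)
```

With an empty first array the column counts differ and vstack raises the ValueError. Once the existing weights already have vector_size columns, the new rows are simply appended.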

This was tested with gensim 0.13.4.1, the latest version at the time of writing.
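
If you would rather not track by hand whether the first pass has happened, a small guard can choose the update flag for you. This is only a sketch: build_vocab_safely is a hypothetical helper, not part of gensim. It assumes model.wv.vocab is the mapping of known words, as in the gensim releases of this era, and it uses "vocabulary is non-empty" as a stand-in for "the model has been through an initial pass".

```python
def build_vocab_safely(model, sentences):
    """Call build_vocab with update=True only if the model already knows some words.

    Hypothetical helper, not a gensim function. Returns the update flag it used.
    """
    # Assumption: model.wv.vocab is the dict of known words; other gensim
    # releases may expose the vocabulary under a different attribute name.
    has_vocab = len(model.wv.vocab) > 0
    model.build_vocab(sentences, update=has_vocab)
    return has_vocab
```

On a fresh model this falls back to update=False, which is the case that works above. On later calls it passes update=True.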
