Different models with gensim Word2Vec on Python

Views: 70 · Published: 2021/5/10 19:07:10 · Tags: python nlp gensim word2vec


Question

I am trying to apply the word2vec model implemented in the Python library gensim. I have a list of sentences (each sentence is a list of words).

For instance, suppose we have:

sentences=[['first','second','third','fourth']]*n

and I build two identical models:

model = gensim.models.Word2Vec(sentences, min_count=1,size=2)
model2=gensim.models.Word2Vec(sentences, min_count=1,size=2)

I find that the two models are sometimes the same and sometimes different, depending on the value of n.

For instance, if n=100 I obtain

print(all(model['first']==model2['first']))
True

while, for n=1000:

print(all(model['first']==model2['first']))
False

How is this possible?

Thank you very much!

Recommended answer

According to the gensim documentation, running Word2Vec involves some randomization:

seed = for the random number generator. Initial vectors for each word are seeded with a hash of the concatenation of word + str(seed). Note that for a fully deterministically-reproducible run, you must also limit the model to a single worker thread, to eliminate ordering jitter from OS thread scheduling.
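To make the "hash of word + str(seed)" idea concrete, here is a minimal standard-library sketch. It is not gensim's actual implementation: the function name `seeded_vector`, the use of SHA-256, and the scaling of the values are all assumptions chosen for illustration. It only shows why a fixed seed pins down each word's starting vector.

```python
import hashlib
import random


def seeded_vector(word, seed, size=2):
    """Illustrative only: derive a word's initial vector from word + str(seed)."""
    # Use a stable hash rather than the built-in hash(), which Python 3
    # randomizes per process for strings.
    digest = hashlib.sha256((word + str(seed)).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Small values centred on zero (the scaling is an arbitrary choice here).
    return [(rng.random() - 0.5) / size for _ in range(size)]


a = seeded_vector("first", seed=1)
b = seeded_vector("first", seed=1)
c = seeded_vector("first", seed=1234)

print(a == b)  # same word and same seed: identical starting vector
print(a == c)  # different seed: a different starting vector
```

Because the starting point is fully determined by the word and the seed, any remaining run-to-run variation has to come from training itself, which is where the worker-thread note in the documentation applies.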

Thus, if you want reproducible results, you will need to set the seed (and, per the note quoted above, train with a single worker thread for a fully deterministic run):

In [1]: import gensim

In [2]: sentences=[['first','second','third','fourth']]*1000

In [3]: model1 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2)

In [4]: model2 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2)

In [5]: print(all(model1['first']==model2['first']))
False

In [6]: model3 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2, seed = 1234)

In [7]: model4 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2, seed = 1234)

In [11]: print(all(model3['first']==model4['first']))
True
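The session above fixes only the seed. Following the documentation note about worker threads, a fuller sketch would also pass `workers=1`. As far as I know, `seed` already has a default value, so the two unseeded models in the question start from the same vectors and thread scheduling is the more likely cause of the n=1000 mismatch. The helper names `train_reproducible` and `same_vectors` below are hypothetical. The keyword lookup is an assumption to cope with the dimensionality argument being called `size` in gensim 3.x and `vector_size` in 4.x.

```python
import inspect


def train_reproducible(sentences, seed=1234, dim=2):
    """Train Word2Vec with a fixed seed and a single worker thread."""
    # Imported here so the gensim dependency of this sketch is explicit.
    from gensim.models import Word2Vec

    # Assumption: the dimensionality keyword is `vector_size` in gensim 4.x
    # and `size` in 3.x, so pick whichever this installation accepts.
    params = inspect.signature(Word2Vec.__init__).parameters
    dim_kwarg = "vector_size" if "vector_size" in params else "size"

    return Word2Vec(
        sentences=sentences,
        min_count=1,
        seed=seed,
        workers=1,  # one worker thread removes the scheduling jitter
        **{dim_kwarg: dim},
    )


def same_vectors(model_a, model_b, word):
    """Return True if both models learned exactly the same vector for `word`."""
    # `.wv[word]` looks up the trained vector for a word as a NumPy array.
    return bool((model_a.wv[word] == model_b.wv[word]).all())
```

With `sentences=[['first','second','third','fourth']]*1000`, two calls to `train_reproducible(sentences)` in the same session should compare equal under `same_vectors(m1, m2, 'first')`. For runs in separate Python 3 interpreter launches, the gensim documentation additionally mentions setting the `PYTHONHASHSEED` environment variable.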
