Why are multiple model files created in gensim word2vec?
Question
When I create a word2vec model (skip-gram with negative sampling), I get 3 output files:
word2vec (File)
word2vec.syn1neg.npy (NPY file)
word2vec.wv.syn0.npy (NPY file)
I'm worried about why this happens, because in my previous test examples with word2vec I only got a single model file (no .npy files).
Please help.
Answer
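Models with larger internal vector arrays aren't stored in a single Python pickle file. Beyond a certain size threshold, gensim's save() method writes the subsidiary arrays to separate files, using NumPy's more efficient raw array format (.npy). In gensim's SaveLoad.save(), that threshold is the sep_limit parameter, which defaults to 10 MB per array. That is most likely why your earlier, smaller test models produced only one file: none of their arrays crossed the threshold.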
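To make the mechanism concrete, here is a minimal sketch of the same idea using only NumPy and the standard pickle module. It is not gensim's actual code: the helper names save_split and load_split are hypothetical, and the sep_limit default simply mirrors the 10 MB figure mentioned above. Arrays larger than sep_limit bytes go to "<root>.<attribute>.npy" side files, and everything else is pickled into the root file along with a list of which attributes live on the side.

```python
import pickle

import numpy as np


def save_split(attrs, root, sep_limit=10 * 1024**2):
    """Pickle attrs to root, moving large NumPy arrays into side .npy files.

    Hypothetical helper illustrating the idea; not gensim's implementation.
    Returns the names of the attributes that were stored separately.
    """
    small = {}
    separate = []
    for name, value in attrs.items():
        if isinstance(value, np.ndarray) and value.nbytes > sep_limit:
            # Large array: write it in raw .npy format next to the root file.
            np.save(f"{root}.{name}.npy", value)
            separate.append(name)
        else:
            # Small values stay inside the pickled root file.
            small[name] = value
    # The root file records which attributes must be reloaded from side files.
    with open(root, "wb") as f:
        pickle.dump({"attrs": small, "separate": separate}, f)
    return separate


def load_split(root):
    """Load the root file, then pull in any side .npy files it references."""
    with open(root, "rb") as f:
        payload = pickle.load(f)
    attrs = payload["attrs"]
    for name in payload["separate"]:
        # Side files are looked up beside the root file, by name prefix.
        attrs[name] = np.load(f"{root}.{name}.npy")
    return attrs
```

For example, saving {"vectors": <a 1000x100 float32 array>, "window": 5} to a root named word2vec with sep_limit=1024 produces two files, word2vec and word2vec.vectors.npy, while a model whose arrays are all below the limit produces just the root file.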
You still load() the model by specifying just the root model filename; when the subsidiary arrays are needed, the loading code finds the side files, as long as they're kept beside the root file. So when moving a model elsewhere, be sure to keep all files with the same root filename together.
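One way to avoid leaving side files behind is to copy everything that shares the root filename in one step. The sketch below assumes the side files follow the "<root>.<something>.npy" pattern seen in the question (e.g. word2vec.wv.syn0.npy); the helper names model_files and copy_model are hypothetical, and the simple prefix match would also pick up side files of another model whose root name starts with "<root>.".

```python
import glob
import os
import shutil


def model_files(root):
    """Return the root model file plus any .npy side files sharing its prefix.

    Assumes side files are named "<root>.<something>.npy".
    """
    # glob.escape protects against special characters in the root path.
    side = sorted(glob.glob(glob.escape(root) + ".*.npy"))
    return [root] + side


def copy_model(root, dest_dir):
    """Copy the root file and its side files into dest_dir, keeping names."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for path in model_files(root):
        target = os.path.join(dest_dir, os.path.basename(path))
        shutil.copy2(path, target)  # copy2 also preserves file metadata
        copied.append(target)
    return copied
```

After copying, the model would be loaded from its new location by passing only the root path, e.g. Word2Vec.load(os.path.join(dest_dir, "word2vec")) with gensim's Word2Vec class, and the side files are found because they sit right next to it.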