Why are multiple model files created in gensim word2vec?


Question

When I create a word2vec model (skip-gram with negative sampling), I get three output files:

word2vec (File)
word2vec.syn1neg.npy (NPY file)
word2vec.wv.syn0.npy (NPY file)

I am wondering why this happens, because my earlier word2vec test examples produced only a single model file (no .npy files).

Please help me.

Recommended answer

Models with large internal vector arrays can't be saved via Python's pickle to a single file. So beyond a certain size threshold, gensim's save() method stores the subsidiary arrays in separate files, using numpy's more efficient raw array format (.npy).

You still load() the model by specifying only the root model filename. When the subsidiary arrays are needed, the loading code finds the side files, as long as they are kept beside the root file. So when moving a model elsewhere, be sure to keep all files that share the root filename together.
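One way to keep the files together when relocating a model is to move everything that shares the root filename in a single step. The helper below is a hypothetical sketch using only the standard library, not part of gensim. It assumes side files are named "<root>.<suffix>", as in the listing in the question.

```python
import shutil
from pathlib import Path


def move_model(root_file, dest_dir):
    """Move a saved model and all of its side files into dest_dir.

    Hypothetical helper: assumes side files are named "<root>.<suffix>".
    Returns the names of the files that were moved.
    """
    root = Path(root_file)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materialises the listing before any file is moved.
    for path in sorted(root.parent.iterdir()):
        if not path.is_file():
            continue
        # Match the root file itself plus every "<root>.<something>" sibling.
        if path.name == root.name or path.name.startswith(root.name + "."):
            shutil.move(str(path), str(dest / path.name))
            moved.append(path.name)
    return moved
```

After a call such as `move_model("models/word2vec", "backup")` (paths are placeholders), the model would be loaded from `backup/word2vec`. Note that the prefix match also picks up unrelated files that happen to start with the same root name, so a dedicated directory per model is the safer layout.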

