Can't find SpaCy model when packaging with PyInstaller

Problem description

I am using PyInstaller to package a Python script into an .exe. The script uses spacy to load the following model: en_core_web_sm. I have already run python -m spacy download en_core_web_sm to download the model locally. The issue is that when PyInstaller packages my script, it can't find the model. I get the following error: "Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory." I thought this might mean that I need to run the download command from my Python script to make sure the model is available, but when the script downloads the model it just says the requirements are already satisfied. I also have a hook file that handles the hidden imports and is supposed to bring in the model as well:

from PyInstaller.utils.hooks import collect_all, collect_data_files

datas = []
datas.extend(collect_data_files('en_core_web_sm'))

# ----------------------------- SPACY -----------------------------
data = collect_all('spacy')

datas.extend(data[0])
binaries = data[1]
hiddenimports = data[2]

# ----------------------------- THINC -----------------------------
data = collect_all('thinc')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- CYMEM -----------------------------
data = collect_all('cymem')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- PRESHED -----------------------------
data = collect_all('preshed')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- BLIS -----------------------------

data = collect_all('blis')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- STDNUM -----------------------------

data = collect_all('stdnum')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- OTHER -------------------------------

hiddenimports += ['srsly.msgpack.util']

I use the following code to download the model and then to package the script with PyInstaller:

import os
import PyInstaller.__main__
os.system('python -m spacy download en_core_web_sm')
PyInstaller.__main__.run([path_to_script, '--onefile', '--additional-hooks-dir=.'])

The hook-spacy.py script is in the same directory as the script that is being packaged by PyInstaller.

All of this works if I run the script locally. It finds the model as it should. I only get this error if I try to package the script with PyInstaller and try to run the .exe.

I am using Python v3.8.7, PyInstaller v4.2, and spacy v3.0.3 with en_core_web_sm v3.0.0.

Recommended answer

When you use PyInstaller to collect data files into the bundle as you are doing here, the files are actually compiled into the resulting exe itself. This is transparently handled for Python code by PyInstaller when import statements are evaluated.

However, for data files you must handle this yourself. For instance, spacy is likely looking for the model in the current working directory. It won’t find your model because it is compiled into the .exe instead and therefore isn’t present in the current working directory.
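To check where a packaged script is actually running from, here is a minimal sketch. It relies on the sys.frozen and sys._MEIPASS attributes that PyInstaller sets in a bundled application. The helper name bundle_root and its fallback parameter are hypothetical, introduced only for illustration.

```python
import sys
from pathlib import Path


def bundle_root(fallback: Path) -> Path:
    """Return the directory PyInstaller unpacked the bundle into.

    Outside a PyInstaller bundle, return `fallback` instead
    (hypothetical helper, not part of PyInstaller or spacy).
    """
    # PyInstaller sets sys.frozen and sys._MEIPASS in a bundled app.
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        return Path(sys._MEIPASS)
    return fallback


if __name__ == "__main__":
    # Compare the bundle location with the current working directory.
    print("bundle root:", bundle_root(Path.cwd()))
    print("working dir:", Path.cwd())
```

Printing both paths from inside the .exe should show whether the collected files sit somewhere other than the directory the model is being looked up in.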

You will need to use this API:

https://pyinstaller.readthedocs.io/en/stable/spec-files.html#using-data-files-from-a-module

This allows you to read a data file from the exe that PyInstaller creates. You can then write it to the current working directory and then spacy should be able to find it.
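The read-then-write step described above could look roughly like this. It is a sketch only, assuming the linked documentation section refers to the standard library's pkgutil.get_data. The function name extract_package_file and its parameters are hypothetical.

```python
import pkgutil
from pathlib import Path


def extract_package_file(package: str, resource: str, dest_dir: Path) -> Path:
    """Copy one data file that ships inside `package` into `dest_dir`.

    Hypothetical helper: `package` is an importable package name and
    `resource` is a '/'-separated path relative to that package.
    """
    # pkgutil.get_data returns the file contents as bytes, or None when
    # the package cannot be located or its loader cannot serve data.
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise FileNotFoundError(f"{resource!r} not found in package {package!r}")
    target = dest_dir / resource
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

A call might look like extract_package_file("en_core_web_sm", "meta.json", Path.cwd()), assuming the model package ships a meta.json at its top level. A spacy model is a directory containing many files, so the same call would have to be repeated for each file the model needs.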
