Resource 'corpora/wordnet' not found on Heroku
Question
I'm trying to get NLTK and wordnet working on Heroku. I've already run:
heroku run python
nltk.download()
wordnet
pip install -r requirements.txt
But I get this error:
Resource 'corpora/wordnet' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
Searched in:
- '/app/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
Yet I've looked in /app/nltk_data and the corpus is there, so I'm not sure what's going on.
Recommended answer
I just had this same problem. What ended up working for me was creating an 'nltk_data' directory in the application's folder itself, downloading the corpus to that directory, and adding a line to my code that tells nltk to look in that directory. You can do all of this locally and then push the changes to Heroku.
So, supposing my Python application is in a directory called "myapp/":
Step 1: Create the directory
cd myapp/
mkdir nltk_data
Step 2: Download the corpus to the new directory
python -m nltk.downloader
This will bring up the nltk downloader. Set its download directory to whatever_the_absolute_path_to_myapp_is/nltk_data/. If you're using the GUI downloader, the download directory is set through a text field at the bottom of the UI. If you're using the command-line downloader, you set it in the config menu.
Once the downloader points to your newly created nltk_data directory, download your corpus.
Or in one step from Python code:
import nltk
nltk.download("wordnet", download_dir="whatever_the_absolute_path_to_myapp_is/nltk_data/")
Step 3: Let nltk know where to look
nltk looks for data, resources, etc. in the locations listed in the nltk.data.path variable. All you need to do is add nltk.data.path.append('./nltk_data/') to the Python file that actually uses nltk, and it will look for corpora, tokenizers, and such in there in addition to the default paths.
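A relative path such as './nltk_data/' is resolved against the process's working directory, which is not necessarily the directory containing your code. One way to sidestep that, sketched below, is to build the path from the location of the Python file itself. The helper names (bundled_nltk_data_dir, add_search_path, configure_nltk) are made up for this illustration and are not part of nltk; the sketch also assumes nltk_data sits next to the file you pass in.

```python
from pathlib import Path


def bundled_nltk_data_dir(anchor_file: str, dirname: str = "nltk_data") -> str:
    """Return the absolute path of `dirname` located next to `anchor_file`."""
    return str(Path(anchor_file).resolve().parent / dirname)


def add_search_path(search_path: list, data_dir: str) -> list:
    """Append data_dir to search_path unless it is already listed."""
    if data_dir not in search_path:
        search_path.append(data_dir)
    return search_path


def configure_nltk(anchor_file: str) -> None:
    """Add the bundled directory to nltk's search path.

    nltk is imported here rather than at module level so the two helpers
    above stay usable without it. nltk.data.path is a plain Python list.
    """
    import nltk

    add_search_path(nltk.data.path, bundled_nltk_data_dir(anchor_file))
```

In the module that uses nltk, calling configure_nltk(__file__) before the first corpus access should have the same effect as the append above, independent of where the process was started from.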
Step 4: Push it to Heroku
git add nltk_data/
git commit -m 'super useful commit message'
git push heroku master
That should work! It did for me, anyway. One thing worth noting is that the path from the Python file executing the nltk code to the nltk_data directory may differ depending on how you've structured your application, so account for that when you call nltk.data.path.append('path_to_nltk_data').
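If the error persists after deploying, it can help to see which of the search directories actually contain the corpus. The following is a rough diagnostic sketch using only the standard library; dirs_containing and report are hypothetical helper names, and the check only approximates nltk's own lookup by treating a resource as present when it exists unpacked (corpora/wordnet/) or as a downloaded archive (corpora/wordnet.zip).

```python
import os


def dirs_containing(search_path, resource="corpora/wordnet"):
    """Return the entries of search_path that appear to hold `resource`."""
    hits = []
    for base in search_path:
        candidate = os.path.join(base, *resource.split("/"))
        # Accept either the unpacked directory or the .zip archive.
        if os.path.isdir(candidate) or os.path.isfile(candidate + ".zip"):
            hits.append(base)
    return hits


def report(search_path, resource="corpora/wordnet"):
    """Print one line per search directory: 'found' or 'missing'."""
    found = set(dirs_containing(search_path, resource))
    for base in search_path:
        status = "found" if base in found else "missing"
        print(f"{status:8}{base}")
```

With nltk installed, report(nltk.data.path) run from inside the deployed app (for example at startup) lists every directory nltk will search and whether the corpus is there.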