LookupError: Resource 'corpora/stopwords' not found
I am trying to run a webapp on Heroku using Flask. The webapp is written in Python and uses the NLTK (Natural Language Toolkit) library.
One of the files has the following header:
import nltk, json, operator
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
When the webpage with the stopwords code is called, it produces the following error:
LookupError:
**********************************************************************
Resource 'corpora/stopwords' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
Searched in:
- '/app/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
**********************************************************************
The exact code used:
#remove punctuation
toker = RegexpTokenizer(r'((?<=[^\w\s])\w(?=[^\w\s])|(\W))+', gaps=True)
data = toker.tokenize(data)
#remove stop words and digits
stopword = stopwords.words('english')
data = [w for w in data if w not in stopword and not w.isdigit()]
The webapp on Heroku doesn't produce the LookupError when the line stopword = stopwords.words('english') is commented out.
The code runs without a glitch on my local computer. I have installed the required libraries on my computer using
pip install -r requirements.txt
The virtual environment provided by Heroku was running when I tested the code on my computer.
I have also tried installing NLTK from two different sources, but the LookupError is still there. The two sources I used are:
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.1rc4.zip
https://github.com/nltk/nltk.git
The problem is that the corpus ('stopwords' in this case) doesn't get uploaded to Heroku. Your code works on your local machine because it already has the NLTK corpus. Please follow these steps to solve the issue.
- Create a new directory in your project (let's call it 'nltk_data')
- Download the NLTK corpus into that directory. You will have to set the download location during the download.
- Tell nltk to look in this particular path. Just add
nltk.data.path.append('path_to_nltk_data')
to the Python file that's actually using nltk.
- Now push the app to Heroku.
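The steps above can be sketched roughly as follows. This is only an illustration, not the answerer's exact code: the helper names (bundled_nltk_data_dir, register_search_dir, configure_nltk) are hypothetical, and it assumes the corpus directory is called 'nltk_data' and sits in the project root. The calls nltk.download(..., download_dir=...) and nltk.data.path are part of NLTK itself.

```python
import os


def bundled_nltk_data_dir(project_root):
    """Return the path of the 'nltk_data' directory inside the project.

    Assumption: the directory is named 'nltk_data' and lives in the project root.
    """
    return os.path.join(os.path.abspath(project_root), "nltk_data")


def register_search_dir(search_path, data_dir):
    """Append data_dir to an NLTK-style search path list, but only once."""
    if data_dir not in search_path:
        search_path.append(data_dir)
    return search_path


def configure_nltk(project_root):
    """Point NLTK at the bundled corpus directory (requires nltk to be installed)."""
    import nltk  # imported lazily so the helpers above work without nltk

    data_dir = bundled_nltk_data_dir(project_root)
    # Run this once on the local machine, then commit the directory
    # so that it is pushed to Heroku together with the app:
    #   nltk.download('stopwords', download_dir=data_dir)
    register_search_dir(nltk.data.path, data_dir)
    return data_dir
```

In the file that uses the stop words, something like configure_nltk(os.path.dirname(os.path.abspath(__file__))) would need to run before stopwords.words('english') is called, assuming that file sits in the project root.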
Hope that solves the problem. Worked for me!