LookupError: Resource 'corpora/stopwords' not found


Problem description

I am trying to run a webapp on Heroku using Flask. The webapp is written in Python and uses the NLTK (Natural Language Toolkit) library.

One of the files has the following header:

import nltk, json, operator
from nltk.corpus import stopwords 
from nltk.tokenize import RegexpTokenizer 

When the webpage with the stopwords code is called, it produces the following error:

LookupError: 
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK  
  Downloader to obtain the resource:  >>> nltk.download()  
  Searched in:  
    - '/app/nltk_data'  
    - '/usr/share/nltk_data'  
    - '/usr/local/share/nltk_data'  
    - '/usr/lib/nltk_data'  
    - '/usr/local/lib/nltk_data'  
**********************************************************************

The exact code used:

#remove punctuation  
toker = RegexpTokenizer(r'((?<=[^\w\s])\w(?=[^\w\s])|(\W))+', gaps=True) 
data = toker.tokenize(data)  

#remove stop words and digits 
stopword = stopwords.words('english')  
data = [w for w in data if w not in stopword and not w.isdigit()]  

The webapp on Heroku doesn't raise the LookupError when the line stopword = stopwords.words('english') is commented out.

The code runs without a glitch on my local computer. I have installed the required libraries on my computer using

pip install -r requirements.txt

The virtual environment provided by Heroku was running when I tested the code on my computer.

I have also tried the NLTK provided by two different sources, but the LookupError is still there. The two sources I used are:
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.1rc4.zip
https://github.com/nltk/nltk.git

Solution

The problem is that the corpus ('stopwords' in this case) doesn't get uploaded to Heroku. Your code works on your local machine because it already has the NLTK corpus. Please follow these steps to solve the issue.

  1. Create a new directory in your project (let's call it 'nltk_data')
  2. Download the NLTK corpus into that directory. You will have to set the download directory yourself when you run the downloader.
  3. Tell nltk to look in this particular path. Just add nltk.data.path.append('path_to_nltk_data') to the Python file that's actually using nltk.
  4. Now push the app to Heroku.
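
The steps above can be sketched roughly as follows. This is a minimal sketch, not the only way to do it: the helper names (local_nltk_data_dir, add_search_path, download_stopwords, configure_nltk) are made up for illustration, and it assumes the 'nltk_data' directory sits at the project root and is committed along with the code. The only NLTK pieces used are nltk.download (with its download_dir argument) and the nltk.data.path list.

```python
import os


def local_nltk_data_dir(project_root):
    """Return the absolute path of the project's bundled 'nltk_data' directory."""
    return os.path.join(os.path.abspath(project_root), "nltk_data")


def add_search_path(search_paths, data_dir):
    """Append data_dir to a list of search paths unless it is already present."""
    if data_dir not in search_paths:
        search_paths.append(data_dir)
    return search_paths


def download_stopwords(project_root):
    """Steps 1 and 2 (run once, locally): fetch the corpus into the project."""
    import nltk  # imported lazily so the helpers above work without NLTK

    data_dir = local_nltk_data_dir(project_root)
    os.makedirs(data_dir, exist_ok=True)
    # download_dir tells the NLTK downloader where to put the corpus
    nltk.download("stopwords", download_dir=data_dir)
    return data_dir


def configure_nltk(project_root):
    """Step 3 (run at app start-up): make NLTK search the bundled directory."""
    import nltk

    add_search_path(nltk.data.path, local_nltk_data_dir(project_root))
```

One possible usage: call download_stopwords('.') once on your own machine and commit the resulting directory, then call configure_nltk(...) with the project root near the top of the file that imports stopwords, before stopwords.words('english') is first used.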

Hope that solves the problem. Worked for me!
