NLTK and Stopwords Fail #lookuperror

Question

I am trying to start a sentiment analysis project, and I plan to use the stop words method. I did some research and found that nltk has stopwords, but when I run the command I get an error.

What I do is the following, in order to see which words nltk uses (as shown in section 4.1 of http://www.nltk.org/book/ch02.html):

from nltk.corpus import stopwords
stopwords.words('english')

But when I press Enter I get:

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-6-ff9cd17f22b2> in <module>()
----> 1 stopwords.words('english')

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __getattr__(self, attr)
 66
 67     def __getattr__(self, attr):
---> 68         self.__load()
 69         # This looks circular, but its not, since __load() changes our
 70         # __class__ to something new:

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __load(self)
 54             except LookupError, e:
 55                 try: root = nltk.data.find('corpora/%s' % zip_name)
---> 56                 except LookupError: raise e
 57
 58         # Load the corpus.

LookupError:
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
- 'C:\\Users\\Meru/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\lib\\nltk_data'
- 'C:\\Users\\Meru\\AppData\\Roaming\\nltk_data'
**********************************************************************

Because of this problem, code like the following cannot run properly either (it fails with the same error):

>>> from nltk.corpus import stopwords
>>> stop = stopwords.words('english')
>>> sentence = "this is a foo bar sentence"
>>> print([i for i in sentence.split() if i not in stop])

Do you know what the problem might be? I need to work with Spanish words; would you recommend another method? I also thought about using the Goslate package with English datasets.

Thanks for reading!

P.S.: I use Anaconda.

Recommended answer

You don't seem to have the stopwords corpus on your computer.

You need to start the NLTK Downloader and download all the data you need.

Open a Python console and do the following:

>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/

In the GUI window that opens, simply press the 'Download' button to download all corpora, or go to the 'Corpora' tab and download only the ones you need.
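If you would rather skip the GUI, the downloader can also be called with a resource name so that only the stopwords corpus is fetched. The following is a minimal sketch, not a definitive recipe: the helper names `load_stopwords` and `remove_stopwords` are hypothetical and introduced here only for illustration, and the download step assumes the machine has network access. The stopwords corpus ships a Spanish list as well, which covers the Spanish case from the question.

```python
def load_stopwords(language="english"):
    """Return NLTK's stop-word list for `language` as a set.

    Hypothetical helper: downloads only the 'stopwords' corpus if it
    is not already installed (assumes network access on first use).
    """
    # Imported here so the filtering helper below works without NLTK.
    import nltk
    from nltk.corpus import stopwords

    try:
        # Raises LookupError when the corpus is missing, like the
        # traceback in the question.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        # Fetch just this one corpus instead of opening the GUI.
        nltk.download("stopwords")
    return set(stopwords.words(language))


def remove_stopwords(sentence, stop_words):
    """Split `sentence` on whitespace and drop tokens found in `stop_words`."""
    return [word for word in sentence.split() if word.lower() not in stop_words]
```

For example, `remove_stopwords("this is a foo bar sentence", load_stopwords("english"))` filters the sentence from the question, and `load_stopwords("spanish")` returns the Spanish list. Converting the list to a set is a design choice that keeps the membership test fast on longer texts.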
