NLTK v3.2: Unable to nltk.pos_tag()


Question

Hi text mining champions,

I'm using Anaconda with NLTK v3.2 on Windows 10 (the client's environment).

When I try to POS tag, I keep getting a urllib2 error:

URLError: <urlopen error unknown url type: c>

It seems urllib2 is unable to recognize Windows paths. How can I work around this?

The command is simple:

nltk.pos_tag(nltk.word_tokenize("Hello World"))

Edit: There is a duplicate question; however, I think the answers given here by manan and alvas are a better fix.

Recommended answer

Edited

This issue has been resolved as of NLTK v3.2.1. Upgrading NLTK fixes the problem, e.g. pip install -U nltk.
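To tell whether an installation still predates the fix, the installed version can be compared against 3.2.1. The following is a minimal sketch rather than an official NLTK utility: parse_version and needs_upgrade are hypothetical helper names, and the parsing only looks at the leading numeric components of the version string.

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_upgrade(version, fixed_in=(3, 2, 1)):
    """Return True if `version` is older than the release that contains the fix."""
    return parse_version(version) < fixed_in


# Typical use (requires NLTK to be installed):
#     import nltk
#     print(needs_upgrade(nltk.__version__))
```

If needs_upgrade reports True for your installation, upgrading with pip as described above should be enough.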

I faced the same issue, and the error I encountered was as follows:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 391, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 414, in _open
    'unknown_open', req)
  File "C:\Python27\lib\urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1206, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: c>

The URLError you mentioned is caused by a bug in the perceptron.py file of the NLTK library that shows up on Windows. On my machine, the file is at this location:

C:\Python27\Lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py

(Basically, look at the equivalent location on your machine, wherever your Python27 folder is.)

The bug is in the code that locates the averaged_perceptron_tagger on your machine. The path it produces is passed on without a protocol prefix, so on Windows the drive letter (the "c" in the error message) is taken for a URL scheme. Lines 801 and 924 of data.py, shown in the traceback, are where the path is opened as a URL.
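The misreading of the drive letter can be reproduced with the standard library alone. This is an illustrative sketch, not NLTK's own code: it uses Python 3's urllib.parse (the traceback comes from Python 2's urllib2), and the Windows path is a hypothetical example location.

```python
from urllib.parse import urlparse

# Hypothetical Windows path to the tagger pickle, for illustration only.
windows_path = (
    r"C:\nltk_data\taggers\averaged_perceptron_tagger"
    r"\averaged_perceptron_tagger.pickle"
)

# Without a protocol prefix, the drive letter before the colon is parsed as the scheme.
bare_scheme = urlparse(windows_path).scheme

# With an explicit "file:" prefix, the scheme is unambiguous.
prefixed_scheme = urlparse("file:" + windows_path).scheme

print(bare_scheme)      # c
print(prefixed_scheme)  # file
```

A URL opener has no handler registered for a scheme named "c", which matches the "unknown url type: c" message in the error.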

I think the NLTK developer community recently fixed this bug. Have a look at this commit, made to their code a few days back:

https://github.com/nltk/nltk/commit/d3de14e58215beebdccc7b76c044109f6197d1d9#diff-26b258372e0d13c2543de8dbb1841252

The snippet where the change was made is as follows:

        self.tagdict = {}
        self.classes = set()
        if load:
            # Initially it was: AP_MODEL_LOC = str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
            AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
            self.load(AP_MODEL_LOC)

    def tag(self, tokens):

Updating the file to the most recent commit worked for me, and I was able to use the nltk.pos_tag command. I believe this will resolve your problem as well (assuming you have everything else set up).
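The idea behind the fix can also be written as a small standalone helper. This is a sketch under stated assumptions, not the code from the NLTK commit: with_file_protocol is a hypothetical name, and KNOWN_PROTOCOLS is an assumed list of prefixes that a loader might accept.

```python
import re

# Assumed set of protocol prefixes; adjust to whatever your loader accepts.
KNOWN_PROTOCOLS = ("file", "nltk", "http", "https")


def with_file_protocol(location):
    """Prefix `location` with 'file:' unless it already starts with a known protocol."""
    match = re.match(r"([A-Za-z][A-Za-z0-9+.-]*):", location)
    if match and match.group(1).lower() in KNOWN_PROTOCOLS:
        return location
    # A Windows drive letter such as "C:" is not a known protocol, so it ends up here.
    return "file:" + location


# Hypothetical paths, for illustration only.
print(with_file_protocol(r"C:\nltk_data\taggers\tagger.pickle"))
print(with_file_protocol("file:/usr/share/nltk_data/taggers/tagger.pickle"))
```

Checking against an explicit list of protocols, rather than accepting anything that looks like a scheme, is what keeps a drive letter from being mistaken for one.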
