Bad zip file error in POS tagging with NLTK in Python


Problem description

I am new to Python and NLTK, and I want to do word tokenization and POS tagging. I installed NLTK 3.0 on Ubuntu 14.04, which has Python 2.7.6 by default. First I tried to tokenize a simple sentence, but I get an error saying "BadZipfile: File is not a zip file". How can I solve this?

One more question: when I installed the NLTK data from the command line, I gave the path "/usr/share/nltk_data". Some of the packages could not be installed because of errors. But when I check with the command "nltk.data.path", it shows other paths as well, and those other paths are actually invalid. Why?

I have 1000 text files. How can I write a Python program that takes all of these files together as input for tokenization and POS tagging? I don't know how to do it. Please help.

The commands I ran in the Python interpreter are given below, in the order I ran them.

Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> import nltk
>>> nltk.data.path
['/home/ubuntu/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
>>> from nltk import pos_tag, word_tokenize
>>> sentence = "Hello my name is Derek. I live in Salt Lake city."
>>> sentence
'Hello my name is Derek. I live in Salt Lake city.'
>>> word_tokenize(sentence)

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    word_tokenize(sentence)
  File "/usr/local/lib/python2.7/dist-packages/nltk/tokenize/__init__.py", line 93, in word_tokenize
    return [token for sent in sent_tokenize(text)
  File "/usr/local/lib/python2.7/dist-packages/nltk/tokenize/__init__.py", line 81, in sent_tokenize
    tokenizer = load('tokenizers/punkt/english.pickle')
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 774, in load
    opened_resource = _open(resource_url)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 888, in _open
    return find(path_, path + ['']).open()
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 605, in find
    return find(modified_name, paths)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 592, in find
    return ZipFilePathPointer(p, zipentry)
  File "/usr/local/lib/python2.7/dist-packages/nltk/compat.py", line 380, in _decorator
    return init_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 449, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
  File "/usr/local/lib/python2.7/dist-packages/nltk/compat.py", line 380, in _decorator
    return init_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 946, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "/usr/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/usr/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file
>>>

Thanks in advance.

Recommended answer

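You apparently haven't run download_corpora.py successfully. The traceback fails while NLTK is opening the tokenizers/punkt resource as a zip archive, which is consistent with the data download having been left incomplete or corrupted.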
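One way to confirm this is to look for files that end in .zip but are not valid zip archives under the directories NLTK searches. The following is a minimal sketch using only the standard library; the helper name find_bad_zips is made up for illustration and is not part of NLTK.

```python
import os
import zipfile


def find_bad_zips(search_dirs):
    """Return paths of .zip files under search_dirs that are not valid zip archives.

    find_bad_zips is a hypothetical helper, not an NLTK function.
    """
    bad = []
    for root in search_dirs:
        if not os.path.isdir(root):
            # Entries in a search path are only candidates; they need not exist.
            continue
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # is_zipfile() checks the archive signature without extracting.
                if name.lower().endswith(".zip") and not zipfile.is_zipfile(path):
                    bad.append(path)
    return sorted(bad)
```

With NLTK installed, calling find_bad_zips(nltk.data.path) should list any broken archives. Deleting those files and running nltk.download('punkt') again (optionally with download_dir='/usr/share/nltk_data', which probably needs root permissions) is a reasonable next step. This also touches on the second question: nltk.data.path is just the list of locations NLTK searches in order, so it can contain directories that do not exist on the machine.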

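For the part of the question about processing 1000 text files, a loop over the directory is usually enough once tokenization works. The sketch below is only an illustration under some assumptions: the files are UTF-8 text with a .txt extension, and the helper name tag_files is invented here. The tokenizer and tagger are passed in as functions so the loop itself does not depend on how NLTK is set up.

```python
import glob
import io
import os


def tag_files(directory, tokenize, tag, pattern="*.txt"):
    """Yield (filename, tagged_tokens) for each matching file in directory.

    tag_files is a hypothetical helper; tokenize and tag are callables,
    for example nltk.word_tokenize and nltk.pos_tag.
    """
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # Assumption: the files are UTF-8 encoded text.
        with io.open(path, encoding="utf-8") as handle:
            text = handle.read()
        yield os.path.basename(path), tag(tokenize(text))
```

With working NLTK data, usage would look like: from nltk import pos_tag, word_tokenize, then for name, tagged in tag_files("/path/to/texts", word_tokenize, pos_tag): print(name, tagged[:5]). The directory "/path/to/texts" is a placeholder. Because the function is a generator, only one file's text is held in memory at a time.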