urlopen trouble while trying to download a gzip file


Question

I am going to use the Wiktionary dump for POS tagging. For some reason the script gets stuck at the download step. Here is my code:

import nltk
from urllib import urlopen
from collections import Counter
import gzip

url = 'http://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-all-titles-in-ns0.gz'
fStream = gzip.open(urlopen(url).read(), 'rb')
dictFile = fStream.read()
fStream.close()

text = nltk.Text(word.lower() for word in dictFile())
tokens = nltk.word_tokenize(text)

This is the error I get:

Traceback (most recent call last):
  File "~/dir1/dir1/wikt.py", line 15, in <module>
    fStream = gzip.open(urlopen(url).read(), 'rb')
  File "/usr/lib/python2.7/gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
  File "/usr/lib/python2.7/gzip.py", line 89, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
Process finished with exit code 1

Recommended answer

You are passing the downloaded data itself to gzip.open(), which expects a filename as its first argument.

gzip.open() then tries to open a file whose name is the gzipped data. That data contains NULL bytes, so the call fails with the TypeError shown in the traceback.

Either save the URL data to a file and then use gzip.open() on that file, or decompress the gzipped data with the zlib module instead. 'Saving' the data can be as easy as wrapping it in a StringIO.StringIO() in-memory file object and passing that to gzip.GzipFile():

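from StringIO import StringIO
from urllib import urlopen
import gzip

url = 'http://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-all-titles-in-ns0.gz'
# Download the compressed bytes and wrap them in an in-memory file object.
inmemory = StringIO(urlopen(url).read())
# GzipFile accepts a file object via fileobj=, unlike gzip.open(), which wants a filename.
fStream = gzip.GzipFile(fileobj=inmemory, mode='rb')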
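The zlib alternative mentioned above could look roughly like the following. This is a minimal sketch rather than part of the original answer: gunzip_bytes and make_sample_gzip are hypothetical helper names, and the sample payload is invented so the example runs without a network call. It uses io.BytesIO so that it also runs on Python 3.

```python
import gzip
import io
import zlib


def gunzip_bytes(data):
    """Decompress a complete gzip payload held in memory (hypothetical helper)."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    # instead of a plain zlib stream.
    return zlib.decompress(data, 16 + zlib.MAX_WBITS)


def make_sample_gzip(payload):
    """Build gzip bytes in memory; stands in for the downloaded dump (assumption)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
        gz.write(payload)
    return buf.getvalue()


# Invented sample data, one title per line like the all-titles dump.
sample = make_sample_gzip(b"apple\nbanana\ncherry\n")
titles = gunzip_bytes(sample).splitlines()
print(titles)
```

With the real dump, the bytes returned by urlopen(url).read() would be passed to gunzip_bytes() in place of sample. Note that this approach holds both the compressed and the decompressed data in memory at once, which may matter for a large dump.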
