urllib2 opener providing wrong charset
Question
When I open the URL and read the response, the content is unreadable. But when I check the content header, it says the body is encoded as UTF-8. So I tried to convert it with unicode(), and it complained: UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128).
.encode("utf-8") produces:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)
.decode("utf-8") produces:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte
I have tried everything I can think of (I'm not that good with encodings).
I would be happy if I could get this to work. Thanks.
Recommended answer
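This is a common mistake: the server is sending a gzip-compressed stream. The byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), so what you are reading is compressed data, not UTF-8 text.
You should decompress it first: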
import gzip
import StringIO

response = opener.open(self.__url, data)
# Decompress only when the server says the body is gzip-encoded.
if response.info().get('Content-Encoding') == 'gzip':
    buf = StringIO.StringIO(response.read())
    gzip_f = gzip.GzipFile(fileobj=buf)
    content = gzip_f.read()
else:
    content = response.read()
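The snippet above stops at the decompressed bytes, so the charset decoding the question asked about still has to happen afterwards. Here is a minimal sketch of both steps together, without any network access. The names `decode_body` and `gzip_bytes` are hypothetical helpers invented for this illustration, and the extra check on the first two bytes is an assumption added here, not part of the original answer. It uses `io.BytesIO` rather than `StringIO` so the same code should also run on Python 3.

```python
import gzip
import io


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return text from raw response bytes, gunzipping first if needed.

    Hypothetical helper: `content_encoding` stands for the value of the
    Content-Encoding response header, `charset` for the declared charset.
    """
    # Trust the header, but also look for the gzip magic number (0x1f 0x8b);
    # its second byte is the 0x8b that the decoders complained about.
    is_gzip = content_encoding == "gzip" or raw[:2] == b"\x1f\x8b"
    if is_gzip:
        raw = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    # Only now are the bytes actual text in the declared charset.
    return raw.decode(charset)


def gzip_bytes(payload):
    """Compress bytes in memory, standing in for a gzipped server response."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as f:
        f.write(payload)
    return buf.getvalue()


# Simulated response body: UTF-8 text that the server gzipped.
body = gzip_bytes(u"h\u00e9llo".encode("utf-8"))
text = decode_body(body, "gzip")
```

With a real response, `raw` would be `response.read()` and `content_encoding` would be `response.info().get('Content-Encoding')`, as in the answer's code.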