python urllib2 utf-8 encoding


Question





I have # -*- coding: utf-8 -*- at the top of my Python file.

The snippet:

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.addheaders = [('Accept-Charset', 'utf-8')]
f = opener.open(url)
doc = f.read().decode('utf-8')

The server response (via f.info()) is:

Content-Type: text/html; charset=UTF-8

But I get the error:

UnicodeDecodeError: 'utf8' codec can't decode byte[...]: invalid continuation byte

What's wrong here?

Solution

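Try decoding the data using 'latin-1' to see what it looks like. Latin-1 maps every possible byte to a character, so that decode cannot fail and lets you inspect the content. What you're seeing is a UTF-8 decode error, which means the bytes you received are not valid UTF-8, whatever the Content-Type header says (see the related question "UnicodeDecodeError, invalid continuation byte").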
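To illustrate the suggestion, here is a minimal sketch that needs no network access. The byte string raw is a made-up sample (the word "café" encoded as Latin-1), standing in for whatever f.read() returned; it is an assumption, not the asker's actual data.

```python
# Hypothetical sample standing in for f.read(): "café au lait" encoded as Latin-1.
# The byte 0xE9 is not valid UTF-8 when followed by a space.
raw = b"caf\xe9 au lait"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("UTF-8 decode failed:", exc)

# Latin-1 assigns a character to every byte value, so this decode never raises.
# The result may not be the "right" text, but it shows what the data looks like.
preview = raw.decode("latin-1")
print(repr(preview))
```

If the Latin-1 preview reads as sensible text, the server is probably sending a different encoding than it declares. If it looks like binary noise, the body may not be plain text at all.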

It would be helpful if you posted the result of list(f.read())[:100] so we can see the data.
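If dumping the first 100 bytes is not enough, it can also help to look at the bytes around the position where decoding fails. The sketch below is one way to do that, relying on the start attribute of UnicodeDecodeError. The helper name first_bad_bytes and the sample input are hypothetical.

```python
def first_bad_bytes(data, context=8):
    """Return (offset, surrounding bytes) for the first byte that is not
    valid UTF-8, or None if the whole input decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the codec could not decode.
        lo = max(exc.start - context, 0)
        return exc.start, data[lo:exc.start + context]
    return None


# Hypothetical sample; in the question this would be the result of f.read().
print(first_bad_bytes(b"caf\xe9 au lait"))
print(first_bad_bytes(b"plain ascii"))
```

Note that f.read() consumes the response, so read it once into a variable and inspect that variable rather than calling f.read() a second time.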

FYI, putting # -*- coding: utf-8 -*- in your file is unrelated to your issue. That declaration sets the encoding of your Python script's own source text, not the encoding of the data the script handles :-)
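A side note on the snippet in the question, separate from the decode error: assigning opener.addheaders twice replaces the list, so only the Accept-Charset header from the second assignment is kept. A small sketch of setting both headers in one list follows; the import fallback is only there so the same lines run on Python 2 (urllib2) and Python 3 (urllib.request), and no request is made.

```python
try:
    import urllib2 as request_mod  # Python 2
except ImportError:
    import urllib.request as request_mod  # Python 3

opener = request_mod.build_opener()

# A single assignment with both entries; a second assignment would
# overwrite this list rather than append to it.
opener.addheaders = [
    ('User-agent', 'Mozilla/5.0'),
    ('Accept-Charset', 'utf-8'),
]

print(opener.addheaders)
```

This does not by itself explain the UnicodeDecodeError, since a server is free to ignore Accept-Charset.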
