UnicodeDecodeError problem with mechanize


Question

I receive the following string from a website via mechanize:

'We\x92ve'

I know that \x92 stands for the character ’. I'm trying to convert that string to Unicode:

>>> unicode('We\x92ve','utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 2: unexpected code byte

What am I doing wrong?

The reason I tried 'utf-8' was this:

>>> response = browser.response()
>>> response.info()['content-type']
'text/html; charset=utf-8'

Now I see that I can't always trust the Content-Type header.

Recommended answer

\x92 does stand for ’, but it does so in the Windows-1252 encoding, not in UTF-8. In UTF-8, 0x92 is only valid as a continuation byte inside a multi-byte sequence, which is why the decoder rejects it at position 2:

>>> print unicode('We\x92ve','1252')
We’ve
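
When the server declares one charset but sends another, one option is to try the declared charset first and fall back to Windows-1252 if decoding fails. The following is a minimal sketch of that idea, written for Python 3 (where the raw response is a bytes object) rather than the Python 2 used above. The helper name decode_with_fallback and its default arguments are assumptions made for illustration, not part of mechanize.

```python
def decode_with_fallback(raw, declared="utf-8", fallback="cp1252"):
    """Decode raw bytes with the declared charset, falling back if that fails."""
    try:
        return raw.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # The declared charset was wrong (or unknown to Python),
        # so try the fallback encoding instead.
        return raw.decode(fallback)


text = decode_with_fallback(b"We\x92ve")
print(text)  # We’ve
```

Valid UTF-8 input still decodes as UTF-8, so the fallback only takes effect when the declared charset cannot decode the bytes.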

If you don't know what encoding your source data is in, you can detect it using chardet, which is extremely easy to use.
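
As a rough illustration of the chardet suggestion, the sketch below (Python 3) asks chardet for its best guess and decodes with it. The helper decode_detected and its cp1252 default are assumptions for illustration. chardet is a third-party package, so the import is guarded and the default is used when it is not installed. Detection is statistical, so the guess can be wrong, especially for very short inputs such as the five-byte string above.

```python
try:
    import chardet  # third-party package: pip install chardet
except ImportError:
    chardet = None  # not installed; decode_detected then uses its default


def decode_detected(raw, default="cp1252"):
    """Decode raw bytes using chardet's guess, or `default` when there is none."""
    encoding = None
    if chardet is not None:
        guess = chardet.detect(raw)  # dict with 'encoding' and 'confidence' keys
        encoding = guess.get("encoding")
    try:
        # errors="replace" keeps a wrong guess from raising UnicodeDecodeError.
        return raw.decode(encoding or default, errors="replace")
    except LookupError:
        # chardet named an encoding that Python has no codec for.
        return raw.decode(default, errors="replace")


print(decode_detected(b"plain ASCII text"))
```

Checking the 'confidence' value in the result before trusting the guess is a reasonable extra precaution when the input is short.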
