Encoding error while deserializing a json object from Google
Problem description
As an exercise, I built a small script that queries the Google Suggest JSON API. The code is quite simple:
import json
import urllib

query = 'a'
url = "http://clients1.google.co.jp/complete/search?hl=ja&q=%s&json=t" % query
response = urllib.urlopen(url)
result = json.load(response)
Running it raises:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x83 in position 0: invalid start byte
If I call read() on the response object, this is what I get:
'["a",["amazon","ana","au","apple","adobe","alc","\x83A\x83}\x83]\x83\x93","\x83A\x83\x81\x83u\x83\x8d","\x83A\x83X\x83N\x83\x8b","\x83A\x83\x8b\x83N"],["","","","","","","","","",""]]'
So it seems that the error is raised when Python tries to decode the string. This only happens with google.co.jp and the Japanese language. I tried the same code with different countries/languages and did not get the same issue: when I deserialize the object, everything works OK.
- I checked the response headers and they always specify utf-8 as the response encoding.
- I checked the JSON string with an online parser (http://json.parser.online.fr/) and again everything seems OK.
Any ideas on how to solve this problem? What makes the JSON load() function choke?
Thanks in advance.
The response header (print response.headers) contains the following information:
Content-Type: text/javascript; charset=Shift_JIS
Note the charset.
If you specify this encoding in json.load, it will work:
result = json.load(response, encoding='shift_jis')
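The call above relies on Python 2's json.load. In recent Python 3 releases json.load no longer accepts an encoding argument, so one option is to decode the bytes yourself with the charset declared in the Content-Type header and then parse the resulting text. The following is only a minimal sketch under that assumption: the helper names charset_from_content_type and load_json_bytes are hypothetical, and the sample bytes are a trimmed copy of the response shown in the question rather than a live request.

```python
import json
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or a default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lower-cased, or None if absent.
    return msg.get_content_charset() or default


def load_json_bytes(raw, content_type):
    """Decode raw bytes with the declared charset, then parse them as JSON."""
    text = raw.decode(charset_from_content_type(content_type))
    return json.loads(text)


# Trimmed sample of the Shift_JIS bytes from the question (not a live response).
raw = b'["a",["amazon","\x83A\x83}\x83]\x83\x93"]]'
result = load_json_bytes(raw, "text/javascript; charset=Shift_JIS")
print(result[1])
```

With a real response object in Python 3, the header value would typically come from response.headers["Content-Type"] and the bytes from response.read(). Decoding before parsing also matters because some Shift_JIS trail bytes coincide with ASCII characters such as ] and }, which only become harmless once the bytes are turned into text.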