Encoding error while deserializing a json object from Google

Problem description

As an exercise I built a little script that queries the Google Suggest JSON API. The code is quite simple:

import json
import urllib  # Python 2 API; in Python 3 this is urllib.request.urlopen

query = 'a'
url = "http://clients1.google.co.jp/complete/search?hl=ja&q=%s&json=t" % query
response = urllib.urlopen(url)
result = json.load(response)

Running it raises:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x83 in position 0: invalid start byte

If I call read() on the response object instead, this is what I get:

'["a",["amazon","ana","au","apple","adobe","alc","\x83A\x83}\x83]\x83\x93","\x83A\x83\x81\x83u\x83\x8d","\x83A\x83X\x83N\x83\x8b","\x83A\x83\x8b\x83N"],["","","","","","","","","",""]]'

So it seems the error is raised when Python tries to decode the string. This only happens with google.co.jp and the Japanese language. I tried the same code with other countries/languages and did not get the same issue: the object deserializes without any problem.
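A minimal offline reproduction of the failure, assuming Python 3.6+ (where json.loads accepts bytes but only guesses among UTF-8/16/32). The bytes are copied from the read() output above and trimmed to the first two suggestions. Read as UTF-8 they fail on the first 0x83 byte, which can never start a UTF-8 sequence; decoded as Shift_JIS they become ordinary katakana.

```python
import json

# Trimmed copy of the raw response body shown above (not a live request).
raw = b'["a",["amazon","\x83A\x83}\x83]\x83\x93"],["",""]]'

# json.loads() on bytes assumes UTF-8 here, so the first 0x83 byte
# (a UTF-8 continuation byte, not a start byte) raises UnicodeDecodeError.
try:
    json.loads(raw)
except UnicodeDecodeError as exc:
    utf8_error = exc
else:
    utf8_error = None

# Decoding with Shift_JIS first yields text that json parses normally.
suggestions = json.loads(raw.decode("shift_jis"))

print(utf8_error)
print(suggestions[1])
```

The decoded second suggestion is the katakana spelling of "amazon", which is consistent with the bytes simply being in a different charset rather than being corrupt.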

  • I checked the response headers, and they always specify utf-8 as the response encoding.
  • I checked the JSON string with an online parser (http://json.parser.online.fr/), and again everything seems OK.

Any ideas on how to solve this problem? What makes the json.load() function choke?

Thanks in advance.

Solution

The response headers (print response.headers) contain the following information:

Content-Type: text/javascript; charset=Shift_JIS
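Rather than hard-coding the charset, it can be read from the Content-Type value. A small sketch using the standard library's email.message.Message, which parses MIME header parameters; the helper name charset_from_content_type and the utf-8 fallback are illustrative choices, not part of the original answer.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the (lowercased) charset parameter of a Content-Type value,
    or `default` when no charset parameter is present."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns the
    # fallback argument if the header has no charset parameter.
    return msg.get_content_charset(default)


header = "text/javascript; charset=Shift_JIS"
charset = charset_from_content_type(header)
print(charset)
```

In Python 3, the headers object returned by urllib.request.urlopen is an http.client.HTTPMessage, a subclass of email.message.Message, so response.headers.get_content_charset() can be called on it directly.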

Note the charset.

If you pass this encoding to json.load, it works:

result = json.load(response, encoding='shift_jis')
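The encoding argument above is Python 2 only: in Python 3 it was ignored, then removed from json.loads in 3.9, and passing it through json.load there raises a TypeError. One way to get the same effect in Python 3 is to wrap the byte stream in a decoding reader before handing it to json.load. The helper name load_json is hypothetical, and an in-memory io.BytesIO stands in for the HTTP response.

```python
import codecs
import io
import json


def load_json(stream, encoding):
    """Parse JSON from a binary stream whose bytes use `encoding`.

    codecs.getreader() returns a StreamReader class; wrapping the stream in it
    makes read() return decoded text, which json.load() then parses.
    """
    reader = codecs.getreader(encoding)(stream)
    return json.load(reader)


# Stand-in for the response body: JSON text encoded as Shift_JIS bytes.
body = io.BytesIO('["a",["amazon","アマゾン"]]'.encode("shift_jis"))
result = load_json(body, "shift_jis")
print(result)
```

With a real Python 3 request, the stream would be the object returned by urllib.request.urlopen(url), and the encoding could come from response.headers.get_content_charset("utf-8") instead of being hard-coded.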
