Garbled Chinese characters when fetching with Python urllib

Problem description

# translation.py
import urllib.parse
import urllib.request

url = 'http://fanyi.baidu.com/v2transapi'

# Form fields sent to the translation endpoint
data = {}
data['from'] = 'en'
data['to'] = 'zh'
data['query'] = 'Most solar heating systems use large aluminum or alloy sheets, painted black to absorb the sun\'s heat. '
data['transtype'] = 'realtime'
data['simple_means_flag'] = '3'

# urlopen() sends a POST when given a bytes body
data = urllib.parse.urlencode(data).encode('utf-8')

response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')

with open('fanyi.txt', 'w', encoding='utf-8') as fanyi:
    fanyi.write(html)
print(html)

After running this code, every Chinese character in the response comes back as a Unicode escape sequence: \u5927\u591a\u6570\u592a\u9633\u80fd\u52a0\u70ed\u7cfb\u7edf\u4f7f\u7528\u5927\u7684\u94dd\u6216\u5408\u91d1\u677f\uff0c\u6d82\u4e0a\u9ed1\u8272\u4ee5\u5438\u6536\u592a\u9633\u7684\u70ed\u91cf\u3002

How can I change the code so that the Chinese characters display normally?

Solution

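The response is JSON, and the server has written the Chinese characters in it as \uXXXX escapes. Decoding the bytes as UTF-8 leaves those escapes untouched, so the text has to be parsed with json.loads before the characters display normally: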

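import json
import urllib.parse
import urllib.request

url = 'http://fanyi.baidu.com/v2transapi'

# Form fields sent to the translation endpoint
data = {}
data['from'] = 'en'
data['to'] = 'zh'
data['query'] = 'Most solar heating systems use large aluminum or alloy sheets, painted black to absorb the sun\'s heat. '
data['transtype'] = 'realtime'
data['simple_means_flag'] = '3'

# urlopen() sends a POST when given a bytes body
data = urllib.parse.urlencode(data).encode('utf-8')

response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
# json.loads() turns the \uXXXX escapes into real characters
_html = json.loads(html)

with open('fanyi.txt', 'w', encoding='utf-8') as fanyi:
    # Write the parsed data back out; ensure_ascii=False keeps the
    # Chinese text readable instead of re-escaping it
    fanyi.write(json.dumps(_html, ensure_ascii=False))
print(_html)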