python urllib2 and unicode

Question

I would like to collect information from the results returned by a search engine, but the query part of the URL only seems to accept plain text, not unicode.

import urllib2
a = "바둑"
a = a.decode("utf-8")
type(a)
#Out[35]: unicode

url = "http://search.naver.com/search.naver?where=nexearch&query=%s" %(a)
url2 = urllib2.urlopen(url)

This gives the following error:

#UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-40: ordinal not in range(128)
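
The error can be reproduced without any network access. The sketch below is an inference rather than something stated in the traceback: it assumes the HTTP request line is converted to ASCII before it is sent, and that the request line starts with "GET ". Under that assumption the two Korean characters sit at offsets 39 and 40, which matches the positions in the message. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Assumed shape of the request line; not taken from the original traceback.
request_line = u"GET /search.naver?where=nexearch&query=바둑 HTTP/1.1"

error = None
try:
    # ASCII cannot represent the Korean characters, so this raises.
    request_line.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc

# error.start is the first offending offset, error.end is one past the last.
print("%d-%d" % (error.start, error.end - 1))
```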

Solution

Encode the Unicode data to UTF-8, then URL-encode:

from urllib import urlencode
import urllib2

params = {'where': 'nexearch', 'query': a.encode('utf8')}
params = urlencode(params)

url = "http://search.naver.com/search.naver?" + params
response = urllib2.urlopen(url)

Demo:

>>> from urllib import urlencode
>>> a = u"바둑"
>>> params = {'where': 'nexearch', 'query': a.encode('utf8')}
>>> params = urlencode(params)
>>> params
'query=%EB%B0%94%EB%91%91&where=nexearch'
>>> url = "http://search.naver.com/search.naver?" + params
>>> url
'http://search.naver.com/search.naver?query=%EB%B0%94%EB%91%91&where=nexearch'
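
As a quick sanity check, the encoded query string can be parsed back to confirm that it round-trips to the original text. This is a minimal sketch, not part of the original answer: `build_query` and `read_query` are hypothetical helper names, the imports are written so the code runs on both Python 2 and Python 3, and a list of pairs is used instead of a dict only to keep the parameter order fixed.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import urlencode, parse_qs  # Python 3
except ImportError:
    from urllib import urlencode    # Python 2
    from urlparse import parse_qs   # Python 2


def build_query(term):
    """Hypothetical helper: UTF-8 encode the term, then URL-encode it."""
    pairs = [('where', 'nexearch'), ('query', term.encode('utf-8'))]
    return urlencode(pairs)


def read_query(query_string):
    """Hypothetical helper: recover the 'query' parameter as unicode text."""
    value = parse_qs(query_string)['query'][0]
    # Python 2 returns a byte string here; Python 3 already returns text.
    if isinstance(value, bytes):
        value = value.decode('utf-8')
    return value


term = u"바둑"
encoded = build_query(term)
decoded = read_query(encoded)
print(encoded)
```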

Using urllib.urlencode() to build the parameters is easier, but you can also just escape the query value with urllib.quote_plus():

from urllib import quote_plus
encoded_a = quote_plus(a.encode('utf8'))
url = "http://search.naver.com/search.naver?where=nexearch&query=%s" % encoded_a
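
One detail worth knowing when escaping by hand: `quote_plus()` turns spaces into `+`, while plain `quote()` turns them into `%20`. The sketch below shows the difference on a two-word term; the search term itself is made up for illustration, and the imports are written to work on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote, quote_plus  # Python 3
except ImportError:
    from urllib import quote, quote_plus        # Python 2

# Illustrative two-word search term (not from the original question).
term = u"바둑 baduk"
raw = term.encode('utf-8')

plus_form = quote_plus(raw)   # space becomes '+', the usual form in query strings
percent_form = quote(raw)     # space becomes '%20'

print(plus_form)
print(percent_form)
```

Either form is generally accepted inside a query string, but `+` only means a space there, so `quote()` is the safer choice for path segments.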
