python urllib2 and unicode
Problem description
I would like to collect information from the results returned by a search engine, but the query part of the URL only seems to accept plain text, not unicode.
import urllib2
a = "바둑"
a = a.decode("utf-8")
type(a)
#Out[35]: unicode
url = "http://search.naver.com/search.naver?where=nexearch&query=%s" %(a)
url2 = urllib2.urlopen(url)
This gives the following error:
#UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-40: ordinal not in range(128)
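The message itself shows that the unicode URL is being encoded with the 'ascii' codec at some point before the request goes out. A minimal sketch that reproduces the same kind of failure in isolation (the escape sequence is just "바둑" written in ASCII-safe form, and the variable names are only for illustration):

```python
# "바둑" written with escapes so the snippet does not depend on the file encoding
a = u"\ubc14\ub451"

try:
    # The ascii codec has no representation for these two characters
    a.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
```

The positions in the message differ from the ones in the question because only the two Korean characters are encoded here, not the whole URL.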
Solution
Encode the Unicode data to UTF-8, then URL-encode:
from urllib import urlencode
import urllib2
params = {'where': 'nexearch', 'query': a.encode('utf8')}
params = urlencode(params)
url = "http://search.naver.com/search.naver?" + params
response = urllib2.urlopen(url)
Demo:
>>> from urllib import urlencode
>>> a = u"바둑"
>>> params = {'where': 'nexearch', 'query': a.encode('utf8')}
>>> params = urlencode(params)
>>> params
'query=%EB%B0%94%EB%91%91&where=nexearch'
>>> url = "http://search.naver.com/search.naver?" + params
>>> url
'http://search.naver.com/search.naver?query=%EB%B0%94%EB%91%91&where=nexearch'
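If you are on Python 3, where urlencode lives in urllib.parse, the same approach can be wrapped in a small function. This is only a sketch: build_search_url and BASE_URL are hypothetical names, and a list of pairs is used so the parameter order is fixed.

```python
from urllib.parse import urlencode  # Python 3 location of urllib.urlencode

# Search endpoint taken from the question
BASE_URL = "http://search.naver.com/search.naver"


def build_search_url(query, where="nexearch"):
    """Hypothetical helper: return the search URL for a unicode query string."""
    # Encode to UTF-8 first, then let urlencode percent-escape the bytes
    params = [("where", where), ("query", query.encode("utf-8"))]
    return BASE_URL + "?" + urlencode(params)


url = build_search_url(u"\ubc14\ub451")  # "바둑"
print(url)
```

The resulting string contains only ASCII characters, so it can be passed to urlopen without triggering the encoding error.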
Using urllib.urlencode() to build the parameters is easier, but you can also just escape the query value with urllib.quote_plus():
from urllib import quote_plus
encoded_a = quote_plus(a.encode('utf8'))
url = "http://search.naver.com/search.naver?where=nexearch&query=%s" % encoded_a
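One detail to keep in mind when escaping by hand: quote_plus() turns spaces into "+", which is the form expected in a query string, while quote() uses "%20". A short sketch, assuming Python 3 (where both functions live in urllib.parse); the sample values are made up:

```python
from urllib.parse import quote, quote_plus  # Python 3 location of these functions

# Plain ASCII value with a space
plus_form = quote_plus("go game")
percent_form = quote("go game")

# Unicode value ("바둑 go"), encoded to UTF-8 before escaping
mixed = quote_plus(u"\ubc14\ub451 go".encode("utf-8"))

print(plus_form)
print(percent_form)
print(mixed)
```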