Python 3.4.0 -- 'ascii' codec can't encode characters in position 11-15: ordinal not in range(128) -- Unix 14.04

Published: 2017/8/16 20:49:38 · Tags: python, encoding, utf-8, ascii, lxml

Problem description

Trying to retrieve some data from the web using urllib and lxml, I've got an error and have no idea how to fix it.

    import urllib.request

    url = 'http://sum.in.ua/?swrd=автор'
    page = urllib.request.urlopen(url)  # raises UnicodeEncodeError

The error itself:

    UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-15: ordinal not in range(128)

I'm using Ukrainian in the URL this time, but when I use a URL without any Ukrainian letters in it:

    import urllib.request
    import lxml.html

    url = "http://www.toponymic-dictionary.in.ua/index.php?option=com_content&view=section&layout=blog&id=8&Itemid=9"
    page = urllib.request.urlopen(url)
    pageWritten = page.read()                # raw bytes of the response body
    pageReady = pageWritten.decode('utf-8')  # assumes the page is UTF-8 encoded
    xmldata = lxml.html.document_fromstring(pageReady)
    text1 = xmldata.xpath('//p[@class="MsoNormal"]//text()')

it gets me the data in Ukrainian and everything works just fine.
Solution
URLs can only use a subset of printable ASCII codepoints; everything else must be properly encoded using URL percent encoding.
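
To see where "position 11-15" in the error comes from, here is a minimal sketch that needs no network access. It assumes that urllib (via http.client) ASCII-encodes a request line of the form GET /?swrd=автор HTTP/1.1; the string below is a hand-written reconstruction of that line, not captured traffic. Percent-encoding the word with urllib.parse.quote() leaves only ASCII characters, so the same encode step succeeds.

```python
from urllib.parse import quote

# Hand-written reconstruction (an assumption) of the HTTP request line
# sent for http://sum.in.ua/?swrd=автор
raw_line = 'GET /?swrd=автор HTTP/1.1'

try:
    raw_line.encode('ascii')
    error_message = None
except UnicodeEncodeError as exc:
    # 'GET /?swrd=' is 11 characters, so the five Cyrillic letters
    # occupy indices 11..15 (exc.start == 11, exc.end == 16)
    error_message = str(exc)
print(error_message)

# quote() encodes the text as UTF-8 and percent-escapes each byte
fixed_line = 'GET /?swrd={} HTTP/1.1'.format(quote('автор'))
print(fixed_line)
fixed_line.encode('ascii')  # succeeds: only ASCII characters remain
```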

You can best achieve that by letting Python handle your parameters. The urllib.parse.urlencode() function can convert a dictionary (or a sequence of key-value pairs) for use in URLs:

    import urllib.request
    from urllib.parse import urlencode

    url = 'http://sum.in.ua/'
    parameters = {'swrd': 'автор'}
    url = '{}?{}'.format(url, urlencode(parameters))  # percent-encoded query string

    page = urllib.request.urlopen(url)

This will first encode the parameters to UTF-8 bytes, then convert those bytes to percent-encoding sequences:

    >>> from urllib.parse import urlencode
    >>> parameters = {'swrd': 'автор'}
    >>> urlencode(parameters)
    'swrd=%D0%B0%D0%B2%D1%82%D0%BE%D1%80'

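Because urlencode() also accepts a sequence of key-value pairs, a list of tuples keeps the parameters in the given order and lets a key repeat. By default values are quoted with quote_plus(), so a space becomes '+'. The parameter names page and q below are made up purely for illustration; the second call shows what doseq=True does with a dict of lists, which is the shape parse_qs() returns.

```python
from urllib.parse import urlencode

# A list of pairs keeps the given order and allows a key to repeat.
# 'page' and 'q' are made-up parameter names, used only for illustration.
pairs = [('swrd', 'автор'), ('page', 2), ('q', 'a b'), ('q', 'c')]
query = urlencode(pairs)
print(query)  # swrd=%D0%B0%D0%B2%D1%82%D0%BE%D1%80&page=2&q=a+b&q=c

# With a dict of lists, doseq=True emits one key=value pair per list item
multi = urlencode({'q': ['a b', 'c']}, doseq=True)
print(multi)  # q=a+b&q=c
```
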
If you did not construct this URL yourself, you'll need to 'repair' the encoding. Using urllib.parse.urlparse() and urllib.parse.parse_qs(), you can split off the query string, parse it into a dictionary, then pass it to urlencode to put it back into the URL:

    from urllib.parse import urlparse, parse_qs, urlencode

    url = 'http://sum.in.ua/?swrd=автор'
    parsed_url = urlparse(url)
    parameters = parse_qs(parsed_url.query)
    url = parsed_url._replace(query=urlencode(parameters, doseq=True)).geturl()

This splits the URL into its constituent parts, parses out the query string, re-encodes and re-builds the URL afterwards:

    >>> from urllib.parse import urlparse, parse_qs, urlencode
    >>> url = 'http://sum.in.ua/?swrd=автор'
    >>> parsed_url = urlparse(url)
    >>> parameters = parse_qs(parsed_url.query)
    >>> parsed_url._replace(query=urlencode(parameters, doseq=True)).geturl()
    'http://sum.in.ua/?swrd=%D0%B0%D0%B2%D1%82%D0%BE%D1%80'
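
If the repair is needed in several places, it could be wrapped in a small helper. This sketch is a variant of the code above, not the original answer's: repair_query() is a hypothetical name, and it swaps in urllib.parse.parse_qsl(), which returns an ordered list of pairs. That keeps parameter order and repeated keys intact (plain dict ordering is not guaranteed on Python 3.4) and makes doseq unnecessary. keep_blank_values=True preserves parameters with empty values, which would otherwise be dropped. Since parse_qsl() decodes existing percent escapes first, running the helper on an already-encoded URL should give the same URL back.

```python
from urllib.parse import urlparse, parse_qsl, urlencode


def repair_query(url):
    """Re-encode the query string of url (hypothetical helper)."""
    parsed_url = urlparse(url)
    # Ordered (key, value) pairs; percent escapes already present are decoded
    pairs = parse_qsl(parsed_url.query, keep_blank_values=True)
    return parsed_url._replace(query=urlencode(pairs)).geturl()


fixed = repair_query('http://sum.in.ua/?swrd=автор')
print(fixed)

# Repairing an already-encoded URL leaves it unchanged
print(repair_query(fixed) == fixed)

# Order, repeated keys and blank values survive the round trip
print(repair_query('http://example.com/?b=1&a=&b=2'))
```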


                        