URL with national characters giving UnicodeEncodeError

Views: 152 | Published: 2016/11/19 17:03:32 | Tags: python, character-encoding


Question

I'm trying to extract a dictionary entry:

import urllib.request
from urllib.parse import urlparse, parse_qs, urlencode
import lxml.html

url = 'http://www.lingvo.ua/uk/Interpret/uk-ru/вікно'
# parsed_url = urlparse(url)
# parameters = parse_qs(parsed_url.query)
# url = parsed_url._replace(query=urlencode(parameters, doseq=True)).geturl()
page = urllib.request.urlopen(url)
pageWritten = page.read()
pageReady = pageWritten.decode('utf-8')
xmldata = lxml.html.document_fromstring(pageReady)
text = xmldata.xpath('//div[@class="js-article-html g-card"]')

Whether the commented lines are enabled or not, I keep getting this error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-28: ordinal not in range(128)

Answer
Your URL path contains non-ASCII characters, which must be percent-encoded using urllib.parse.quote(string) in Python 3 or urllib.quote(string) in Python 2:

# Python 3
import urllib.parse
url = 'http://www.lingvo.ua' + urllib.parse.quote('/uk/Interpret/uk-ru/вікно')

# Python 2
import urllib
url = 'http://www.lingvo.ua' + urllib.quote(u'/uk/Interpret/uk-ru/вікно'.encode('UTF-8'))

NOTE: According to "What is the proper way to URL encode Unicode characters?", URLs should be encoded as UTF-8. That does not remove the need to percent-encode the resulting non-ASCII UTF-8 bytes.
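
If the URL arrives as one complete string rather than as a host plus a path, the same idea can be applied by splitting it first. The following is only a sketch for Python 3: quote_url_path is a hypothetical helper name, not part of urllib, and it assumes the host is plain ASCII and the path has not already been percent-encoded (an existing "%" would be encoded a second time).

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def quote_url_path(url):
    """Percent-encode only the path of a URL (hypothetical helper).

    Assumes an ASCII host and a path that is not yet percent-encoded.
    """
    parts = urlsplit(url)
    # quote() encodes the text as UTF-8 by default and leaves '/' unescaped.
    encoded_path = quote(parts.path)
    return urlunsplit(parts._replace(path=encoded_path))


url = quote_url_path('http://www.lingvo.ua/uk/Interpret/uk-ru/вікно')
print(url)
# http://www.lingvo.ua/uk/Interpret/uk-ru/%D0%B2%D1%96%D0%BA%D0%BD%D0%BE

# The result is pure ASCII, so this no longer raises UnicodeEncodeError.
url.encode('ascii')

# unquote() reverses the encoding and recovers the original word.
print(unquote(url))
```

Each Cyrillic letter becomes two percent-escaped bytes because its UTF-8 encoding is two bytes long. The encoded string can then be passed to urllib.request.urlopen(url) as in the question.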
