Can't open Unicode URL with Python
Problem description
Using Python 2.5.2 on Debian Linux, I'm trying to get the content from a Spanish URL that contains the Spanish character 'í':
import urllib
url = u'http://mydomain.es/índice.html'
content = urllib.urlopen(url).read()
I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 8: ordinal not in range(128)
Before passing the URL to urllib, I tried this:
url = urllib.quote(url)
and also this:
url = url.encode('UTF-8')
But neither worked.
Can you tell me what I am doing wrong?
Recommended answer
Per the applicable standard, RFC 1738, URLs can only contain ASCII characters. Good explanation here, and I quote:
"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."
As the URLs I've given explain, this probably means you'll have to replace that "lowercase i with acute accent" with "%ED".
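The replacement described above can be sketched in code rather than done by hand. This is a minimal sketch, not a definitive fix: percent_encode_path is a hypothetical helper name, and the choice of byte encoding is an assumption about what the server expects. "%ED" is the Latin-1 byte for 'í'; a server that expects UTF-8 paths would want "%C3%AD" instead. The import is written so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def percent_encode_path(path, encoding):
    """Hypothetical helper: encode the unicode path to bytes first,
    then percent-escape every byte that is not URL-safe."""
    return quote(path.encode(encoding), safe='/')


# u'\xed' is the character 'í'; the host part is already ASCII, so only
# the path is escaped.
path = u'/\xedndice.html'

# Assumption: the server stores the file name as Latin-1.
url_latin1 = 'http://mydomain.es' + percent_encode_path(path, 'latin-1')

# Assumption: the server stores the file name as UTF-8.
url_utf8 = 'http://mydomain.es' + percent_encode_path(path, 'utf-8')

print(url_latin1)
print(url_utf8)
```

Either resulting string is plain ASCII and can be passed to urllib.urlopen; which of the two the server accepts depends on how it decodes request paths.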