Can't open Unicode URL with Python

Question

Using Python 2.5.2 on Debian Linux, I'm trying to fetch the content of a Spanish URL that contains the Spanish character 'í':

import urllib
url = u'http://mydomain.es/índice.html'
content = urllib.urlopen(url).read()

I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 8: ordinal not in range(128)

Before passing the URL to urllib, I tried this:

url = urllib.quote(url)

and also this:

url = url.encode('UTF-8')

but neither worked.

Can you tell me what I am doing wrong?

Recommended answer

Per the applicable standard, RFC 1738, URLs can only contain ASCII characters. Good explanation here, and I quote:

"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."

As the URLs I've given explain, this probably means you'll have to replace that "lowercase i with acute accent" with "%ED". (%ED is the ISO-8859-1 byte for 'í'; a server that expects UTF-8 paths would need "%C3%AD" instead.)
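
The replacement described above can be sketched in code rather than done by hand. This is a minimal sketch, not a definitive fix: the helper name percent_encode_url and its encoding parameter are made up for illustration, the input URL is assumed not to be percent-encoded already, and only the path is escaped (a non-ASCII host name would need separate handling). It quotes only the path because quoting the whole URL would also escape the ':' after the scheme. The try/except import is there so the same code runs on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
try:
    # Python 2
    from urllib import quote
    from urlparse import urlsplit, urlunsplit
except ImportError:
    # Python 3
    from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url, encoding='utf-8'):
    """Hypothetical helper: percent-encode the path of a Unicode URL.

    Assumes the path is not already percent-encoded; an existing '%'
    would be escaped again as '%25'.
    """
    parts = urlsplit(url)
    # Encode the path to bytes first, then escape every byte that is
    # not allowed unencoded in a URL ('/' is kept as-is by default).
    path = quote(parts.path.encode(encoding))
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


url = u'http://mydomain.es/índice.html'
print(percent_encode_url(url))             # http://mydomain.es/%C3%ADndice.html
print(percent_encode_url(url, 'latin-1'))  # http://mydomain.es/%EDndice.html
```

The resulting ASCII-only string can then be passed to urllib.urlopen. Which of the two encodings is right depends on how the server stores its file names, so it may be necessary to try both.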
