Is there a unicode-ready substitute I can use for urllib.quote and urllib.unquote in Python 2.6.5?
Python's urllib.quote and urllib.unquote do not handle Unicode correctly in Python 2.6.5. This is what happens:
In [5]: print urllib.unquote(urllib.quote(u'Cataño'))
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/home/kkinder/<ipython console> in <module>()
/usr/lib/python2.6/urllib.pyc in quote(s, safe)
1222 safe_map[c] = (c in safe) and c or ('%%%02X' % i)
1223 _safemaps[cachekey] = safe_map
-> 1224 res = map(safe_map.__getitem__, s)
1225 return ''.join(res)
1226
KeyError: u'\xc3'
Encoding the value to UTF8 also does not work:
In [6]: print urllib.unquote(urllib.quote(u'Cataño'.encode('utf8')))
Cataño
It's recognized as a bug and there is a fix, but not for my version of Python.
What I'd like is something similar to urllib.quote/urllib.unquote that handles unicode variables correctly, such that this code would work:
decode_url(encode_url(u'Cataño')) == u'Cataño'
Any recommendations?
Python's urllib.quote and urllib.unquote do not handle Unicode correctly
urllib does not handle Unicode at all. URLs don't contain non-ASCII characters, by definition. When you're dealing with urllib you should use only byte strings. If you want those to represent Unicode characters you will have to encode and decode them manually.
IRIs can contain non-ASCII characters, encoding them as UTF-8 sequences, but Python doesn't, at this point, have an irilib.
Encoding the value to UTF8 also does not work:
In [6]: print urllib.unquote(urllib.quote(u'Cataño'.encode('utf8')))
Cataño
Ah, well now you're typing Unicode into a console, and doing print-Unicode to the console. This is generally unreliable, especially in Windows and in your case with the IPython console.
Type it out the long way with backslash sequences and you can more easily see that the urllib bit does actually work:
>>> u'Cata\u00F1o'.encode('utf-8')
'Cata\xC3\xB1o'
>>> urllib.quote(_)
'Cata%C3%B1o'
>>> urllib.unquote(_)
'Cata\xC3\xB1o'
>>> _.decode('utf-8')
u'Cata\xF1o'
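If you want the encode_url/decode_url pair from the question, those four steps can be wrapped up roughly like this. It's only a minimal sketch: it assumes UTF-8 is the encoding you want on the wire, the safe parameter is just passed through to urllib.quote, and the try/except import is an addition so the same snippet also runs on Python 3.

```python
try:
    from urllib import quote, unquote  # Python 2
except ImportError:
    # Python 3: unquote_to_bytes mirrors the bytes-in, bytes-out
    # behaviour of Python 2's urllib.unquote
    from urllib.parse import quote, unquote_to_bytes as unquote


def encode_url(text, safe='/'):
    """Percent-encode a unicode string as UTF-8 bytes."""
    # Encode manually first, so quote() only ever sees a byte string.
    return quote(text.encode('utf-8'), safe)


def decode_url(quoted):
    """Reverse encode_url: percent-decode, then decode the UTF-8 bytes."""
    # A quoted URL is pure ASCII; hand unquote() a byte string so it
    # gives bytes back rather than a half-decoded unicode object.
    if not isinstance(quoted, bytes):
        quoted = quoted.encode('ascii')
    return unquote(quoted).decode('utf-8')
```

With that, decode_url(encode_url(u'Cata\u00F1o')) == u'Cata\u00F1o' should hold. If you print the result you can still hit the console problem described above, so compare values or look at the repr instead.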