How to unquote a urlencoded unicode string in python?

Problem description

I have a unicode string like "Tanım" which has somehow been encoded as "Tan%u0131m". How can I convert this encoded string back to the original unicode? Apparently urllib.unquote does not support unicode.

Solution

%uXXXX is a non-standard encoding scheme that was rejected by the W3C, although implementations live on in JavaScript land (for example, the legacy escape() function).

The more common technique seems to be to UTF-8 encode the string and then % escape the resulting bytes using %XX. This scheme is supported by urllib.unquote:

>>> import urllib
>>> urllib.unquote("%0a")
'\n'

Unfortunately, if you really need to support %uXXXX, you will probably have to roll your own decoder. Otherwise, it is far preferable to simply UTF-8 encode your unicode and then % escape the resulting bytes.
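
If you do need to roll your own decoder, a minimal sketch might look like the following. It is written for Python 3, where urllib.parse.unquote replaces urllib.unquote. The name unquote_u and the regular expression are hypothetical, chosen for illustration only, and the sketch assumes every %uXXXX escape carries exactly four hex digits. Characters outside the Basic Multilingual Plane, which JavaScript emits as two %uXXXX surrogate escapes, are not recombined here.

```python
import re
from urllib.parse import unquote

# Hypothetical helper, not part of the standard library.
# Matches either one non-standard %uXXXX escape (group 1)
# or a run of standard %XX escapes (group 2).
_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})|((?:%[0-9a-fA-F]{2})+)")


def _decode(match):
    if match.group(1) is not None:
        # %uXXXX: the four hex digits are a UTF-16 code unit.
        return chr(int(match.group(1), 16))
    # %XX run: let the standard library decode it as UTF-8.
    return unquote(match.group(2))


def unquote_u(text):
    """Decode %uXXXX and %XX escapes in a single pass."""
    return _ESCAPE.sub(_decode, text)


print(unquote_u("Tan%u0131m"))
```

Decoding both kinds of escape in one pass avoids double decoding: a "%" produced by a %u0025 escape is never re-read as the start of another escape.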

A more complete example of the UTF-8 approach:

>>> import urllib
>>> u"Tanım"
u'Tan\u0131m'
>>> url = urllib.quote(u"Tanım".encode('utf8'))
>>> urllib.unquote(url).decode('utf8')
u'Tan\u0131m'
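
The session above is Python 2. As a rough equivalent for Python 3, where the functions moved to urllib.parse and work on str directly with UTF-8 as the default encoding, the same round trip can be sketched as follows. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote

original = "Tan\u0131m"      # "Tanım"

# quote() UTF-8 encodes the string, then %-escapes the bytes.
encoded = quote(original)

# unquote() reverses both steps, again assuming UTF-8.
decoded = unquote(encoded)

print(encoded)
print(decoded == original)
```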
