How to unquote a urlencoded unicode string in Python?

Published: 2020/10/1 0:17:40 | Tags: python, unicode, character-encoding, urllib, w3c


Question

I have a unicode string like "Tanım" which is somehow encoded as "Tan%u0131m". How can I convert this encoded string back to the original unicode? Apparently urllib.unquote does not support unicode.

Recommended answer

%uXXXX is a non-standard encoding scheme that has been rejected by the W3C, although an implementation still lives on in JavaScript land.

The more common technique seems to be to UTF-8 encode the string and then percent-escape the resulting bytes as %XX. This scheme is supported by urllib.unquote:

>>> import urllib
>>> urllib.unquote("%0a")
'\n'

Unfortunately, if you really need to support %uXXXX, you will probably have to roll your own decoder. Otherwise, it is likely far preferable to simply UTF-8 encode your unicode string and then percent-escape the resulting bytes.
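
A hand-rolled decoder of that kind might look like the sketch below. It is written for Python 3, where urllib.unquote lives at urllib.parse.unquote, and it is only one possible approach. The helper name unquote_percent_u is made up for illustration, and the code assumes each %uXXXX escape is one UTF-16 code unit, as produced by JavaScript's legacy escape() function, so characters outside the Basic Multilingual Plane arrive as two consecutive escapes.

```python
import re
from urllib.parse import unquote

# One pass over the input: match either a run of non-standard %uXXXX
# escapes or a run of standard %XX escapes.
_ESCAPES = re.compile(r"((?:%u[0-9a-fA-F]{4})+)|((?:%[0-9a-fA-F]{2})+)")


def unquote_percent_u(text):
    """Decode %uXXXX and %XX escapes in text (hypothetical helper)."""

    def decode(match):
        u_run, byte_run = match.groups()
        if u_run is not None:
            # Assumption: every %uXXXX is a UTF-16 code unit, so a run of
            # them is decoded together to join surrogate pairs correctly.
            hex_digits = u_run.replace("%u", "")
            return bytes.fromhex(hex_digits).decode("utf-16-be", errors="replace")
        # Standard scheme: percent-escaped UTF-8 bytes.
        return unquote(byte_run)

    return _ESCAPES.sub(decode, text)


print(unquote_percent_u("Tan%u0131m"))
print(unquote_percent_u("Tan%C4%B1m"))
```

Both calls should print Tanım. Handling the two escape styles in a single regex pass avoids decoding the output a second time, so an escaped percent sign such as %u0025 stays a literal "%".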

A more complete example:

>>> import urllib
>>> u"Tanım"
u'Tan\u0131m'
>>> url = urllib.quote(u"Tanım".encode('utf8'))
>>> urllib.unquote(url).decode('utf8')
u'Tan\u0131m'
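
The session above is Python 2. For readers on Python 3, a roughly equivalent round trip could be sketched as follows, assuming the standard urllib.parse module, whose quote and unquote functions use UTF-8 by default so the explicit encode and decode calls are no longer needed. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote

original = "Tan\u0131m"        # "Tanım"

# quote() UTF-8 encodes the str and percent-escapes the resulting bytes.
encoded = quote(original)

# unquote() reverses the escaping and decodes the bytes as UTF-8.
decoded = unquote(encoded)

print(encoded)                 # Tan%C4%B1m
print(decoded == original)     # True
```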
