JavaScript unescape() vs. Python urllib.unquote()


Question

From reading various posts, it seems that JavaScript's unescape() is equivalent to Python's urllib.unquote(). However, when I test both, I get different results:

unescape('%u003c%u0062%u0072%u003e');

Output: <br>

import urllib
urllib.unquote('%u003c%u0062%u0072%u003e')

Output: %u003c%u0062%u0072%u003e

I would expect Python to also return <br>. Any ideas as to what I'm missing here?

Thanks!

Recommended answer

%uxxxx is a non-standard URL encoding scheme that urllib.unquote() does not support.

It was only ever part of ECMAScript (ECMA-262, 3rd edition); the format was rejected by the W3C and was never part of an RFC.
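To illustrate the difference, here is a minimal sketch, assuming Python 3, where the function lives in urllib.parse rather than at urllib.unquote as in the question. It shows that standard %XX escapes are decoded while %uXXXX sequences pass through unchanged.

```python
from urllib.parse import unquote

# Standard percent-encoding (%XX) is decoded as expected.
standard = unquote('%3Cbr%3E')

# The non-standard %uXXXX form is not recognised, so the input
# comes back untouched.
nonstandard = unquote('%u003c%u0062%u0072%u003e')

print(standard)
print(nonstandard)
```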

You could use a regular expression to convert such code points:

re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: unichr(int(m.group(1), 16)), quoted)

This decodes both the %uxxxx and the %uxx forms that ECMAScript 3rd ed. can decode.

Demo:

>>> import re
>>> quoted = '%u003c%u0062%u0072%u003e'
>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: unichr(int(m.group(1), 16)), quoted)
u'<br>'
>>> altquoted = '%u3c%u0062%u0072%u3e'
>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: unichr(int(m.group(1), 16)), altquoted)
u'<br>'

However, you should avoid using this encoding altogether if possible.
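If the encoding cannot be avoided, the regular-expression approach can be wrapped in a small helper. The following is only a sketch under some assumptions: it targets Python 3 (so chr() replaces unichr()), the name js_unescape is made up for illustration, and it additionally decodes plain %XX escapes, which unescape() also turns into the character with that code.

```python
import re

# Matches %uXXXX (four hex digits) or %XX (two hex digits).
_ESCAPE_RE = re.compile(r'%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})')


def js_unescape(quoted):
    """Decode %uXXXX and %XX escapes; leave everything else untouched.

    Hypothetical helper, not part of any library. Characters above
    U+FFFF arrive as two %uXXXX surrogates, which this sketch does
    not recombine.
    """
    def _replace(match):
        # Exactly one of the two groups is set for any match.
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _ESCAPE_RE.sub(_replace, quoted)


print(js_unescape('%u003c%u0062%u0072%u003e'))
print(js_unescape('%3Cbr%3E'))
```

Both calls should print <br>; input without a valid escape, such as a lone % sign, is returned as-is.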
