Why use \x3C instead of < when generating HTML from JavaScript?

Question

I see the following HTML code used a lot to load jQuery from a content delivery network, but fall back to a local copy if the CDN is unavailable (e.g. in the Modernizr docs):

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.js"></script>
<script>window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">\x3C/script>')</script>

My question is, why is the last < character in the document.write() statement replaced with the escape sequence \x3C? < is a safe character in JavaScript and is even used earlier in the same string, so why escape it there? Is it just to prevent bad browser implementations from thinking the </script> inside the string is the real script end tag? If so, are there really any browsers out there that would fail on this?

As a follow-on question, I've also seen a variant using unescape() (as given in this answer) in the wild a couple of times too. Is there a reason why that version always seems to substitute all the < and > characters?

Recommended answer

When the browser sees </script>, it considers this to be the end of the script block (since the HTML parser has no idea about JavaScript, it can't distinguish between something that just appears in a string, and something that's actually meant to end the script element). So </script> appearing literally in JavaScript that's inside an HTML page will (in the best case) cause errors, and (in the worst case) be a huge security hole.
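The cut-off described above can be imitated with a few lines of JavaScript. This is only a rough sketch: `scriptBodySeenByHtmlParser` is a hypothetical helper, and it assumes the parser simply stops at the first `</script` it meets, which ignores the extra states a real HTML tokenizer has.

```javascript
// Hypothetical, simplified model of an HTML parser looking for the end of a
// script element. It knows nothing about JavaScript strings or comments.
function scriptBodySeenByHtmlParser(textAfterOpeningTag) {
  const end = textAfterOpeningTag.search(/<\/script[\s\/>]/i);
  return end === -1 ? textAfterOpeningTag : textAfterOpeningTag.slice(0, end);
}

// The closing tag is assembled by concatenation so that this sample could
// itself be placed inside an HTML page.
const CLOSE = "<" + "/script>";

// Page text with a literal closing tag inside the JavaScript string...
const broken =
  "document.write('<script src=\"js/libs/jquery-1.6.1.min.js\">" + CLOSE + "')" + CLOSE;
// ...and the same page text with the \x3C escape from the question.
const fixed =
  "document.write('<script src=\"js/libs/jquery-1.6.1.min.js\">\\x3C/script>')" + CLOSE;

console.log(scriptBodySeenByHtmlParser(broken)); // stops in the middle of the string literal
console.log(scriptBodySeenByHtmlParser(fixed));  // the complete document.write(...) call
```

In the first case the script engine would be handed an unterminated string literal, and the rest of the line would be treated as HTML.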

That's why you somehow have to prevent this sequence of characters from appearing. Other common workarounds for this issue are "<"+"/script>" and "<\/script>" (they all come down to the same thing).
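A quick check that the three spellings really are interchangeable at the JavaScript level. The variable names are made up for this illustration; the point is only that each source spelling avoids the literal closing tag while producing the same string value.

```javascript
// Three ways to write the closing tag without the literal character
// sequence appearing in the source text.
const hexEscape = "\x3C/script>";      // \x3C is the hex escape for "<"
const concatenated = "<" + "/script>"; // split across two string literals
const backslash = "<\/script>";        // "\/" is just "/" to JavaScript

// Reference value built from the character code of "<" (60).
const expected = String.fromCharCode(60) + "/script>";

const allEqual = [hexEscape, concatenated, backslash].every(
  (s) => s === expected
);
console.log(allEqual); // true
```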

While some consider this to be a "bug", it actually has to happen this way, since, as per the specification, the HTML part of the user agent is completely separate from the scripting engine. You can put all kinds of things into <script> tags, not just JavaScript. The W3C mentions VBScript and TCL as examples. Another example is the jQuery template plugin, which uses those tags as well.

But even within JavaScript, where you could suggest that such content in strings could be recognized and thus not be treated as ending tags, the next ambiguity comes up when you consider comments:

<script type="text/javascript">foo(42); // call the function </script>

– what should the browser do in this case?

And finally, what about browsers that don't even know JavaScript? They would just ignore the part between <script> and </script>, but if you gave different semantics to the character sequence </script> based on the browser's knowledge of JavaScript, you'd suddenly have two different results in the HTML parsing stage.

Lastly, regarding your question about substituting all angle brackets: I'd say at least in 99% of the cases, that's for obfuscation, i.e. to hide (from anti-virus software, censoring proxies (like in your example (nested parens are awesome)), etc.) the fact that your JavaScript is doing some HTML-y stuff. I can't think of good technical reasons to hide anything but </script>, at least not for reasonably modern browsers (and by that, I mean pretty much anything newer than Mosaic).
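For comparison, here is a sketch of what the unescape() variant boils down to. The exact string used in the linked answer is not reproduced in the question, so the percent-encoded form below is an assumption that reuses the local path from the question.

```javascript
// Assumed shape of the unescape() variant: every angle bracket is
// percent-encoded (%3C is "<", %3E is ">").
const encodedEverything =
  "%3Cscript src='js/libs/jquery-1.6.1.min.js'%3E%3C/script%3E";

// The minimal variant escapes only the "<" of the closing tag.
const minimal = "<script src='js/libs/jquery-1.6.1.min.js'>\x3C/script>";

// unescape() is a legacy global function; it is used here only because the
// question asks about it.
const decoded = unescape(encodedEverything);

console.log(decoded === minimal); // true
```

Both forms hand document.write() the same markup, which fits the answer's point that only the closing tag needs hiding from the HTML parser.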
