HTML Tidy fails on script tag in JavaScript string literal


Problem description

I'm using HTML Tidy in PHP and it's producing unexpected results because of a <script> tag in a JavaScript string literal. Here's a sample input:

<html>
<script>
var t='<script><'+'/script>';
</script>
</html>

HTML Tidy's output:

<html>
<script>
//<![CDATA[
var t='<script><'+'/script>';
<\/script>
<\/html>
//]]>
</script>
</html>

It's interpreting the real </script></html> as part of the script. Then it adds another </script></html> to close the open tags. I tried this on an online version of HTML Tidy (http://www.dirtymarkup.com/) and it produces the same error.

How do I prevent this error from occurring in PHP?

Solution

After playing around with it a bit, I discovered that adding the comment //'<\/script>' after the string literal confuses the algorithm in a way that prevents this bug from occurring:

<html>
<script>
var t='<script><'+'/script>'; //'<\/script>'
</script>
</html>

After clean-up:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">

<html>
<head>

   <script>
var t='<script><'+'/script>'; //'<\/script>'
   </script>

   <title></title>
</head>

<body>
</body>
</html>

My guess is that the clean-up algorithm scans the code, detects the string <script> twice, and each time immediately looks for a matching </script>. Separating < from /script> in the string literal makes the second </script> go undetected, which is why it decides to add another </script> at the end of the code and, somehow, also closes it with another </html>. (Poor design indeed!)

So I made a second assumption: that the algorithm has no if-statement to determine whether a </script> is inside a comment. And I was right! Putting another <\/script> string in a JavaScript comment does indeed make the algorithm think that there are two </script> in total.
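
The guess above can be illustrated with a toy scanner. This is only a sketch of the hypothesis, not HTML Tidy's actual algorithm: the function name scriptDepthAtEnd and the idea of a simple open/close counter are assumptions made for illustration. It counts every <script> as an opener and every </script> or <\/script> as a closer, wherever they appear, and reports how many openers are still unmatched at the end of the input.

```javascript
// Toy model of the guessed behaviour; NOT HTML Tidy's real algorithm.
// scriptDepthAtEnd is a hypothetical helper: it counts <script> openers
// against </script> or <\/script> closers, ignoring strings and comments.
function scriptDepthAtEnd(source) {
  const token = /<script>|<\\?\/script>/g;
  let depth = 0;
  let match;
  while ((match = token.exec(source)) !== null) {
    if (match[0] === '<script>') {
      depth += 1;           // an opener, even inside a string literal
    } else if (depth > 0) {
      depth -= 1;           // a closer, even inside a comment
    }
  }
  return depth;             // number of openers left unmatched
}

// The sample input from the question.
const original = [
  '<html>',
  '<script>',
  "var t='<script><'+'/script>';",
  '</script>',
  '</html>',
].join('\n');

// The same input with the workaround comment appended to the string line.
const withComment = original.replace(
  "'/script>';",
  "'/script>'; //'<\\/script>'"
);

console.log(scriptDepthAtEnd(original));    // 1: one <script> left unclosed
console.log(scriptDepthAtEnd(withComment)); // 0: openers and closers balance
```

Under this model the original input ends with one unmatched opener, which would explain the extra </script> being appended, while the version with the comment balances out.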
