Stripping HTML Comments With PHP But Leaving Conditionals


Problem description


I'm currently using PHP and a regular expression to strip out all HTML comments from a page. The script works well... a little too well. It strips out all comments, including my conditional comments in the <head>. Here's what I've got:

<?php
  function callback($buffer)
  {
        return preg_replace('/<!--(.|\s)*?-->/', '', $buffer);
  }

  ob_start("callback");
?>
... HTML source goes here ...
<?php ob_end_flush(); ?>

Since my regex skills aren't too hot, I'm having trouble figuring out how to modify the pattern to exclude conditional comments such as:

<!--[if !IE]><!-->
<link rel="stylesheet" href="/css/screen.css" type="text/css" media="screen" />
<!-- <![endif]-->

<!--[if IE 7]>
<link rel="stylesheet" href="/css/ie7.css" type="text/css" media="screen" />
<![endif]-->

<!--[if IE 6]>
<link rel="stylesheet" href="/css/ie6.css" type="text/css" media="screen" />
<![endif]-->

Cheers

Solution

Since comments cannot be nested in HTML, a regex can do the job, in theory. Still, using some kind of parser would be the better choice, especially if your input is not guaranteed to be well-formed.
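For comparison, the parser route mentioned above might look roughly like the following. This is only a sketch, assuming PHP's DOM extension is available; strip_comments_dom() is a hypothetical helper name, and the "keep" test has only the <!--[if ...]> style comments from the question in mind. Note that saveHTML() re-serializes the whole document, so the output formatting can differ from the input.

```php
<?php
// Sketch: remove ordinary comments with DOMDocument/DOMXPath (ext-dom).
// strip_comments_dom() is a hypothetical helper name, not part of the question.
function strip_comments_dom(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them on sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the node list first so removals do not disturb the iteration.
    foreach (iterator_to_array($xpath->query('//comment()')) as $comment) {
        // Keep anything that looks like part of a conditional comment.
        if (preg_match('/^\s*(?:\[if [^\]]+\]|<!\[endif\])/', $comment->data)) {
            continue;
        }
        $comment->parentNode->removeChild($comment);
    }

    return $doc->saveHTML();
}

$page = '<html><head><title>t</title><!-- note -->'
      . '<!--[if IE 7]><link rel="stylesheet" href="/css/ie7.css"><![endif]-->'
      . '</head><body><p>hi</p></body></html>';
echo strip_comments_dom($page);
```

The regex approach below avoids the re-serialization, which is one reason it can still be attractive inside an output-buffer callback.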

Here is my attempt at a regex that matches only normal comments. It has become quite a monster, sorry for that. I have tested it quite extensively and it seems to work well, but I give no warranty.

<!--(?!\s*(?:\[if [^\]]+]|<!|>))(?:(?!-->).)*-->

Explanation:

<!--                #01: "<!--"
(?!                 #02: look-ahead: a position not followed by:
  \s*               #03:   any amount of whitespace
  (?:               #04:   non-capturing group, any of:
    \[if [^\]]+]    #05:     "[if ...]"
    |<!             #06:     or "<!"
    |>              #07:     or ">"
  )                 #08:   end non-capturing group
)                   #09: end look-ahead
(?:                 #10: non-capturing group:
  (?!-->)           #11:   a position not followed by "-->"
  .                 #12:   eat the following char, it's part of the comment
)*                  #13: end non-capturing group, repeat
-->                 #14: "-->"

Steps #02 and #11 are crucial. #02 makes sure that the following characters do not indicate a conditional comment. After that, #11 makes sure that the following characters do not indicate the end of the comment, while #12 and #13 cause the actual matching.

Apply with "global" and "dotall" flags. In PHP, preg_replace() already replaces every match, and "dotall" is the s pattern modifier.
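Plugged back into the callback from the question, the pattern could be used roughly as follows. This is a minimal sketch: strip_normal_comments() is a hypothetical name, and the sample markup is made up for illustration.

```php
<?php
// Sketch: strip ordinary HTML comments, keep IE conditional comments.
// strip_normal_comments() is a hypothetical helper name.
function strip_normal_comments(string $buffer): string
{
    // The pattern from the answer; the trailing "s" is PCRE's dotall modifier,
    // so "." also matches newlines inside multi-line comments.
    $pattern = '/<!--(?!\s*(?:\[if [^\]]+]|<!|>))(?:(?!-->).)*-->/s';

    // preg_replace() replaces every match. It returns null on a PCRE error,
    // in which case the buffer is passed through unchanged.
    return preg_replace($pattern, '', $buffer) ?? $buffer;
}

$html = "<p>a</p><!-- note\nmore --><!--[if IE 7]><b>x</b><![endif]-->";
echo strip_normal_comments($html), "\n";
// prints: <p>a</p><!--[if IE 7]><b>x</b><![endif]-->
```

To use it for a whole page, the function name can be passed to ob_start() in place of "callback", as in the question.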

To do the opposite (match only conditional comments), it would be something like this:

<!(--)?(?=\[)(?:(?!<!\[endif\]\1>).)*<!\[endif\]\1>

Explanation:

<!                  #01: "<!"
(--)?               #02: two dashes, optional
(?=\[)              #03: a position followed by "["
(?:                 #04: non-capturing group:
  (?!               #05:   a position not followed by
    <!\[endif\]\1>  #06:     "<![endif]>" or "<![endif]-->" (depends on #02)
  )                 #07:   end of look-ahead
  .                 #08:   eat the following char, it's part of the comment
)*                  #09: end of non-capturing group, repeat
<!\[endif\]\1>      #10: "<![endif]>" or "<![endif]-->" (depends on #02)

Again, apply with "global" and "dotall" flags.

Step #02 is because of the "downlevel-revealed" syntax, see: "MSDN - About Conditional Comments".

I'm not entirely sure where spaces are allowed or expected. Add \s* to the expression where appropriate.
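The second pattern could be tried out in PHP along these lines. This is a sketch with one deliberate deviation from the pattern as written above: in PCRE, a backreference to a group that did not take part in the match fails, so with (--)? the dash-less form would never find its <![endif]>. The sketch therefore writes step #02 as (--|), which always sets group 1 (possibly to the empty string). find_conditional_comments() is a hypothetical helper name.

```php
<?php
// Sketch: collect conditional comments, both "<!--[if ...]>" and "<![if ...]>".
// find_conditional_comments() is a hypothetical helper name.
function find_conditional_comments(string $html): array
{
    // Same structure as the pattern above, but "(--|)" instead of "(--)?"
    // so that \1 is always set (to "--" or to the empty string) under PCRE.
    $pattern = '/<!(--|)(?=\[)(?:(?!<!\[endif\]\1>).)*<!\[endif\]\1>/s';

    // preg_match_all() finds every match; $matches[0] holds the full matches.
    if (preg_match_all($pattern, $html, $matches) === false) {
        return [];
    }
    return $matches[0];
}

$html = "a<!--[if IE 7]><b>x</b><![endif]-->b<!-- note -->c"
      . "<![if !IE]><i>y</i><![endif]>d";
print_r(find_conditional_comments($html));
```

With the sample input, the ordinary "<!-- note -->" comment is skipped because it is not followed by "[" after the optional dashes.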
