Remove all REAL Javascript comments in PHP

Question

I'm looking for a solution to strip all JavaScript comments from HTML code using PHP.

I want to strip only JavaScript comments (not HTML comments and so on). I think a regex is not a solution because it cannot tell whether something is a real comment or part of a string. Example:

<script>

// This is a comment
/* This is another comment */

// The following is not a comment
var src="//google.com"; 

</script>

Is there a way to do it? Many thanks in advance.

Recommended answer

First thing to do: you need to extract the content of script tags. For that, use DOMDocument:

$dom = new DOMDocument;
$dom->loadHTML($html);

$scriptNodes = $dom->getElementsByTagName('script');
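As a sketch of this first step, the function below loads an HTML string and collects the text of every script element. The name getScriptContents() is hypothetical, not part of PHP; the sketch assumes the dom extension is installed and silences libxml's warnings about markup it does not recognise (HTML5 tags, for instance).

```php
<?php
// Hypothetical helper: return the raw text of every <script> element in $html.
function getScriptContents(string $html): array
{
    $dom = new DOMDocument();

    // libxml warns about HTML5 tags and sloppy markup; collect those warnings
    // internally instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $contents = [];
    foreach ($dom->getElementsByTagName('script') as $scriptNode) {
        $contents[] = $scriptNode->textContent;
    }
    return $contents;
}

$html = '<p>Hello</p><script>var a = 1; // one</script><script>var b = 2;</script>';
print_r(getScriptContents($html));
```

The HTML parser keeps script content as raw text, so each entry is exactly what sits between the tags.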

The second step consists of removing all the JavaScript comments from each script node.

You can use a third-party JavaScript parser if you want, but you can also do it with a regex. All you need is to keep the parts between quotes from being taken into account.
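A minimal sketch of "keep the strings, drop the comments". Note that this is a swapped-in variant, not the answer's technique: instead of backtracking verbs, each string is captured in group 1 and written back through $1, while a comment matches with group 1 unset and is replaced by nothing. It only handles double-quoted strings (with backslash escapes) and // comments, and it ignores regex literals.

```php
<?php
// Alternative 1: a double-quoted string (backslash escapes allowed) -> group 1.
// Alternative 2: a // comment up to the end of the line -> group 1 stays unset.
$pattern = '~("(?:[^"\\\\\n]|\\\\.)*")|//\N*~';

$js = 'var src="//google.com"; // comment';

// '$1' writes each string back unchanged; an unset group becomes ''.
$result = preg_replace($pattern, '$1', $js);
echo $result, PHP_EOL;
```

The string "//google.com" survives because the engine consumes it as a whole before it ever reaches the slashes inside it.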

To do that, you must first match the parts between quotes and discard them. The only difficulty with JavaScript is that a quote can also appear inside a regex literal, for example:
/pattern " with a quote/

So you also need to match regex literals, to prevent any error.
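To make the problem concrete, here is a naive sketch that knows about double-quoted strings and // comments but not about regex literals, run on a made-up line whose regex literal contains a quote.

```php
<?php
// Naive pattern: double-quoted strings are captured and kept, // comments dropped.
// Regex literals are not recognised at all.
$naive = '~("[^"\n]*")|//\N*~';

$js = 'var re = /"/; // say "hi"';
$result = preg_replace($naive, '$1', $js);

// The " inside /"/ is read as the opening quote of a string that runs up to
// the quote before hi, so the comment is swallowed and never removed.
var_dump($result === $js); // bool(true)
```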

Pattern example:

$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<squoted> ' [^'\n\\]*+ (?: \\. [^'\n\\]* )*+ ' )
    (?<dquoted> " [^"\n\\]*+ (?: \\. [^"\n\\]* )*+ " )
    (?<quoted>  \g<squoted> | \g<dquoted> )

    (?<scomment> // \N* )
    (?<mcomment> /\* [^*]*+ (?: \*+ (?!/) [^*]* )*+ \*/ )
    (?<comment> \g<scomment> | \g<mcomment> )

    (?<pattern> / [^\n/*] [^\n/\\]*+ (?>\\.[^\n/\\]*)* / [gimuy]* ) 
)

(?=[[(:,=/"'])
(?|
    \g<quoted> (*SKIP)(*FAIL)
  |
    ( [[(:,=] \s* ) (*SKIP) (?: \g<comment> \s* )*+ ( \g<pattern> )
  | 
    ( \g<pattern> \s* ) (?: \g<comment> \s* )*+ 
    ( \. \s* ) (?:\g<comment> \s* )*+ ([A-Za-z_]\w*)
  |
    \g<comment>
)
~x
EOD;

Then replace the content of each script node:

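foreach ($scriptNodes as $scriptNode) {
    // textContent stores the new value as literal text; assigning nodeValue on an
    // element may try to read & (as in &&) as the start of an entity reference.
    $scriptNode->textContent = preg_replace($pattern, '$8$9${10}', $scriptNode->textContent);
}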

$html = $dom->saveHTML();
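Putting both steps together, a sketch of the whole pipeline. stripScriptComments() is a hypothetical name; the pattern and the '$8$9${10}' replacement are the answer's, while reading and writing through textContent and silencing libxml warnings are assumptions of this sketch. The pattern only knows single- and double-quoted strings, so a // inside a template literal (backquotes) would still be treated as a comment.

```php
<?php
// Hypothetical helper: remove JavaScript comments from every <script> element.
function stripScriptComments(string $html): string
{
    // The answer's pattern, unchanged.
    $pattern = <<<'EOD'
~
(?(DEFINE)
    (?<squoted> ' [^'\n\\]*+ (?: \\. [^'\n\\]* )*+ ' )
    (?<dquoted> " [^"\n\\]*+ (?: \\. [^"\n\\]* )*+ " )
    (?<quoted>  \g<squoted> | \g<dquoted> )

    (?<scomment> // \N* )
    (?<mcomment> /\* [^*]*+ (?: \*+ (?!/) [^*]* )*+ \*/ )
    (?<comment> \g<scomment> | \g<mcomment> )

    (?<pattern> / [^\n/*] [^\n/\\]*+ (?>\\.[^\n/\\]*)* / [gimuy]* )
)

(?=[[(:,=/"'])
(?|
    \g<quoted> (*SKIP)(*FAIL)
  |
    ( [[(:,=] \s* ) (*SKIP) (?: \g<comment> \s* )*+ ( \g<pattern> )
  |
    ( \g<pattern> \s* ) (?: \g<comment> \s* )*+
    ( \. \s* ) (?:\g<comment> \s* )*+ ([A-Za-z_]\w*)
  |
    \g<comment>
)
~x
EOD;

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate HTML5 tags
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('script') as $scriptNode) {
        // Groups 8, 9 and 10 hold the text to keep; a comment leaves them unset.
        $scriptNode->textContent = preg_replace($pattern, '$8$9${10}', $scriptNode->textContent);
    }

    return $dom->saveHTML();
}

$html = "<p>// not JavaScript</p><script>// a comment\n"
      . "var src = \"//google.com\"; /* another */\n"
      . "var re = /\"/;\n"
      . "var ok = a && b;</script>";

echo stripScriptComments($html);
```

The paragraph text outside the script is left alone, because only script nodes are passed to preg_replace().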

Pattern details:

(?(DEFINE)...) is an area where you can put all the subpattern definitions you will need later; nothing inside it is matched at that point. The "real" pattern begins after it.
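A small illustration unrelated to JavaScript: the DEFINE block below declares a made-up num subpattern that matches nothing where it is written and is only used through \g<num> afterwards.

```php
<?php
$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<num> \d++ )      # declared here, matched only when called
)
\g<num> \. \g<num>      # the "real" pattern: two calls around a dot
~x
EOD;

preg_match($pattern, 'version 12.34 released', $m);
echo $m[0], PHP_EOL; // 12.34
```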

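(?<name>...) defines a named subpattern. It works like a capture group, except that you can also refer to it by its name instead of its number. Here \g<name> is a subroutine call: it runs the subpattern again at that point, rather than requiring the same text as a backreference would.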
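To show the difference between calling a named subpattern and back-referencing it, here are two made-up patterns with a group called w: in PCRE (the engine behind PHP's preg_* functions), \g<w> re-runs the subpattern, while \k<w> must match the exact text w captured.

```php
<?php
// \g<w> calls the subpattern again, so the second word may differ.
$subroutine = '~^(?<w>[a-z]+)-\g<w>$~';
// \k<w> is a backreference: it must repeat the exact text captured by w.
$backreference = '~^(?<w>[a-z]+)-\k<w>$~';

var_dump(preg_match($subroutine, 'foo-bar'));    // int(1)
var_dump(preg_match($backreference, 'foo-bar')); // int(0)
var_dump(preg_match($backreference, 'foo-foo')); // int(1)
```

This is why the answer can write \g<quoted> and have it match any quoted string, not a repeat of one particular string.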

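*+ is a possessive quantifier: like *, it matches as many characters as it can, but it never gives any of them back when the rest of the pattern fails.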
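A tiny comparison of a greedy and a possessive quantifier on the same input:

```php
<?php
// a* is greedy but gives characters back when the rest of the pattern fails.
var_dump(preg_match('~^a*a$~', 'aaa'));  // int(1)
// a*+ is possessive: it keeps all three "a"s, so the final a can never match.
var_dump(preg_match('~^a*+a$~', 'aaa')); // int(0)
```

In the answer's pattern, a run such as [^'\n\\]*+ could never usefully give a character back, since the next token must be a quote or a backslash that the class excludes; making it possessive just saves the engine from trying.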

\N means a character that is not a newline
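\N behaves like . without the s flag, but the s flag has no effect on it, so a single-line comment can never run past the end of its line. A short comparison on a made-up snippet:

```php
<?php
$js = "a = 1; // first\nb = 2;";

// With the s flag, . also matches "\n", so the "comment" swallows the next line.
preg_match('~//.*~s', $js, $dot);

// \N never matches "\n", whatever the flags, so the match stops at the line end.
preg_match('~//\N*~s', $js, $notNewline);

var_export($dot[0]);
echo PHP_EOL;
var_export($notNewline[0]);
echo PHP_EOL;
```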

(?=[[(:,=/"']) is a lookahead that checks whether the next character is one of [ ( : , = / " '. Its purpose is to avoid trying every branch of the following alternation at positions where the character is something else. If you remove it, the pattern works the same; it only speeds things up by skipping useless positions quickly.
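The same idea on a simplified two-branch pattern (double-quoted strings skipped, // comments removed; a cut-down illustration, not the answer's full pattern), written with and without a leading lookahead that only lets / or " positions through:

```php
<?php
$with    = '~(?=[/"])(?:"[^"\n]*"(*SKIP)(*FAIL)|//\N*)~';
$without = '~(?:"[^"\n]*"(*SKIP)(*FAIL)|//\N*)~';

$js = 'x = "//a"; // c';

// Same result either way; the lookahead only rejects most positions sooner.
var_dump(preg_replace($with, '', $js) === preg_replace($without, '', $js)); // bool(true)
```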

(*SKIP) is a backtracking control verb. If the pattern fails after it, the current match attempt is abandoned and the engine does not retry from any position before it: the next attempt starts where (*SKIP) was reached.

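(*FAIL) is also a backtracking control verb; it forces the pattern to fail. Combined as \g<quoted> (*SKIP)(*FAIL), a string literal is matched in full, then rejected, and the search resumes after its closing quote, so nothing inside it can be taken for a comment or a regex literal.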
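To see what (*SKIP) contributes, here is a cut-down pattern (double-quoted strings and // comments only) with and without it, applied to the string from the question:

```php
<?php
$js = 'var src = "//google.com"; // comment';

// The string is matched, then (*SKIP)(*FAIL) rejects it and the next attempt
// starts after its closing quote, so nothing inside the string is examined.
$withSkip = '~"[^"\n]*"(*SKIP)(*FAIL)|//\N*~';

// Without (*SKIP), the engine simply retries from the next character, i.e.
// from inside the string, and takes //google.com... as a comment.
$withoutSkip = '~"[^"\n]*"(*FAIL)|//\N*~';

var_dump(preg_replace($withSkip, '', $js));    // string(26) "var src = "//google.com"; "
var_dump(preg_replace($withoutSkip, '', $js)); // string(11) "var src = ""
```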

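(?|..(..)..(..)..|..(..)..(..)..) is a branch-reset group. Inside it, each branch numbers its capture groups from the same starting point. Here groups 1 to 7 are the named subpatterns, so every branch starts at 8 (groups 8 and 9, plus 10 in the third branch). That is why the single replacement string '$8$9${10}' works whichever branch matched; for a comment, none of these groups is set and the match is replaced with nothing.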
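A minimal illustration with a made-up pattern: without (?|...), the four groups would be numbered 1 to 4, but inside it both alternatives fill groups 1 and 2.

```php
<?php
$pattern = '~^(?|(\d+)-(\d+)|(\d+)/(\d+))$~';

preg_match($pattern, '3-4', $dash);
preg_match($pattern, '3/4', $slash);

// In both cases group 1 holds 3 and group 2 holds 4.
print_r($dash);
print_r($slash);
```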
