(PHP) Regex to remove comments but ignore occurrences within strings


Question

I am writing a comment stripper and trying to accommodate all needs here. I have the stack of code below, which removes pretty much all comments, but it actually goes too far. A lot of time was spent trying, testing and researching the regex patterns to match, but I don't claim that each of them is the best for its job.

My problem is that I also have situations where I have 'PHP comments' (that aren't really comments) in standard code, or even in PHP strings, that I don't actually want to have removed.

For example:

<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>

What ends up happening is that it strips comments out religiously, which is fine, but it leaves certain problems:

<?php  $Var = "Blah blah  ?>

Also, a comment that sits on the same line as the closing tag will cause problems, as the comment removes the rest of the line, including the ending ?>

See the problem? So this is what I need...


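  • Comment characters inside quoted strings (single or double quotes) need to be ignored.

  • A PHP comment that uses double slashes on the same line as code should remove only the comment itself; failing that, the entire PHP code block should be removed.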

Here are the patterns I use at the moment. Feel free to tell me if there are improvements I can make to them. :)

$CompressedData = $OriginalData;
$CompressedData = preg_replace('!/\*.*?\*/!s', '', $CompressedData);  // removes /* comments */
$CompressedData = preg_replace('!//.*?\n!', '', $CompressedData); // removes //comments
$CompressedData = preg_replace('!#.*?\n!', '', $CompressedData); // removes # comments
$CompressedData = preg_replace('/<!--(.*?)-->/', '', $CompressedData); // removes HTML comments

Any help that you can give me would be greatly appreciated! :)

Recommended answer

If you want to parse PHP, you can use token_get_all to get the tokens of a given piece of PHP code. Then you just need to iterate over the tokens, remove the comment tokens and put the rest back together.
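A minimal sketch of that idea, assuming the input is PHP source that the tokenizer can read. The function name stripPhpComments is hypothetical and not from the original post; token_get_all, T_COMMENT and T_DOC_COMMENT are part of PHP's tokenizer extension. Because the tokenizer treats a string literal as a single token, comment-like text inside a string is never touched, and a // comment stops before a closing ?> on the same line.

```php
<?php
// Hypothetical helper: rebuilds the source from its tokens, skipping comments.
function stripPhpComments(string $source): string
{
    $output = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ';' or '=' are plain strings.
            $output .= $token;
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            // Before PHP 8 a single-line comment token includes its trailing
            // newline; keep that newline so the line structure survives.
            if (substr($text, -1) === "\n") {
                $output .= "\n";
            }
            continue;
        }
        // Everything else (strings, inline HTML, whitespace) is copied verbatim.
        $output .= $text;
    }
    return $output;
}

// The example from the question: the string keeps its "//" text,
// the trailing comment is dropped, and the closing tag is preserved.
$code = '<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>';
echo stripPhpComments($code), PHP_EOL;
```

This covers /* */, // and # comments in one pass, since the tokenizer reports all three as T_COMMENT. HTML outside the PHP tags arrives as inline HTML tokens and is left unchanged, so HTML comments are not handled here.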

But you would need a separate procedure for the HTML comments, preferably a real parser too (such as DOMDocument, via DOMDocument::loadHTML).
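A rough sketch of the HTML side, assuming the input is a complete, non-empty HTML document rather than mixed PHP/HTML source (DOMDocument would not preserve embedded PHP blocks faithfully). The function name stripHtmlComments is hypothetical; DOMDocument, DOMXPath and libxml_use_internal_errors are standard PHP.

```php
<?php
// Hypothetical helper: parses HTML and removes every comment node.
function stripHtmlComments(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them,
    // since real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Gather all comment nodes first, then detach each from its parent.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//comment()')) as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    // Note: saveHTML() re-serializes the whole document, so whitespace,
    // the doctype and attribute quoting may differ from the input.
    return $doc->saveHTML();
}

echo stripHtmlComments('<html><body><p>Hello</p><!-- remove me --></body></html>');
```

Unlike the regex in the question, this also handles comments that span several lines, at the cost of the document being re-serialized by the parser.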
