(php) Regex to remove comments but ignore occurrences within strings
Problem description
I am writing a comment stripper and trying to accommodate all needs here. I have the code below, which removes pretty much all comments, but it actually goes too far. A lot of time was spent trying, testing and researching the regex patterns to match, but I don't claim that they are the best for each case.
My problem is that I also have situations with 'PHP comments' (comment markers that aren't really comments in standard code, or that sit inside PHP strings) that I don't actually want removed.
For example:
<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>
What ends up happening is that it strips comments out religiously, which is fine, but it leaves certain problems:
<?php $Var = "Blah blah ?>
Also:
will also cause problems, as the comment removes the rest of the line, including the ending ?>
See the problem? So this is what I need...
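- Comment characters inside single- or double-quoted strings need to be ignored.
- PHP uses double slashes for comments on the same line; either only the comment itself should be removed, or the whole PHP code block should be removed.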
Here are the patterns I use at the moment. Feel free to tell me if there's any improvement I can make to them. :)
$CompressedData = $OriginalData;
$CompressedData = preg_replace('!/\*.*?\*/!s', '', $CompressedData); // removes /* comments */
$CompressedData = preg_replace('!//.*?\n!', '', $CompressedData); // removes //comments
$CompressedData = preg_replace('!#.*?\n!', '', $CompressedData); // removes # comments
$CompressedData = preg_replace('/<!--(.*?)-->/', '', $CompressedData); // removes HTML comments
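To make the failure concrete, here is a minimal sketch that runs only the // pattern from the code above against the example line. A trailing newline is added to the input (an assumption about how the line appears in a real file), because the pattern requires a \n to match at all. The regex has no notion of where a string literal starts or ends, so it matches from the first // inside the string up to the end of the line.

```php
<?php
// The example line from the question, followed by a newline as in a real file.
$source = '<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>' . "\n";

// Same pattern as in the question: from the first "//" up to and
// including the next newline. The lazy .*? stops at the first "\n",
// but matching already started inside the string literal.
$stripped = preg_replace('!//.*?\n!', '', $source);

// Prints: <?php $Var = "Blah blah
// The string is cut open and the closing ?> is gone as well.
echo $stripped, "\n";
```

The # pattern has the same weakness: a # inside a string such as '#not' would be treated as the start of a comment.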
Any help that you can give me would be greatly appreciated! :)
Recommended answer
If you want to parse PHP, you can use token_get_all to get the tokens of a given PHP code. Then you just need to iterate the tokens, remove the comment tokens and put the rest back together.
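A minimal sketch of that approach, assuming the input is a complete PHP source file (HTML outside the PHP tags is fine, since it comes back as T_INLINE_HTML tokens). The function name stripPhpComments is hypothetical; token_get_all, T_COMMENT and T_DOC_COMMENT come from PHP's tokenizer extension, which is enabled by default. One version detail matters: before PHP 8.0 a // or # comment token includes its trailing newline, while from PHP 8.0 on the newline belongs to the following whitespace token, so the sketch keeps that newline explicitly to avoid joining lines.

```php
<?php
/**
 * Hypothetical helper: remove //, #, /* */ and /** */ comments from PHP
 * source using the tokenizer instead of regexes.
 */
function stripPhpComments(string $source): string
{
    $output = '';
    foreach (token_get_all($source) as $token) {
        if (is_string($token)) {
            // Single-character tokens such as ";" or "=" are plain strings.
            $output .= $token;
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            // Before PHP 8.0 a "//" or "#" comment token ends with "\n";
            // keep that newline so the next line is not pulled up.
            if (substr($text, -1) === "\n") {
                $output .= "\n";
            }
            continue;
        }
        // Everything else, including string literals, is copied verbatim.
        $output .= $text;
    }
    return $output;
}

echo stripPhpComments('<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>'), "\n";
```

Because the whole string literal is a single T_CONSTANT_ENCAPSED_STRING token, "//this must not comment" is left alone, and since a // comment ends at ?>, the closing tag survives: the example comes back as <?php $Var = "Blah blah //this must not comment"; ?>. One caveat: dropping a block comment that sits directly between two words (e.g. return/**/true) glues them together, so emitting a space for such comments may be safer.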
But you would need a separate procedure for the HTML comments, preferably a real parser too (like DOMDocument provides with DOMDocument::loadHTML).
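For the HTML side, a sketch along these lines loads the markup with DOMDocument::loadHTML and deletes every node matched by the XPath expression //comment(). The name stripHtmlComments is hypothetical, and the sketch assumes the input is plain HTML with no PHP tags in it (for example the T_INLINE_HTML pieces left over after tokenizing), since libxml's HTML parser is not meant to understand PHP. The LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags need libxml 2.7.8 or later.

```php
<?php
/**
 * Hypothetical helper: remove HTML comments using a real parser.
 * Assumes $html is an HTML fragment without PHP tags.
 */
function stripHtmlComments(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings (e.g. unknown HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Do not wrap the fragment in <!DOCTYPE>, <html> or <body>.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Take a snapshot of all comment nodes before modifying the tree.
    foreach (iterator_to_array($xpath->query('//comment()')) as $comment) {
        $comment->parentNode->removeChild($comment);
    }
    return trim($doc->saveHTML());
}

echo stripHtmlComments('<p>Hello<!-- hidden --> world</p>'), "\n";
```

Note that saveHTML re-serializes the document, so the output is equivalent to the input rather than byte-for-byte identical (entities and attribute quoting may change).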