Regex to strip line comments from C#
I'm working on a routine to strip block or line comments from some C# code. I have looked at the other examples on the site, but haven't found the exact answer that I'm looking for.
I can match block comments (/* comment */) in their entirety using this regular expression with RegexOptions.Singleline:
(/\*[\w\W]*\*/)
And I can match line comments (// comment) in their entirety using this regular expression with RegexOptions.Multiline:
(//((?!\*/).)*)(?!\*/)[^\r\n]
Note: I'm using [^\r\n] instead of $ because $ is including \r in the match, too.
However, this doesn't quite work the way I want it to.
Here is my test code that I'm matching against:
// remove whole line comments
bool broken = false; // remove partial line comments
if (broken == true)
{
    return "BROKEN";
}
/* remove block comments
else
{
    return "FIXED";
} // do not remove nested comments */ bool working = !broken;
return "NO COMMENT";
The block expression matches
/* remove block comments
else
{
    return "FIXED";
} // do not remove nested comments */
which is fine and good, but the line expression matches
// remove whole line comments
// remove partial line comments
and
// do not remove nested comments
Also, if I do not have the */ negative lookahead in the line expression twice, it matches
// do not remove nested comments *
which I really don't want.
What I want is an expression that will match characters, starting with //, to the end of the line, but does not contain */ between the // and the end of the line.
Also, just to satisfy my curiosity, can anyone explain why I need the lookahead twice? (//((?!\*/).)*)[^\r\n] and (//(.)*)(?!\*/)[^\r\n] will both include the *, but (//((?!\*/).)*)(?!\*/)[^\r\n] and (//((?!\*/).)*(?!\*/))[^\r\n] won't.
Both of your regular expressions (for block and line comments) have bugs. If you want I can describe the bugs, but I felt it’s perhaps more productive if I write new ones, especially because I’m intending to write a single one that matches both.
The thing is, every time you have /* and // and literal strings "interfering" with each other, it is always the one that starts first that takes precedence. That’s very convenient because that’s exactly how regular expressions work: find the first match first.
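The "first match first" behaviour can be seen with a small sketch. The pattern below is a deliberately simplified stand-in (one alternative for a string literal, one for a line comment), and the names FirstMatchDemo and FirstToken are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class FirstMatchDemo
{
    // Simplified stand-in pattern: a double-quoted string OR a line comment.
    // As a regex this reads:  "[^"\n]*"|//[^\n]*
    const string StringOrLineComment = "\"[^\"\\n]*\"|//[^\\n]*";

    // Returns whichever token starts first in the input.
    public static string FirstToken(string code) =>
        Regex.Match(code, StringOrLineComment).Value;

    static void Main()
    {
        // The string literal starts first, so the "//" inside it is not a comment.
        Console.WriteLine(FirstToken("var url = \"http://example.com\"; // trailing"));

        // The comment starts first, so the quotes inside it are not a string.
        Console.WriteLine(FirstToken("// say \"hi\"\nvar s = \"x\";"));
    }
}
```

Because the engine tries every alternative at each position from left to right, whichever token begins earliest is consumed whole, and anything that looks like another token inside it is skipped.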
So let’s define a regular expression that matches each of those four tokens (block comments, line comments, regular string literals, and verbatim string literals):
var blockComments = @"/\*(.*?)\*/";
var lineComments = @"//(.*?)\r?\n";
var strings = @"""((\\[^\n]|[^""\n])*)""";
var verbatimStrings = @"@(""[^""]*"")+";
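To see what each pattern picks up on its own, here is a minimal sketch that runs them one at a time against short samples. The four patterns are copied unchanged; the class name TokenPatternDemo, the helper First, and the sample inputs are assumptions made for this illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class TokenPatternDemo
{
    // The four patterns, copied unchanged from the answer.
    public const string BlockComments = @"/\*(.*?)\*/";
    public const string LineComments = @"//(.*?)\r?\n";
    public const string Strings = @"""((\\[^\n]|[^""\n])*)""";
    public const string VerbatimStrings = @"@(""[^""]*"")+";

    // Hypothetical helper: the first match of a single pattern.
    // Singleline lets '.' cross line breaks, which block comments need.
    public static string First(string pattern, string code) =>
        Regex.Match(code, pattern, RegexOptions.Singleline).Value;

    static void Main()
    {
        // Lazy (.*?) stops at the first "*/", so only the first comment is taken.
        Console.WriteLine(First(BlockComments, "a /* x\ny */ b /* z */"));

        // The match includes the line break that ends the comment.
        Console.WriteLine(First(LineComments, "x = 1; // note\ny = 2;").EndsWith("\n"));

        // Escaped quotes stay inside the string; the "//" in it is not a comment.
        Console.WriteLine(First(Strings, "s = \"a \\\"b\\\" // c\";"));

        // Doubled quotes stay inside the verbatim string.
        Console.WriteLine(First(VerbatimStrings, "p = @\"a \"\"b\"\" // c\";"));
    }
}
```

Note that the line-comment pattern consumes the trailing line break, which is why the replacement step has to put a newline back.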
To answer the question in the title (strip comments), we need to:
- Replace the block comments with nothing
- Replace the line comments with a newline (because the regex eats the newline)
- Keep the literal strings where they are.
Regex.Replace can do this easily using a MatchEvaluator function:
string noComments = Regex.Replace(input,
    blockComments + "|" + lineComments + "|" + strings + "|" + verbatimStrings,
    me => {
        if (me.Value.StartsWith("/*") || me.Value.StartsWith("//"))
            return me.Value.StartsWith("//") ? Environment.NewLine : "";
        // Keep the literal strings
        return me.Value;
    },
    RegexOptions.Singleline);
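For anyone who wants to try this end to end, the following is a self-contained sketch that wraps the same call in a method and feeds it the test code from the question. The class name CommentStripper and the method name Strip are hypothetical, and the input is joined with "\n" line endings as an assumption:

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentStripper
{
    const string BlockComments = @"/\*(.*?)\*/";
    const string LineComments = @"//(.*?)\r?\n";
    const string Strings = @"""((\\[^\n]|[^""\n])*)""";
    const string VerbatimStrings = @"@(""[^""]*"")+";

    // Hypothetical wrapper; the body is the Regex.Replace call shown above.
    public static string Strip(string input) =>
        Regex.Replace(input,
            BlockComments + "|" + LineComments + "|" + Strings + "|" + VerbatimStrings,
            me =>
            {
                if (me.Value.StartsWith("/*") || me.Value.StartsWith("//"))
                    return me.Value.StartsWith("//") ? Environment.NewLine : "";
                // Keep the literal strings
                return me.Value;
            },
            RegexOptions.Singleline);

    static void Main()
    {
        // The test code from the question, joined with "\n" (an assumption).
        string input = string.Join("\n",
            "// remove whole line comments",
            "bool broken = false; // remove partial line comments",
            "if (broken == true)",
            "{",
            "    return \"BROKEN\";",
            "}",
            "/* remove block comments",
            "else",
            "{",
            "    return \"FIXED\";",
            "} // do not remove nested comments */ bool working = !broken;",
            "return \"NO COMMENT\";");

        Console.WriteLine(Strip(input));
    }
}
```

One case worth checking with these patterns as written: a // comment on the very last line of the input, with no line break after it, is left in place, because the line-comment pattern requires \r?\n at the end.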
I ran this code on all the examples that Holystream provided and various other cases that I could think of, and it works like a charm. If you can provide an example where it fails, I am happy to adjust the code for you.