Regex to strip line comments from C#


Problem description

I'm working on a routine to strip block or line comments from some C# code. I have looked at the other examples on the site, but haven't found the exact answer that I'm looking for.

I can match block comments (/* comment */) in their entirety using this regular expression with RegexOptions.Singleline:

(/\*[\w\W]*\*/)

And I can match line comments (// comment) in their entirety using this regular expression with RegexOptions.Multiline:

(//((?!\*/).)*)(?!\*/)[^\r\n]

Note: I'm using [^\r\n] instead of $ because $ is including \r in the match, too.

However, this doesn't quite work the way I want it to.

Here is my test code that I'm matching against:

// remove whole line comments
bool broken = false; // remove partial line comments
if (broken == true)
{
    return "BROKEN";
}
/* remove block comments
else
{
    return "FIXED";
} // do not remove nested comments */ bool working = !broken;
return "NO COMMENT";

The block expression matches

/* remove block comments
else
{
    return "FIXED";
} // do not remove nested comments */

which is fine and good, but the line expression matches

// remove whole line comments
// remove partial line comments

and

// do not remove nested comments

Also, if I do not have the */ negative lookahead in the line expression twice, it matches

// do not remove nested comments *

which I really don't want.

What I want is an expression that will match characters, starting with //, to the end of line, but does not contain */ between the // and end of line.

Also, just to satisfy my curiosity, can anyone explain why I need the lookahead twice? (//((?!\*/).)*)[^\r\n] and (//(.)*)(?!\*/)[^\r\n] will both include the *, but (//((?!\*/).)*)(?!\*/)[^\r\n] and (//((?!\*/).)*(?!\*/))[^\r\n] won't.

Solution

Both of your regular expressions (for block and line comments) have bugs. If you want I can describe the bugs, but I felt it’s perhaps more productive if I write new ones, especially because I’m intending to write a single one that matches both.

The thing is, every time you have /* and // and literal strings "interfering" with each other, it is always the one that starts first that takes precedence. That’s very convenient because that’s exactly how regular expressions work: find the first match first.
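To make the "starts first wins" point concrete, here is a minimal sketch. The pattern in it is a simplified stand-in made up for illustration (it is not the final pattern), and the names FirstMatchDemo and Tokens are hypothetical. It lists the tokens the regex engine finds, in order, when comment markers and string literals overlap.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class FirstMatchDemo
{
    // Simplified token patterns, for illustration only:
    // block comment | line comment | ordinary string literal.
    const string Pattern = @"/\*.*?\*/|//[^\r\n]*|""[^""\r\n]*""";

    // Returns every token the engine finds, scanning left to right.
    public static List<string> Tokens(string source)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(source, Pattern, RegexOptions.Singleline))
            result.Add(m.Value);
        return result;
    }

    static void Main()
    {
        // The string literal starts before the "//" inside it, so it is matched
        // as a string; the later "//" starts before "/*", so it wins as a line comment.
        foreach (string t in Tokens("var s = \"// not a comment\"; // real /* comment */"))
            Console.WriteLine(t);
        // Prints:
        // "// not a comment"
        // // real /* comment */

        // Here "/*" starts first, so the "//" inside the block is just comment text.
        foreach (string t in Tokens("/* block with // inside */ int x;"))
            Console.WriteLine(t);
        // Prints:
        // /* block with // inside */
    }
}
```

Because each match consumes its whole token, the engine never gets a chance to start a new match in the middle of one.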

So let’s define a regular expression that matches each of those four tokens:

var blockComments = @"/\*(.*?)\*/";
var lineComments = @"//(.*?)\r?\n";
var strings = @"""((\\[^\n]|[^""\n])*)""";
var verbatimStrings = @"@(""[^""]*"")+";
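One edge case that may be worth checking: the lineComments pattern above requires a line break after the comment, so a comment on the very last line of the input, with no trailing newline, is not matched. The sketch below compares it with a variant that also accepts the end of input. The variant and the names LineCommentAtEof, Original, Variant and Matches are assumptions for illustration, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineCommentAtEof
{
    // Pattern from the answer: needs \n (optionally preceded by \r) after the comment.
    public const string Original = @"//(.*?)\r?\n";

    // Hypothetical variant (an assumption, not from the answer):
    // the comment may also end at the end of the input.
    public const string Variant = @"//(.*?)(\r?\n|$)";

    public static bool Matches(string pattern, string input) =>
        Regex.IsMatch(input, pattern, RegexOptions.Singleline);

    static void Main()
    {
        string lastLine = "int x = 1; // trailing comment, no newline";
        Console.WriteLine(Matches(Original, lastLine)); // False
        Console.WriteLine(Matches(Variant, lastLine));  // True
    }
}
```

If the variant is used and each line comment is replaced with a newline, the output gains a trailing newline that the input did not have, which is usually harmless.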

To answer the question in the title (strip comments), we need to:

  • Replace the block comments with nothing
  • Replace the line comments with a newline (because the regex eats the newline)
  • Keep the literal strings where they are.

Regex.Replace can do this easily using a MatchEvaluator function:

string noComments = Regex.Replace(input,
    blockComments + "|" + lineComments + "|" + strings + "|" + verbatimStrings,
    me => {
        if (me.Value.StartsWith("/*") || me.Value.StartsWith("//"))
            return me.Value.StartsWith("//") ? Environment.NewLine : "";
        // Keep the literal strings
        return me.Value;
    },
    RegexOptions.Singleline);

I ran this code on all the examples that Holystream provided and various other cases that I could think of, and it works like a charm. If you can provide an example where it fails, I am happy to adjust the code for you.
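For anyone who wants to try it, here is a self-contained sketch that wraps the code above in a small console program. The class name CommentStripper, the method name Strip and the extra "http://" line in the input are assumptions added for illustration; the patterns and the replacement logic are the ones shown above.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentStripper
{
    const string BlockComments = @"/\*(.*?)\*/";
    const string LineComments = @"//(.*?)\r?\n";
    const string Strings = @"""((\\[^\n]|[^""\n])*)""";
    const string VerbatimStrings = @"@(""[^""]*"")+";

    public static string Strip(string input) =>
        Regex.Replace(input,
            BlockComments + "|" + LineComments + "|" + Strings + "|" + VerbatimStrings,
            me =>
            {
                // A line comment swallowed its line break, so put one back.
                if (me.Value.StartsWith("//", StringComparison.Ordinal))
                    return Environment.NewLine;
                // A block comment is replaced with nothing.
                if (me.Value.StartsWith("/*", StringComparison.Ordinal))
                    return "";
                // Anything else is a string literal: keep it unchanged.
                return me.Value;
            },
            RegexOptions.Singleline);

    static void Main()
    {
        string input = string.Join("\n", new[]
        {
            "// remove whole line comments",
            "bool broken = false; // remove partial line comments",
            "string url = \"http://example.com\"; // string contents must survive",
            "/* remove block comments",
            "} // do not remove nested comments */ bool working = !broken;",
            "return \"NO COMMENT\";"
        });
        Console.WriteLine(Strip(input));
    }
}
```

The "//" inside the URL string is left alone because the string literal starts before it. Note that this sketch does not handle character literals such as '"', which would need a fifth pattern.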
