Regex to strip line comments from C#

Views: 136 | Posted: 2015/11/24 11:46:39 | Tags: c# .net regex

Question

I'm working on a routine to strip block or line comments from some C# code. I have looked at the other examples on the site, but haven't found the exact answer that I'm looking for.

I can match block comments (/* comment */) in their entirety using this regular expression with RegexOptions.Singleline:

(/\*[\w\W]*\*/)

And I can match line comments (// comment) in their entirety using this regular expression with RegexOptions.Multiline:

(//((?!\*/).)*)(?!\*/)[^\r\n]

Note: I'm using [^\r\n] instead of $ because $ is including \r in the match, too.
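To illustrate the note above: in .NET, `.` matches every character except `\n`, and `$` under RegexOptions.Multiline matches just before `\n`, so with Windows line endings the `\r` ends up inside the match. The following is only a minimal sketch; the sample string and the helper names `WithDollar` and `WithClass` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class DollarAndCarriageReturn
{
    // Line comment matched with '.' and a Multiline '$'.
    public static string WithDollar(string s) =>
        Regex.Match(s, @"//.*$", RegexOptions.Multiline).Value;

    // Line comment matched with an explicit "not a line-ending character" class.
    public static string WithClass(string s) =>
        Regex.Match(s, @"//[^\r\n]*").Value;

    static void Main()
    {
        // Hypothetical sample with Windows (\r\n) line endings.
        string input = "int x = 1; // first\r\nint y = 2;\r\n";

        // The \r is swallowed by '.', because '$' only looks for \n.
        Console.WriteLine(WithDollar(input).EndsWith("\r", StringComparison.Ordinal)); // True

        // Excluding both \r and \n stops the match before the line ending.
        Console.WriteLine(WithClass(input) == "// first"); // True
    }
}
```

Note that `//[^\r\n]*` needs neither `$` nor RegexOptions.Multiline, which sidesteps the issue entirely.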

However, this doesn't quite work the way I want it to.

Here is my test code that I'm matching against:

// remove whole line comments
bool broken = false; // remove partial line comments
if (broken == true)
{
    return "BROKEN";
}
/* remove block comments
else
{
    return "FIXED";
} // do not remove nested comments */ bool working = !broken;
return "NO COMMENT";

The block expression matches

/* remove block comments
else
{
    return "FIXED";
} // do not remove nested comments */

which is fine and good, but the line expression matches

// remove whole line comments
// remove partial line comments

and

// do not remove nested comments

Also, if I do not have the */ negative lookahead in the line expression twice, it matches

// do not remove nested comments *

which I really don't want.

What I want is an expression that will match characters, starting with //, to the end of line, but does not contain */ between the // and end of line.

Also, just to satisfy my curiosity, can anyone explain why I need the lookahead twice? (//((?!\*/).)*)[^\r\n] and (//(.)*)(?!\*/)[^\r\n] will both include the *, but (//((?!\*/).)*)(?!\*/)[^\r\n] and (//((?!\*/).)*(?!\*/))[^\r\n] won't.

Solution

Both of your regular expressions (for block and line comments) have bugs. If you want I can describe the bugs, but I felt it’s perhaps more productive if I write new ones, especially because I’m intending to write a single one that matches both.
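For readers who do want to see the bugs, here is a rough sketch of two of them, as far as I can reconstruct them from the expressions in the question. The `twoBlocks` input and the helper `First` are made up for illustration; the `line` input is taken from the question's test code.

```csharp
using System;
using System.Text.RegularExpressions;

static class OriginalExpressionBugs
{
    // Hypothetical helper: text of the first match of pattern in input.
    public static string First(string input, string pattern) =>
        Regex.Match(input, pattern).Value;

    static void Main()
    {
        // Hypothetical input with two separate block comments.
        string twoBlocks = "a /* one */ b /* two */ c";

        // Greedy [\w\W]* runs to the LAST "*/" and swallows the code in between.
        Console.WriteLine(First(twoBlocks, @"/\*[\w\W]*\*/"));   // /* one */ b /* two */

        // A lazy quantifier stops at the first "*/".
        Console.WriteLine(First(twoBlocks, @"/\*[\w\W]*?\*/"));  // /* one */

        // The line from the question on which the block comment ends.
        string line = "} // do not remove nested comments */ bool working = !broken;";

        // One lookahead: the loop stops in front of "*/", then [^\r\n] consumes the '*'.
        string one = First(line, @"//((?!\*/).)*[^\r\n]");
        Console.WriteLine(one.EndsWith("*", StringComparison.Ordinal)); // True

        // Second lookahead: [^\r\n] may not sit on "*/", so the loop gives back one character.
        string two = First(line, @"//((?!\*/).)*(?!\*/)[^\r\n]");
        Console.WriteLine(two.EndsWith("*", StringComparison.Ordinal)); // False
    }
}
```

This also suggests why the lookahead was needed twice: the first one only guards the characters inside the loop, while the trailing `[^\r\n]` consumes one more character that nothing guards. Either way, the line expression has no idea that it is sitting inside a block comment, which is the problem a single combined expression avoids.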

The thing is, every time you have /* and // and literal strings "interfering" with each other, it is always the one that starts first that takes precedence. That’s very convenient because that’s exactly how regular expressions work: find the first match first.

So let’s define a regular expression that matches each of those four tokens:

var blockComments = @"/\*(.*?)\*/";
var lineComments = @"//(.*?)\r?\n";
var strings = @"""((\\[^\n]|[^""\n])*)""";
var verbatimStrings = @"@(""[^""]*"")+";
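To see the "first one to start wins" idea in action, the four patterns can be joined with `|` and run over a small sample. This is just a sketch: the input string and the helper `Tokens` are invented for the demonstration, and the patterns are copied from above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TokenDemo
{
    // Hypothetical helper: every comment or string token, in order of appearance.
    public static List<string> Tokens(string input)
    {
        var blockComments = @"/\*(.*?)\*/";
        var lineComments = @"//(.*?)\r?\n";
        var strings = @"""((\\[^\n]|[^""\n])*)""";
        var verbatimStrings = @"@(""[^""]*"")+";
        string pattern = blockComments + "|" + lineComments + "|" + strings + "|" + verbatimStrings;

        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Singleline))
            result.Add(m.Value);
        return result;
    }

    static void Main()
    {
        // A string containing "//", a real line comment, and a block comment containing "//".
        string input = "var url = \"http://x\"; // note\n/* a // b */ done();\n";

        foreach (string token in Tokens(input))
            Console.WriteLine(token.Replace("\n", "\\n"));
        // Prints three tokens:
        //   "http://x"
        //   // note\n
        //   /* a // b */
    }
}
```

The `//` inside the string and the `//` inside the block comment never become tokens of their own, because the string and the block comment start earlier and consume them.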

To answer the question in the title (strip comments), we need to:

Replace the block comments with nothing
Replace the line comments with a newline (because the regex eats the newline)
Keep the literal strings where they are.

Regex.Replace can do this easily using a MatchEvaluator function:

string noComments = Regex.Replace(input,
    blockComments + "|" + lineComments + "|" + strings + "|" + verbatimStrings,
    me => {
        if (me.Value.StartsWith("/*") || me.Value.StartsWith("//"))
            return me.Value.StartsWith("//") ? Environment.NewLine : "";
        // Keep the literal strings
        return me.Value;
    },
    RegexOptions.Singleline);
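For anyone who wants to try it, the pieces above can be assembled into a small console program roughly like this. Treat it as a sketch: the class name `CommentStripper` and the method `Strip` are my own, the ordinal comparisons are a small stylistic change, and the input is a shortened version of the question's test code in which the last string literal has been changed to contain `//`.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentStripper
{
    const string BlockComments = @"/\*(.*?)\*/";
    const string LineComments = @"//(.*?)\r?\n";
    const string Strings = @"""((\\[^\n]|[^""\n])*)""";
    const string VerbatimStrings = @"@(""[^""]*"")+";

    public static string Strip(string input)
    {
        return Regex.Replace(input,
            BlockComments + "|" + LineComments + "|" + Strings + "|" + VerbatimStrings,
            me =>
            {
                if (me.Value.StartsWith("//", StringComparison.Ordinal))
                    return Environment.NewLine; // put back the newline the pattern consumed
                if (me.Value.StartsWith("/*", StringComparison.Ordinal))
                    return "";                  // block comments vanish entirely
                return me.Value;                // string literal: keep unchanged
            },
            RegexOptions.Singleline);
    }

    static void Main()
    {
        string input =
            "// remove whole line comments\n" +
            "bool broken = false; // remove partial line comments\n" +
            "/* remove block comments\n" +
            "} // do not remove nested comments */ bool working = !broken;\n" +
            "return \"NO // COMMENT\";\n";

        // Both line comments and the block comment are removed; the
        // "bool working" statement and the string literal survive.
        Console.WriteLine(Strip(input));
    }
}
```

Two limitations of the patterns as written are worth knowing: a `//` comment on the very last line is only matched if the input ends with a newline, since `lineComments` requires `\r?\n`, and character literals such as `'"'` are not treated as tokens, so a quote inside one can confuse the string pattern.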

I ran this code on all the examples that Holystream provided and various other cases that I could think of, and it works like a charm. If you can provide an example where it fails, I am happy to adjust the code for you.
