How to remove C-style comments from code

Question

I just read a new question here on SO asking basically the same thing as mine does in the title. That got me thinking - and searching the web (most hits pointed to SO, of course ;). So I thought -

There should be a simple regex capable of removing C-style comments from any code.

Yes, there are answers to this question/statement on SO, but the ones I found are all incomplete and/or overly complex.

So I started experimenting, and came up with one that works on all types of code I can imagine:

    (?:\/\/(?:\\\n|[^\n])*\n)|(?:\/\*(?:\n|\r|.)*?\*\/)|(("|')(?:\\\\|\\\2|\\\n|[^\2])*?\2)

The first alternative checks for double-slash // comments, the second for ordinary /* comment */ ones. The third is the part I had trouble finding in other regexes written for this task: it handles strings containing character sequences that, outside a string, would be considered comments.

This part captures any string literal in capture group one. Capture group two holds the opening quote character, and the match runs up to the matching closing quote at the end of the string.

Capture group one should be kept in the replacement and everything else discarded (replaced with ""), leaving un-commented code :).
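As a rough illustration of that replacement step, here is a minimal Python sketch. The helper name `strip_comments` is made up for this example, and Python's `re` module is only a stand-in for whichever engine you use; the pattern was written on regex101, so other flavors may behave differently. A callback returns group one when it took part in the match and an empty string otherwise.

```python
import re

# The pattern from above, unchanged. Group 1 is a whole string literal,
# group 2 is its opening quote character.
PATTERN = re.compile(
    r'''(?:\/\/(?:\\\n|[^\n])*\n)|(?:\/\*(?:\n|\r|.)*?\*\/)|(("|')(?:\\\\|\\\2|\\\n|[^\2])*?\2)'''
)


def strip_comments(code: str) -> str:
    """Drop comments and keep string literals (hypothetical helper)."""
    # Comment matches leave group 1 unset, so they are replaced with "".
    return PATTERN.sub(lambda m: m.group(1) or "", code)


source = 'int a = 1; // remove this\nchar s[] = "keep // this /* too */";\n/* gone */int b;\n'
print(strip_comments(source))
```

Note that the `//` alternative also consumes the trailing newline, so the line following a removed comment is joined onto the current one.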

Here's a C example at regex101.

OK... So that's not a question. It's an answer you think...

Yes, you're right. So... on to the question.

Have I missed any type of code that this regex would miss?

It handles

multi line comments

    /*
        an easy one
    */

"end of line" comments

    // Remove this

comments in strings

    char array[] = "Following isn't a comment // because it's in a string /* this neither */";

which leads to - strings with escaped quotes

    char array[] = "Handle /* comments */ - // - in strings with \" escaped quotes";

and strings with escaped escapes

    char array[] = "Handle strings with **not** escaped quotes\\"; // <-EOS

JavaScript single-quoted strings

    var myStr = 'Should also ignore enclosed // comments /* like these */ ';

line continuation

    // This is a single line comment \
    continuing on the next row (warns, but works in my C++ flavor)
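The cases above can be checked mechanically. The following is a small table-driven sketch in Python; the names `CASES`, `strip_comments` and `failing_cases` are invented for it, the inputs are shortened variants of the snippets listed above, and the behavior shown is that of Python's `re` module, which may differ from other engines.

```python
import re

# Same pattern as in the question, unchanged.
PATTERN = re.compile(
    r'''(?:\/\/(?:\\\n|[^\n])*\n)|(?:\/\*(?:\n|\r|.)*?\*\/)|(("|')(?:\\\\|\\\2|\\\n|[^\2])*?\2)'''
)


def strip_comments(code: str) -> str:
    # Keep string literals (group 1), drop everything else that matched.
    return PATTERN.sub(lambda m: m.group(1) or "", code)


# (input, expected output) pairs modelled on the cases listed above.
CASES = [
    ('/*\n    an easy one\n*/', ''),                                    # multi-line comment
    ('// Remove this\n', ''),                                           # end-of-line comment
    ('char a[] = "x // y /* z */";', 'char a[] = "x // y /* z */";'),   # comments in a string
    (r'char a[] = "a /* b */ \" // c";', r'char a[] = "a /* b */ \" // c";'),  # escaped quote
    (r'char a[] = "q\\"; // <-EOS' + '\n', r'char a[] = "q\\"; '),      # escaped escape
    ("var s = 'a // b /* c */ ';", "var s = 'a // b /* c */ ';"),       # single quotes
    ('// one \\\ntwo\nint x;\n', 'int x;\n'),                           # line continuation
]


def failing_cases():
    """Return (input, actual) for every case whose output differs from the expectation."""
    return [(src, strip_comments(src)) for src, want in CASES if strip_comments(src) != want]


print(failing_cases())
```

A harness like this also makes it easy to probe inputs that are not on the list. For instance, a `//` comment on the last line of a file with no trailing newline is left in place, because the first alternative requires a closing `\n`.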

So, can you think of any code cases messing this up? If you come up with any I'll try to complete the RE and hopefully it'll end up complete ;)

Regards.

PS. I know... As I write this, the right pane says, under How to Ask: "We prefer questions that can be answered, not just discussed." This question might violate that :S but I can't resist.

In fact, it may even turn out to be an answer, instead of a question, to some people. (Too cocky? ;)

Solution

I've considered the comments (so far) and changed the regex to:

    (?:\/\/(?:\\\n|[^\n])*\n)|(?:\/\*[\s\S]*?\*\/)|((?:R"([^(\\\s]{0,16})\([^)]*\)\2")|(?:@"[^"]*?")|(?:"(?:\?\?'|\\\\|\\"|\\\n|[^"])*?")|(?:'(?:\\\\|\\'|\\\n|[^'])*?'))

It handles Biffen's C++11 raw string literals (as well as C# verbatim strings), and it's changed according to Wiktor's suggestions.
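To make the updated expression easier to read, here is a sketch that splits it into named pieces and joins them back into the same pattern. The constant names and the `strip_comments` helper are invented for this example, and Python's `re` module is assumed; the pieces themselves are copied from the regex above without changes.

```python
import re

# Pieces of the updated pattern (names are hypothetical, contents unchanged).
LINE_COMMENT = r'\/\/(?:\\\n|[^\n])*\n'
BLOCK_COMMENT = r'\/\*[\s\S]*?\*\/'
RAW_STRING = r'R"([^(\\\s]{0,16})\([^)]*\)\2"'   # C++11 raw string; \2 repeats the delimiter
VERBATIM_STRING = r'@"[^"]*?"'                    # C# verbatim string
DOUBLE_QUOTED = r'''"(?:\?\?'|\\\\|\\"|\\\n|[^"])*?"'''
SINGLE_QUOTED = r"""'(?:\\\\|\\'|\\\n|[^'])*?'"""

# Group 1 wraps all string forms, group 2 is the raw-string delimiter.
PATTERN = re.compile(
    f'(?:{LINE_COMMENT})|(?:{BLOCK_COMMENT})|'
    f'((?:{RAW_STRING})|(?:{VERBATIM_STRING})|(?:{DOUBLE_QUOTED})|(?:{SINGLE_QUOTED}))'
)


def strip_comments(code: str) -> str:
    # Keep any string literal (group 1), drop matched comments.
    return PATTERN.sub(lambda m: m.group(1) or "", code)


cpp = 'auto s = R"x(// not a comment)x"; // real\nint y; /* gone */\n'
csharp = r'var p = @"C:\dir\"; // c' + '\n'
print(strip_comments(cpp))
print(strip_comments(csharp))
```

Keeping the pieces separate also makes it straightforward to leave out the ones that do not apply to a given language, for example `VERBATIM_STRING` when only C++ sources are processed.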

I split it to handle single and double quotes separately because of the difference in logic (and to avoid the non-working back reference ;).
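The back reference problem can be seen in isolation. In Python's `re` module (and, as far as I know, in PCRE as well) a `\1` or `\2` inside a character class is read as an octal character escape, not as a reference to a group, so `[^\2]` does not mean "anything but the opening quote". The sketch below shows this with a one-group pattern; the negative lookahead variant is only one possible workaround and is not what the regex above does.

```python
import re

# Intended meaning: an opening quote followed by one character that is NOT that quote.
in_class = re.compile(r"""(["'])[^\1]""")

# Possible workaround (an assumption of this sketch, not the answer's approach):
# outside a character class, \1 really is a back reference.
lookahead = re.compile(r"""(["'])(?!\1).""")

print(in_class.fullmatch("''") is not None)     # True: the class does not exclude the quote
print(lookahead.fullmatch("''") is not None)    # False: the same quote is rejected
print(lookahead.fullmatch("'\"") is not None)   # True: a different quote is accepted
```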

It's undoubtedly more complex, but still far from the solutions I've seen out there which hardly cover any of the string issues. And it could be stripped of parts not applicable to a specific language.

One comment suggested supporting more languages. That would make the RE (even more) complex and unmanageable. It should be relatively easy to adapt though.

Updated regex101 example.

Thanks everyone for the input so far. And keep the suggestions coming.

Regards

Edit: Updated the raw string handling - this time I actually read the spec. ;)
