Using regex to remove comments from source files


Question

I'm making a program to automate the writing of some C code (I'm writing code to parse strings into enumerations with the same name). C's handling of strings is not that great, so some people have been nagging me to try Python.

I made a function that is supposed to remove C-style /* COMMENT */ and //COMMENT comments from a string. Here is the code:

def removeComments(string):
    re.sub(re.compile("/\*.*?\*/",re.DOTALL ) ,"" ,string) # remove all occurrences of block comments (/*COMMENT */) from string
    re.sub(re.compile("//.*?\n" ) ,"" ,string) # remove all occurrences of single-line comments (//COMMENT\n ) from string

So I tried this code out:

str="/* spam * spam */ eggs"
removeComments(str)
print str

And it apparently did nothing.

Any suggestions as to what I've done wrong?

There's a saying I've heard a couple of times:

If you have a problem and you try to solve it with regex, you end up with two problems.


Looking back at this years later (after a fair bit more parsing experience):

I think regex may have been the right solution, and the simple regex used here was "good enough". I may not have emphasized this enough in the question: this was for a single specific file that had no tricky situations. I think it would be a lot less maintenance to keep the file being parsed simple enough for the regex (e.g. require that the file only use // single-line comments) than to complicate the regex into an unreadable symbol soup.
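To make the "tricky situations" concrete: the usual one is a comment marker inside a string literal. The sketch below is not from the original post. It reuses the question's single-line pattern as `SIMPLE_LINE_COMMENT` and compares it with one common workaround, which matches string and character literals in the same regex and puts them back unchanged. The names `TOKEN` and `strip_comments_keep_strings` are made up for illustration, and the pattern still ignores things like backslash line continuations, so treat it as a rough sketch rather than a C lexer.

```python
import re

# The question's single-line pattern, kept as-is for comparison.
SIMPLE_LINE_COMMENT = re.compile(r"//.*?\n")

# Hypothetical combined pattern: literals are matched so they can be skipped.
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # double-quoted string literal
    r"|'(?:\\.|[^'\\])*'"     # character literal
    r"|/\*.*?\*/"             # block comment (may span lines via DOTALL)
    r"|//[^\n]*",             # line comment; the newline itself is kept
    re.DOTALL,
)

def strip_comments_keep_strings(source):
    """Replace comments with a single space and leave literals untouched."""
    def replace(match):
        text = match.group(0)
        # Only the comment alternatives start with '/'.
        return " " if text.startswith("/") else text
    return TOKEN.sub(replace, source)

tricky = 'char *url = "http://example.com"; // real comment\n'
naive = SIMPLE_LINE_COMMENT.sub("", tricky)    # cuts the string literal in half
careful = strip_comments_keep_strings(tricky)  # keeps the URL, drops the comment
print(repr(naive))
print(repr(careful))
```

This is roughly where the "symbol soup" starts, which is the trade-off described above: if the input file can be kept free of such cases, the simple patterns are easier to live with.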

Recommended answer

re.sub returns a new string rather than modifying its argument, so changing your code to the following will give results:

import re

def removeComments(string):
    # Remove all block comments (/* COMMENT */); DOTALL lets them span lines.
    string = re.sub(re.compile(r"/\*.*?\*/", re.DOTALL), "", string)
    # Remove all single-line comments (//COMMENT\n), including the trailing newline.
    string = re.sub(re.compile(r"//.*?\n"), "", string)
    return string
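One more thing to watch on the calling side: Python strings are immutable, so the function cannot change the string you pass in, and the test from the question also has to use the return value. Here is a minimal self-contained sketch of that. The names `remove_comments`, `code` and `result` are just illustrative (the variable is renamed so it doesn't shadow the built-in `str`), and `print` is written as a function call.

```python
import re

def remove_comments(source):
    """Return a copy of source with /* ... */ and // ... comments removed."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    source = re.sub(r"//.*?\n", "", source)
    return source

code = "/* spam * spam */ eggs"
result = remove_comments(code)  # capture the returned copy

print(repr(code))    # the original string is unchanged
print(repr(result))  # the cleaned copy: ' eggs'
```

Calling `remove_comments(code)` without assigning the result would again appear to do nothing, just like the original test.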
