Regular expression to detect semi-colon terminated C++ for & while loops
Question
In my Python application, I need to write a regular expression that matches a C++ for or while loop that has been terminated with a semicolon (;). For example, it should match this:
for (int i = 0; i < 10; i++);
...but not this:
for (int i = 0; i < 10; i++)
This looks trivial at first glance, until you realise that the text between the opening and closing parentheses may contain other parentheses, for example:
for (int i = funcA(); i < funcB(); i++);
I'm using Python's re module. Right now my regular expression looks like this (I've left my comments in so that it is easier to understand):
# match any line that begins with a "for" or "while" statement:
^\s*(for|while)\s*
\( # match the initial opening parenthesis
# Now make a named group 'balanced' which matches a balanced substring.
(?P<balanced>
# A balanced substring is either something that is not a parenthesis:
[^()]
| # …or a parenthesised string:
\( # A parenthesised string begins with an opening parenthesis
(?P=balanced)* # …followed by a sequence of balanced substrings
\) # …and ends with a closing parenthesis
)* # Look for a sequence of balanced substrings
\) # Finally, the outer closing parenthesis.
# must end with a semi-colon to match:
\s*;\s*
This works perfectly for all the above cases, but it breaks as soon as the third part of the for loop contains a function call, like so:
for (int i = 0; i < 10; doSomethingTo(i));
I think it breaks because as soon as you put some text between the opening and closing parentheses, the "balanced" group matches that contained text, and so the (?P=balanced) part no longer works: it won't match, because the text inside the parentheses is different.
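The suspicion can be checked in isolation: in Python's re module, (?P=name) matches the exact text most recently captured by the named group, rather than re-applying the group's pattern. A minimal sketch, using a hypothetical group name `ch` that is not part of the regular expression above:

```python
import re

# (?P=ch) must repeat the exact text captured by group "ch";
# it does not re-apply the pattern [ab].
pattern = re.compile(r"(?P<ch>[ab])(?P=ch)")

same = pattern.fullmatch("aa")       # the backreference repeats "a"
different = pattern.fullmatch("ab")  # "b" is not the captured text "a"

print(same is not None)
print(different is not None)
```

So a backreference cannot express "another balanced substring"; that would need pattern recursion, which the re module does not provide.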
In my Python code I'm using the VERBOSE and MULTILINE flags, and creating the regular expression like so:
import re

REGEX_STR = r"""# match any line that begins with a "for" or "while" statement:
^\s*(for|while)\s*
\( # match the initial opening parenthesis
# Now make a named group 'balanced' which matches
# a balanced substring.
(?P<balanced>
# A balanced substring is either something that is not a parenthesis:
[^()]
| # …or a parenthesised string:
\( # A parenthesised string begins with an opening parenthesis
(?P=balanced)* # …followed by a sequence of balanced substrings
\) # …and ends with a closing parenthesis
)* # Look for a sequence of balanced substrings
\) # Finally, the outer closing parenthesis.
# must end with a semi-colon to match:
\s*;\s*"""
REGEX_OBJ = re.compile(REGEX_STR, re.MULTILINE | re.VERBOSE)
Can anyone suggest an improvement to this regular expression? It's getting too complicated for me to get my head around.
Recommended answer
You could write a small, very simple routine that does it, without using a regular expression:
- Set a position counter pos so that it points to just before the opening bracket after your for or while.
- Set an open brackets counter openBr to 0.
- Now keep incrementing pos, reading the characters at the respective positions; increment openBr when you see an opening bracket, and decrement it when you see a closing bracket. That will increment it once at the beginning, for the first opening bracket in "for (", increment and decrement some more for any brackets in between, and set it back to 0 when your for bracket closes.
- So, stop when openBr is 0 again.
The stopping position is the closing bracket of your for(...). Now you can check whether a semicolon follows it or not.
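The routine above could be sketched in Python roughly as follows. The names LOOP_START, find_closing_paren and semicolon_terminated_loops are hypothetical, chosen only for this illustration. The sketch also assumes that a small regular expression is acceptable for locating the for/while keyword, and that parentheses never appear inside string literals or comments.

```python
import re

# Hypothetical helper: finds "for"/"while" at the start of a line and
# stops just before the opening bracket that follows the keyword.
LOOP_START = re.compile(r"^\s*(?:for|while)\b\s*(?=\()", re.MULTILINE)
SEMICOLON = re.compile(r"\s*;")


def find_closing_paren(text, pos):
    """Return the index of the ')' matching the '(' at text[pos], or -1."""
    open_br = 0
    while pos < len(text):
        ch = text[pos]
        if ch == "(":
            open_br += 1
        elif ch == ")":
            open_br -= 1
            if open_br == 0:  # the loop's own bracket has closed
                return pos
        pos += 1
    return -1  # unbalanced input


def semicolon_terminated_loops(source):
    """Return the loop headers in source that are followed by a semicolon."""
    results = []
    for m in LOOP_START.finditer(source):
        close = find_closing_paren(source, m.end())
        if close == -1:
            continue
        tail = SEMICOLON.match(source, close + 1)
        if tail:
            results.append(source[m.start():tail.end()].strip())
    return results


source = (
    "for (int i = 0; i < 10; i++);\n"
    "for (int i = 0; i < 10; i++)\n"
    "for (int i = 0; i < 10; doSomethingTo(i));\n"
)
print(semicolon_terminated_loops(source))
```

Because the counter only tracks bracket depth, nested calls such as doSomethingTo(i) need no special handling. Handling brackets inside string literals or comments would need extra state and is left out of this sketch.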