Regular expression to detect semi-colon terminated C++ for & while loops


Question

In my Python application, I need to write a regular expression that matches a C++ for or while loop that has been terminated with a semi-colon (;). For example, it should match this:

for (int i = 0; i < 10; i++);

...but not this:

for (int i = 0; i < 10; i++)

This looks trivial at first glance, until you realise that the text between the opening and closing parentheses may contain other parentheses, for example:

for (int i = funcA(); i < funcB(); i++);

I'm using Python's re module. Right now my regular expression looks like this (I've left my comments in so it is easier to follow):

# match any line that begins with a "for" or "while" statement:
^\s*(for|while)\s*
\(  # match the initial opening parenthesis
    # Now make a named group 'balanced' which matches a balanced substring.
    (?P<balanced>
        # A balanced substring is either something that is not a parenthesis:
        [^()]
        | # …or a parenthesised string:
        \( # A parenthesised string begins with an opening parenthesis
            (?P=balanced)* # …followed by a sequence of balanced substrings
        \) # …and ends with a closing parenthesis
    )*  # Look for a sequence of balanced substrings
\)  # Finally, the outer closing parenthesis.
# must end with a semi-colon to match:
\s*;\s*

This works perfectly for all the above cases, but it breaks as soon as you make the third part of the for loop contain a function call, like so:

for (int i = 0; i < 10; doSomethingTo(i));

I think it breaks because as soon as you put some text between the inner opening and closing parentheses, the "balanced" group matches that contained text, and so the (?P=balanced) part no longer works: it won't match, because the text inside the parentheses is different.
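The behaviour suspected above may be easier to see in isolation. In Python's re module, (?P=name) is a backreference: it matches the same text that the named group captured, and it does not re-apply the group's pattern. A minimal sketch (the pattern and the name DOUBLED are illustrative only, not part of the regular expression in question):

```python
import re

# (?P=word) must repeat the exact text that group "word" captured;
# it does not mean "match the pattern \w+ again".
DOUBLED = re.compile(r"(?P<word>\w+) (?P=word)")

same_text = DOUBLED.fullmatch("abc abc") is not None
same_shape_only = DOUBLED.fullmatch("abc abd") is not None

print(same_text)        # True
print(same_shape_only)  # False
```

So a backreference cannot express "another balanced substring", which is what the nested part of the expression would need.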

In my Python code I'm using the VERBOSE and MULTILINE flags, and creating the regular expression like so:

import re

REGEX_STR = r"""# match any line that begins with a "for" or "while" statement:
^\s*(for|while)\s*
\(  # match the initial opening parenthesis
    # Now make a named group 'balanced' which matches
    # a balanced substring.
    (?P<balanced>
        # A balanced substring is either something that is not a parenthesis:
        [^()]
        | # …or a parenthesised string:
        \( # A parenthesised string begins with an opening parenthesis
            (?P=balanced)* # …followed by a sequence of balanced substrings
        \) # …and ends with a closing parenthesis
    )*  # Look for a sequence of balanced substrings
\)  # Finally, the outer closing parenthesis.
# must end with a semi-colon to match:
\s*;\s*"""

REGEX_OBJ = re.compile(REGEX_STR, re.MULTILINE | re.VERBOSE)

Can anyone suggest an improvement to this regular expression? It's getting too complicated for me to get my head around.

Recommended answer

You could write a small, very simple routine that does it, without using a regular expression:


  • Set a position counter pos so that it points to just before the opening bracket after your for or while.
  • Set an open brackets counter openBr to 0.
  • Now keep incrementing pos, reading the character at each position. Increment openBr when you see an opening bracket, and decrement it when you see a closing bracket. That will increment it once at the beginning, for the first opening bracket in "for (", increment and decrement it some more for any brackets in between, and set it back to 0 when your for bracket closes.
  • So, stop when openBr is 0 again.

The stopping position is the closing bracket of your for(...). Now you can check whether a semicolon follows it.
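The steps above could be sketched in Python roughly as follows. This is only one possible reading of the routine: the function name ends_with_semicolon and the small LOOP_HEAD pattern used to locate the keyword are assumptions made for illustration, and the sketch starts pos on the opening bracket itself rather than just before it. It also assumes one statement per input string and does not account for brackets inside string literals or comments.

```python
import re

# Assumed helper: finds "for (" or "while (" at the start of the statement.
LOOP_HEAD = re.compile(r"\s*(?:for|while)\s*\(")


def ends_with_semicolon(statement):
    """Return True if a for/while header is directly followed by ';'."""
    head = LOOP_HEAD.match(statement)
    if head is None:
        return False  # not a for/while statement

    pos = head.end() - 1  # index of the first opening bracket
    open_br = 0           # number of currently open brackets
    while pos < len(statement):
        ch = statement[pos]
        if ch == "(":
            open_br += 1
        elif ch == ")":
            open_br -= 1
            if open_br == 0:
                # pos is now the closing bracket of for(...) / while(...);
                # skip whitespace and look for the semicolon.
                return statement[pos + 1:].lstrip().startswith(";")
        pos += 1
    return False  # the brackets never balanced


print(ends_with_semicolon("for (int i = 0; i < 10; doSomethingTo(i));"))  # True
print(ends_with_semicolon("for (int i = funcA(); i < funcB(); i++)"))     # False
```

Because the counter only tracks nesting depth, it does not matter what text appears between the inner brackets, which is exactly the case the regular expression struggled with.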
