Is there a Regex-like that is capable of parsing matching symbols?

Question

This regular expression

/\(.*\)/

won't match the matching parenthesis but the last parenthesis in the string. Is there a regular expression extension, or something similar, with a proper syntax that allows for this? For example:

there are (many (things (on) the)) box (except (carrots (and apples)))

/OPEN(.*CLOSE)/ should match (many (things (on) the))

There could be arbitrarily many levels of parentheses.

Answer

If you only have one level of parentheses, then there are two possibilities.

Option 1: Use ungreedy repetition:

/\(.*?\)/

This will stop when it encounters the first ).

Option 2: Use a negated character class:

/\([^)]*\)/

This can only repeat characters that are not ), so it can never go past the first closing parenthesis. This option is usually preferred for performance reasons. In addition, it is more easily extended to allow for escaped parentheses (so that you could match the complete string (some\)thing) instead of stopping early and throwing away thing)). But this is probably rarely necessary.
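As a rough illustration of the two options, here is a small Python sketch (Python's re module is used only for convenience; the sample strings and variable names are made up for this example). It also shows one possible way to extend the negated character class so that a backslash-escaped parenthesis does not end the match; that extended pattern is an assumption about what such an extension could look like, not something taken from the answer.

```python
import re

# Hypothetical sample text with a single level of parentheses.
text = "call(a, b) and then (c)"

# Option 1: ungreedy repetition stops at the first ")".
lazy = re.findall(r"\(.*?\)", text)

# Option 2: negated character class can never run past a ")".
negated = re.findall(r"\([^)]*\)", text)

# One possible extension of option 2 (an assumption, not from the answer):
# allow either a character that is neither ")" nor a backslash,
# or a backslash followed by any character.
escaped_text = r"(some\)thing) rest"
plain = re.findall(r"\([^)]*\)", escaped_text)
escaped = re.findall(r"\((?:[^)\\]|\\.)*\)", escaped_text)

print(lazy)     # ['(a, b)', '(c)']
print(negated)  # ['(a, b)', '(c)']
print(plain)    # stops at the escaped parenthesis
print(escaped)  # matches through to the real closing parenthesis
```

Both options give the same result on single-level input; they only differ in how the engine gets there.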

However, if you want nested structures, this is generally too complicated for regex (although some flavors, like PCRE, support recursive patterns). In this case, you should just go through the string yourself and count parentheses to keep track of your current nesting level.
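A minimal sketch of that counting approach, written in Python for illustration. The function name top_level_groups is hypothetical, and ignoring stray closing parentheses is just one possible policy.

```python
def top_level_groups(text):
    """Return every outermost balanced (...) group found in text."""
    groups = []
    depth = 0      # current nesting level
    start = None   # index of the "(" that opened the current outer group
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")":
            if depth == 0:
                continue  # stray ")" with no opener: ignored in this sketch
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups


sample = "there are (many (things (on) the)) box (except (carrots (and apples)))"
print(top_level_groups(sample))
# ['(many (things (on) the))', '(except (carrots (and apples)))']
```

An opening parenthesis that is never closed simply produces no group here; a stricter version could raise an error instead.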

Just as a side note about those recursive patterns: in PCRE, (?R) simply represents the whole pattern, so inserting it somewhere makes the whole thing recursive. But then the contents of every pair of parentheses must have the same structure as the whole match. Also, it is not really possible to do meaningful one-step replacements with this, or to use capturing groups on multiple nested levels. All in all, you are best off not using regular expressions for nested structures.

Update: Since you seem eager to find a regex solution, here is how you would match your example using PCRE (example implementation in PHP):

$str = 'there are (many (things (on) the)) box (except (carrots (and apples)))';
preg_match_all('/\([^()]*(?:(?R)[^()]*)*\)/', $str, $matches);
print_r($matches);

This produces:

Array
(
    [0] => Array
        (
            [0] => (many (things (on) the))
            [1] => (except (carrots (and apples)))
        )   
)

What the pattern does:

\(      # opening bracket
[^()]*  # arbitrarily many non-bracket characters
(?:     # start a non-capturing group for later repetition
(?R)    # recursion! (match any nested brackets)
[^()]*  # arbitrarily many non-bracket characters
)*      # close the group and repeat it arbitrarily many times
\)      # closing bracket

This allows for infinite nested levels and also for infinite parallel levels.

Note that it is not possible to get all nested levels as separate captured groups. You will always just get the inner-most or outer-most group. Also, doing a recursive replacement is not possible like this.
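If every nesting level is needed, a manual scan with a stack is one way around this limitation. The following Python sketch is an illustration only; the function name all_groups and the (depth, text) result format are assumptions made for this example.

```python
def all_groups(text):
    """Return (depth, substring) for every balanced (...) group in text.

    Groups are reported in the order their closing parenthesis appears,
    so inner groups come before the groups that contain them.
    """
    stack = []   # indices of "(" that are still open
    found = []
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            start = stack.pop()
            # After popping, len(stack) is the number of enclosing groups.
            found.append((len(stack) + 1, text[start:i + 1]))
    return found


print(all_groups("a (b (c) d) e"))
# [(2, '(c)'), (1, '(b (c) d)')]
```

Because each group comes with its start position available at the moment it is closed, the same loop could also be adapted to perform per-level replacements.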
