Is it possible to match nested brackets with a regex without using recursion or balancing groups?


Problem description

The problem: Match an arbitrarily nested group of brackets in a flavour of regex such as Java's java.util.regex that supports neither recursion nor balancing groups. I.e., match the three outer groups in:

(F(i(r(s)t))) ((S)(e)((c)(o))(n)d) (((((((Third)))))))

This exercise is purely academic, since we all know that regular expressions are not supposed to be used to match these things, just as Q-tips are not supposed to be used to clean ears.

Stack Overflow encourages self-answered questions, so I decided to create this post to share something I recently discovered.

Recommended answer

Indeed! It's possible using forward references:

(?=\()(?:(?=.*?\((?!.*?\1)(.*\)(?!.*\2).*))(?=.*?\)(?!.*?\2)(.*)).)+?.*?(?=\1)[^(]*(?=\2$)

Proof

Et voila; there it is. That right there matches a full group of nested parentheses from start to end. Two substrings per match are necessarily captured and saved; these are useless to you. Just focus on the results of the main match.
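
To try this from Java, the sketch below compiles the expression above (backslashes doubled for a Java string literal) and collects the main matches. The class name `OuterGroups` and the helper `outerGroups` are made up for illustration; only `Pattern`, `Matcher`, `find()` and `group()` come from `java.util.regex`. It assumes a single-line subject, since `.` does not cross line terminators by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class OuterGroups {
    // The expression from the answer, escaped for a Java string literal.
    static final Pattern OUTER = Pattern.compile(
        "(?=\\()(?:(?=.*?\\((?!.*?\\1)(.*\\)(?!.*\\2).*))"
        + "(?=.*?\\)(?!.*?\\2)(.*)).)+?.*?(?=\\1)[^(]*(?=\\2$)");

    // Collects every main match; groups 1 and 2 are bookkeeping and ignored.
    static List<String> outerGroups(String subject) {
        List<String> found = new ArrayList<>();
        Matcher m = OUTER.matcher(subject);
        while (m.find()) {
            found.add(m.group()); // group 0 is the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        String subject = "(F(i(r(s)t))) ((S)(e)((c)(o))(n)d) (((((((Third)))))))";
        for (String group : outerGroups(subject)) {
            System.out.println(group);
        }
    }
}
```

Run against the subject from the question, this should print the three outer groups, one per line.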

No, there is no limit on depth. No, there are no recursive constructs hidden in there. Just plain ol' lookarounds, with a splash of forward referencing. If your flavour does not support forward references (I'm looking at you, JavaScript), then I'm sorry. I really am. I wish I could help you, but I'm not a freakin' miracle worker.

OK, here's the deal. The reason we were able to match those outer groups is because they are non-overlapping. As soon as the matches we desire begin to overlap, we must tweak our strategy somewhat. We can still inspect the subject for correctly-balanced groups of parentheses. However, instead of outright matching them, we need to save them with a capturing group like so:

(?=\()(?=((?:(?=.*?\((?!.*?\2)(.*\)(?!.*\3).*))(?=.*?\)(?!.*?\3)(.*)).)+?.*?(?=\2)[^(]*(?=\3$))) 

Exactly the same as the previous expression, except I've wrapped the bulk of it in a lookahead to avoid consuming characters, added a capturing group, and tweaked the backreference indices so they play nice with their new friend. Now the expression matches at the position just before the next parenthetical group, and the substring of interest is saved as \1.
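
A minimal Java sketch of the capturing variant follows. Each match is zero-length, so the text of interest is read from group 1 rather than from the main match. The class name `AllGroups`, the helper `allGroups` and the short sample subject are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class AllGroups {
    // The lookahead-wrapped expression, escaped for a Java string literal.
    static final Pattern ANY_GROUP = Pattern.compile(
        "(?=\\()(?=((?:(?=.*?\\((?!.*?\\2)(.*\\)(?!.*\\3).*))"
        + "(?=.*?\\)(?!.*?\\3)(.*)).)+?.*?(?=\\2)[^(]*(?=\\3$)))");

    // Returns the balanced group found at each '(' position, nested ones included.
    static List<String> allGroups(String subject) {
        List<String> found = new ArrayList<>();
        Matcher m = ANY_GROUP.matcher(subject);
        while (m.find()) {
            found.add(m.group(1)); // the main match is empty; group 1 holds the text
        }
        return found;
    }

    public static void main(String[] args) {
        for (String group : allGroups("(a(b)c) (d)")) {
            System.out.println(group);
        }
    }
}
```

Because nothing is consumed, the matcher moves on one character at a time, which is what lets the inner group `(b)` be reported alongside the outer `(a(b)c)`.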

I'm glad you asked. The general method is quite simple: iterate through characters one at a time while simultaneously matching the next occurrences of '(' and ')', capturing the rest of the string in each case so as to establish positions from which to resume searching in the next iteration. Let me break it down piece by piece:
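
As a rough procedural analogue of that method (my reading of it, not the author's own breakdown), the sketch below steps through one '(' and one ')' per iteration using plain index arithmetic. The two indices play the role of the two captured "rest of the string" substrings: each records where the search for that bracket resumes. It stops once no '(' remains between the latest '(' and the latest ')'. The names `PairwiseScan` and `groupEnd` are hypothetical.

```java
// Hypothetical illustration of the pairing idea, without any regex.
final class PairwiseScan {
    /**
     * Returns the index just past the balanced group starting at start,
     * or -1 if start is not a '(' or no balanced group begins there.
     */
    static int groupEnd(String s, int start) {
        if (start < 0 || start >= s.length() || s.charAt(start) != '(') {
            return -1;
        }
        int open = start - 1;  // most recent '(' paired off so far
        int close = start - 1; // most recent ')' paired off so far
        while (true) {
            // One step: advance to the next '(' and, independently, the next ')'.
            open = s.indexOf('(', open + 1);
            close = s.indexOf(')', close + 1);
            if (open < 0 || close < 0) {
                return -1; // ran out of brackets before the group closed
            }
            // Done once no '(' lies between the latest '(' and the latest ')'.
            int nextOpen = s.indexOf('(', open + 1);
            if (close > open && (nextOpen < 0 || nextOpen > close)) {
                return close + 1;
            }
        }
    }

    public static void main(String[] args) {
        String subject = "(a(b)c) (d)";
        int end = groupEnd(subject, 0);
        System.out.println(subject.substring(0, end)); // prints "(a(b)c)"
    }
}
```

The regex cannot keep integer indices, which is why it captures the remainder of the string after each bracket instead and later checks where that remainder reappears.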

So, there you have it. A way to match balanced nested structures using forward references coupled with standard (extended) regular expression features - no recursion or balanced groups. It's not efficient, and it certainly isn't pretty, but it is possible. And it's never been done before. That, to me, is quite exciting.

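I know a lot of you use regular expressions to accomplish, and to help other users accomplish, simpler and more practical tasks. But if anyone out there shares my excitement for pushing the limits of what regular expressions can do, I'd love to hear from you. If there is interest, I have other similar material to post.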
