是否可以在不使用递归或平衡组的情况下将嵌套括号与正则表达式匹配? [英] Is it possible to match nested brackets with regex without using recursion or balancing groups?

查看:141
本文介绍了是否可以在不使用递归或平衡组的情况下将嵌套括号与正则表达式匹配?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

StackOverflow鼓励自我回答问题,所以我决定创建这篇文章来分享我最近发现的东西。



问题:在一个正则表达式中匹配任意嵌套的括号组,例如Java的 java.util.regex 既不支持递归也不支持平衡组。即,匹配3个外部组:


(F(i(r(s)t)))((S)(e )((c)(o))(n)d)(((((((Third)))))))


这个练习纯粹是学术性的,因为我们都知道正则表达式不应该被用来匹配这些东西,就像Q-tips不应该被用来清理耳朵一样。

解决方案

确实!可以使用前向引用:



 (?= \()(?:(?=。*?\(( ?!。*?\1)(。* \)(?!。* \2)。*))(?=。*?\)(?!。*?\2)(*。 ))。)+?。*?(?= \ 1)[^(] *(?= \ $ $)



结论



所以,你有它。使用前向引用和标准(扩展)正则表达式特征匹配平衡嵌套结构的方法 - 无递归或平衡组。它效率不高,当然不是很漂亮,但它是可能的。而且以前从未做过。对我而言,这非常令人兴奋。



我知道很多人使用正则表达式来完成并帮助其他用户完成更简单,更实际的任务,但是如果有的话那些与我分享我用兴奋剂推动可能性极限的兴奋的人然后我很乐意听到你的消息。如果有兴趣,我还有其他类似材料可以发布。


StackOverflow encourages self-answered questions, so I decided to create this post to share something I recently discovered.

The problem: Match an arbitrarily nested group of brackets in a flavour of regex such as Java's java.util.regex that supports neither recursion nor balancing groups. Ie, match the 3 outer groups in:

(F(i(r(s)t))) ((S)(e)((c)(o))(n)d) (((((((Third)))))))

This exercise is purely academic, since we all know that regular expressions are not supposed to be used to match these things, just as Q-tips are not supposed to be used to clean ears.

解决方案

Indeed! It's possible using forward references:

(?=\()(?:(?=.*?\((?!.*?\1)(.*\)(?!.*\2).*))(?=.*?\)(?!.*?\2)(.*)).)+?.*?(?=\1)[^(]*(?=\2$)

Proof

Et voila; there it is. That right there matches a full group of nested parentheses from start to end. Two substrings per match are necessarily captured and saved; these are useless to you. Just focus on the results of the main match.

No, there is no limit on depth. No, there are no recursive constructs hidden in there. Just plain ol' lookarounds, with a splash of forward referencing. If your flavour does not support forward references (I'm looking at you, JavaScript), then I'm sorry. I really am. I wish I could help you, but I'm not a freakin' miracle worker.

That's great and all, but I want to match inner groups too!

OK, here's the deal. The reason we were able to match those outer groups is because they are non-overlapping. As soon as the matches we desire begin to overlap, we must tweak our strategy somewhat. We can still inspect the subject for correctly-balanced groups of parentheses. However, instead of outright matching them, we need to save them with a capturing group like so:

(?=\()(?=((?:(?=.*?\((?!.*?\2)(.*\)(?!.*\3).*))(?=.*?\)(?!.*?\3)(.*)).)+?.*?(?=\2)[^(]*(?=\3$))) 

Exactly the same as the previous expression, except I've wrapped the bulk of it in a lookahead to avoid consuming characters, added a capturing group, and tweaked the backreference indices so they play nice with their new friend. Now the expression matches at the position just before the next parenthetical group, and the substring of interest is saved as \1.

So... how the hell does this actually work?

I'm glad you asked. The general method is quite simple: iterate through characters one at a time while simultaneously matching the next occurrences of '(' and ')', capturing the rest of the string in each case so as to establish positions from which to resume searching in the next iteration. Let me break it down piece by piece:

Conclusion

So, there you have it. A way to match balanced nested structures using forward references coupled with standard (extended) regex features - no recursion or balanced groups. It's not efficient, and it certainly isn't pretty, but it is possible. And it's never been done before. That, to me, is quite exciting.

I know a lot of you use regular expressions to accomplish and help other users accomplish simpler and more practical tasks, but if there is anyone out there who shares my excitement for pushing the limits of possibility with regex then I'd love to hear from you. If there is interest, I have other similar material to post.

这篇关于是否可以在不使用递归或平衡组的情况下将嵌套括号与正则表达式匹配?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆