Code to parse capture groups in regular expressions into a tree
Question
I need to identify (potentially nested) capture groups within regular expressions and build a tree from them. The target is Java 1.6, and ideally I'd like Java code. A simple example:
"(a(b|c)d(e(f*g))h)"
would be parsed to
"a(b|c)d(e(f*g))h"
... "b|c"
... "e(f*g)"
... "f*g"
Ideally the solution should account for count expressions, quantifiers, etc., and for levels of escaping. If that is not easy to find, a simpler approach might suffice, since we can limit the syntax used.
EDIT: To clarify, I want to parse the regular expression string itself. To do so I need to know the BNF (or an equivalent grammar) for Java 1.6 regexes. I am hoping someone has already done this.
A byproduct would be that the process also tests the validity of the regex.
Recommended answer
Consider stepping up to an actual parser/lexer: http://www.antlr.org/wiki/display/ANTLR3/FAQ+-+Getting+Started
It looks complicated, but if your language is fairly simple, it's fairly straightforward. And if it's not, doing it in regexes will probably make your life hell :)
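If the syntax can be restricted as the question suggests, a hand-written scanner may be enough before reaching for a parser generator. The following is only a rough sketch, not a full Java regex grammar. `GroupTree`, `Node`, `parse` and `render` are hypothetical names made up for this example. It assumes that every group starting with `(?` is non-capturing (true for Java 1.6, which has no named groups), that parentheses inside `[...]` are literal, and that backslash escapes and `\Q...\E` quoting hide whatever they cover. Quantifiers and count expressions are not interpreted. They simply stay in the group text.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: build a tree of capture groups from a regex string (hypothetical helper, not a JDK API). */
final class GroupTree {

    /** One group: the pattern text between its parentheses, plus nested capture groups. */
    static final class Node {
        final int start;          // index of the first character inside the group
        final boolean capturing;  // false for "(?...)" constructs
        String text;              // filled in when the group closes
        final List<Node> children = new ArrayList<Node>();

        Node(int start, boolean capturing) {
            this.start = start;
            this.capturing = capturing;
        }
    }

    /** Returns a pseudo-root (the whole pattern) whose children are the top-level capture groups. */
    static Node parse(String regex) {
        Node root = new Node(0, true);
        root.text = regex;
        List<Node> stack = new ArrayList<Node>(); // currently open groups, root at the bottom
        stack.add(root);
        int classDepth = 0; // > 0 while inside [...], where parentheses are literal (simplified)
        int n = regex.length();

        for (int i = 0; i < n; i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                if (i + 1 < n && regex.charAt(i + 1) == 'Q') {
                    int end = regex.indexOf("\\E", i + 2);
                    i = (end < 0) ? n : end + 1; // skip the \Q...\E literal
                } else {
                    i++; // skip the escaped character
                }
            } else if (c == '[') {
                classDepth++;
            } else if (c == ']' && classDepth > 0) {
                classDepth--;
            } else if (classDepth > 0) {
                // inside a character class: nothing else is structural
            } else if (c == '(') {
                boolean special = i + 1 < n && regex.charAt(i + 1) == '?';
                stack.add(new Node(i + 1, !special));
            } else if (c == ')') {
                if (stack.size() == 1) {
                    throw new IllegalArgumentException("Unmatched ')' at index " + i);
                }
                Node node = stack.remove(stack.size() - 1);
                node.text = regex.substring(node.start, i);
                Node parent = stack.get(stack.size() - 1);
                if (node.capturing) {
                    parent.children.add(node);
                } else {
                    parent.children.addAll(node.children); // hoist past the non-capturing group
                }
            }
        }
        if (stack.size() > 1) {
            throw new IllegalArgumentException("Unclosed '(' in: " + regex);
        }
        return root;
    }

    /** Renders the tree with one line per group, indented by nesting depth. */
    static String render(Node node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("... ");
        }
        sb.append('"').append(node.text).append("\"\n");
        for (Node child : node.children) {
            sb.append(render(child, depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(parse("(a(b|c)d(e(f*g))h)"), 0));
    }
}
```

For the example in the question, the root holds the full pattern, its single child is `a(b|c)d(e(f*g))h`, and that node has the children `b|c` and `e(f*g)`, the latter containing `f*g`. The sketch only checks that parentheses balance. For a real validity test, passing the string to `java.util.regex.Pattern.compile` and catching `PatternSyntaxException` is probably the more reliable route.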