ANTLR: how to parse a region within matching brackets with a lexer
Question
I want to parse something like this in my lexer:
( begin expression )
where the expressions are also surrounded by parentheses. It isn't important what is in the expression; I just want everything between the (begin and the matching ) as a single token. An example would be:
(begin
(define x (+ 1 2)))
So the text of the token should be
(define x (+ 1 2)))
Something like
PROGRAM : LPAREN BEGIN .* RPAREN;
(obviously) does not work, because as soon as the lexer sees a ")" it considers the rule finished, whereas I need the matching closing bracket.
How can I do this?
Recommended answer
Inside lexer rules, you can invoke rules recursively, so that's one way to solve this. Another approach is to keep track of the number of open and close parentheses and let a gated semantic predicate keep the loop going as long as your counter is greater than zero.
grammar T;
parse
: BeginToken {System.out.println("parsed :: " + $BeginToken.text);} EOF
;
BeginToken
@init{int open = 1;}
: '(' 'begin' ( {open > 0}?=> // keep repeating `( ... )*` as long as open > 0
( ~('(' | ')') // match anything other than a parenthesis
| '(' {open++;} // match a '(' and increase the var `open`
| ')' {open--;} // match a ')' and decrease the var `open`
)
)*
;
Main.java
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String input = "(begin (define x (+ (- 1 3) 2)))";
TLexer lexer = new TLexer(new ANTLRStringStream(input));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();
}
}
java -cp antlr-3.3-complete.jar org.antlr.Tool T.g
javac -cp antlr-3.3-complete.jar *.java
java -cp .:antlr-3.3-complete.jar Main
parsed :: (begin (define x (+ (- 1 3) 2)))
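The counting idea behind the gated predicate can also be looked at outside of ANTLR. The following is only a minimal plain-Java sketch of that loop, not the code ANTLR generates; the class name ParenDepthScanner and the method scanBeginToken are made up for illustration, and the sketch returns null for unbalanced input, which is a choice of this sketch rather than something the grammar above does.

```java
// Hypothetical helper, not part of ANTLR: mimics the counter used in BeginToken.
final class ParenDepthScanner {

    /**
     * Returns the "(begin ... )" region that starts at index 0 of input,
     * or null if input does not start with "(begin" or never balances.
     */
    static String scanBeginToken(String input) {
        String prefix = "(begin";
        if (!input.startsWith(prefix)) {
            return null;
        }
        int open = 1;              // the '(' of "(begin" is already open
        int i = prefix.length();
        // Same condition as the predicate {open > 0}?=> in the grammar.
        while (open > 0 && i < input.length()) {
            char c = input.charAt(i++);
            if (c == '(') {
                open++;            // one more bracket to close
            } else if (c == ')') {
                open--;            // one bracket closed
            }
            // any other character is simply consumed
        }
        return open == 0 ? input.substring(0, i) : null;
    }

    public static void main(String[] args) {
        System.out.println(scanBeginToken("(begin (define x (+ (- 1 3) 2))) trailing"));
    }
}
```

The loop stops right after the ")" that brings the counter back to zero, so anything after the matching bracket (here " trailing") is left for the next token.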
Note that you'll need to beware of string literals inside your source that might include parentheses:
BeginToken
@init{int open = 1;}
: '(' 'begin' ( {open > 0}?=> // ...
( ~('(' | ')' | '"') // ...
| '(' {open++;} // ...
| ')' {open--;} // ...
| '"' ... // TODO: define a string literal here
)
)*
;
or comments that may contain parentheses.
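To make the caveat concrete, here is a hedged plain-Java sketch of the same counter that skips over string literals and comments before counting brackets. The literal and comment syntax is an assumption, since the question does not define either: double-quoted strings with backslash escapes, and line comments that start with ';' and run to the end of the line. The class name LiteralAwareScanner is hypothetical.

```java
// Hypothetical helper: bracket counter that ignores parentheses inside
// string literals and line comments. The syntax of both is ASSUMED:
//   strings  : "..." with backslash escapes
//   comments : from ';' to the end of the line
final class LiteralAwareScanner {

    static String scanBeginToken(String input) {
        String prefix = "(begin";
        if (!input.startsWith(prefix)) {
            return null;
        }
        int open = 1;
        int i = prefix.length();
        while (open > 0 && i < input.length()) {
            char c = input.charAt(i++);
            switch (c) {
                case '(':
                    open++;
                    break;
                case ')':
                    open--;
                    break;
                case '"':
                    // Skip to the closing quote; a backslash escapes the next char.
                    while (i < input.length() && input.charAt(i) != '"') {
                        if (input.charAt(i) == '\\') {
                            i++;
                        }
                        i++;
                    }
                    i++; // consume the closing quote
                    break;
                case ';':
                    // Skip the rest of the line.
                    while (i < input.length() && input.charAt(i) != '\n') {
                        i++;
                    }
                    break;
                default:
                    break; // ordinary character
            }
        }
        // An unterminated string leaves open > 0, so the result is null.
        return open == 0 ? input.substring(0, i) : null;
    }
}
```

In the grammar itself, the equivalent would be extra alternatives inside the loop for a string literal and a comment, matching whatever syntax your language actually uses.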
The suggestion with the predicate uses some language-specific code (Java, in this case). An advantage of calling a lexer rule recursively is that you don't need any custom code in your lexer:
BeginToken
: '(' Spaces? 'begin' Spaces? NestedParens Spaces? ')'
;
fragment NestedParens
: '(' ( ~('(' | ')') | NestedParens )* ')'
;
fragment Spaces
: (' ' | '\t')+
;