Antlr (lexer): matching the right token
Question
In my ANTLR 3 grammar, I have several "overlapping" lexer rules, like this:
NAT: ('0' .. '9')+ ;
INT: ('+' | '-')? ('0' .. '9')+ ;
BITVECTOR: ('0' | '1')* ;
Although input such as 100110 or 123 can be matched by more than one of these rules, the context always determines which token it has to be. Example:
s: a | b | c ;
a: '<' NAT '>' ;
b: '{' INT '}' ;
c: '[' BITVECTOR ']' ;
The input {17} should then be matched as {, INT, and }, but the lexer has already decided that 17 is a NAT token. How can I prevent this behavior? The backtrack option is already set to true, but it seems to affect only parser rules.
Recommended answer
There might be a complex way to make the lexer context-sensitive, but in general that is the parser's job, and the lexer should just provide a stream of tokens. My recommendation is to refactor your lexer to return DIGITS and SIGN tokens, and let your parser work out from the context what kind of number the digits represent.
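A minimal sketch of that refactoring, assuming an ANTLR 3 combined grammar. The grammar name Numbers and the parser rule names nat, integer, and bitvector are hypothetical and do not come from the original question:

```
grammar Numbers;   // hypothetical grammar name

s : a | b | c ;

a : '<' nat '>' ;
b : '{' integer '}' ;
c : '[' bitvector ']' ;

// The former lexer rules become parser rules, so the surrounding
// brackets decide how a run of digits is interpreted.
nat       : DIGITS ;
integer   : SIGN? DIGITS ;
bitvector : DIGITS? ;   // assumption: a check that only '0' and '1' occur
                        // is done in an action or a later pass

// The lexer now produces only non-overlapping tokens.
SIGN   : '+' | '-' ;
DIGITS : ('0'..'9')+ ;
```

With this layout, {17} is tokenized as '{', DIGITS, '}' and only rule b can match it. Two trade-offs to keep in mind: if the grammar skips whitespace, `+ 17` would now be accepted as an integer, and the restriction of bit vectors to binary digits is no longer enforced by the lexer.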