Antlr4 grammar ambiguity
Question
I have the following grammar (minimized for SO):
grammar Hello;
odataIdentifier : identifierLeadingCharacter identifierCharacter*;
identifierLeadingCharacter : Alpha| UNDERSCORE;
identifierCharacter : identifierLeadingCharacter | Digit;
identifierUnreserved : identifierCharacter | (MINUS | DOT | TILDE);
Digit : ZERO_TO_FIVE |[6-9];
ONEHUNDRED_TO_ONEHUNDREDNINETYNINE : '1' Digit Digit; // 100-199
TWOHUNDRED_TO_TWOHUNDREDFOURTYNINE : '2' ZERO_TO_FOUR Digit; // 200-249
TWOHUNDREDFIFTY_TO_TWOHUNDREDFIFTYFIVE : '25' ZERO_TO_FIVE; // 250-255
TEN_TO_NINETYNINE : ONE_TO_NINE Digit; // 10-99
ZERO_TO_ONE : [0-1];
ZERO_TO_TWO : ZERO_TO_ONE | [2];
ZERO_TO_THREE : ZERO_TO_TWO | [3];
ZERO_TO_FOUR : ZERO_TO_THREE | [4];
ZERO_TO_FIVE : ZERO_TO_FOUR | [5];
ONE_TO_TWO : [1-2];
ONE_TO_THREE : ONE_TO_TWO | [3];
ONE_TO_FOUR : ONE_TO_THREE | [4];
ONE_TO_NINE : ONE_TO_FOUR | [5-9];
Alpha : [a-zA-Z];
MINUS : [-];
DOT : '.';
UNDERSCORE : '_';
TILDE : '~';
WS : (' '|'\r'|'\t'|'\u000C'|'\n') -> skip ;
For input c9 it works fine, but when I have two digits, for example c10, it says:
extraneous input '92' expecting {<EOF>, Digit, Alpha, '_'}
So I guess it parses 9 and parses 2, and doesn't know whether this should be TEN_TO_NINETYNINE or two Digit tokens (Digit Digit).
I am a noob to this, so I am wondering whether my analysis is right and how I could alleviate this.
Recommended answer
Your input results in an Alpha token followed by a TEN_TO_NINETYNINE token. While the parser rule identifierLeadingCharacter does allow the Alpha token, the identifierCharacter rule cannot match a TEN_TO_NINETYNINE token.
The input 10 will always produce a TEN_TO_NINETYNINE token rather than two Digit tokens, because the former matches more of the input and lexer rules are greedy: the lexer picks whichever rule consumes the longest run of characters before the parser ever sees the tokens.
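To make the longest-match behavior concrete, here is a small Python sketch that imitates it. It is not ANTLR and does not use any ANTLR API; the rule table is a hand-simplified subset of the lexer rules from the question, and the names tokenize and is_odata_identifier are hypothetical helpers invented for this illustration. It assumes the usual ANTLR tie-break: the longest match wins, and on equal length the rule defined first wins.

```python
import re

# Simplified stand-ins for a few lexer rules from the question, in grammar order.
# Assumption: this table only approximates the original rules for illustration.
LEXER_RULES = [
    ("Digit", re.compile(r"[0-9]")),
    ("ONEHUNDRED_TO_ONEHUNDREDNINETYNINE", re.compile(r"1[0-9][0-9]")),
    ("TEN_TO_NINETYNINE", re.compile(r"[1-9][0-9]")),
    ("Alpha", re.compile(r"[a-zA-Z]")),
    ("UNDERSCORE", re.compile(r"_")),
]


def tokenize(text):
    """Split text into (rule_name, lexeme) pairs using longest-match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in LEXER_RULES:
            match = pattern.match(text, pos)
            # A strictly longer match replaces the current best;
            # on a tie the earlier rule is kept.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no lexer rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


def is_odata_identifier(tokens):
    """Mimic odataIdentifier: a leading character, then identifier characters."""
    names = [name for name, _ in tokens]
    leading = {"Alpha", "UNDERSCORE"}
    following = leading | {"Digit"}
    return bool(names) and names[0] in leading and all(n in following for n in names[1:])


if __name__ == "__main__":
    for text in ("c9", "c10"):
        tokens = tokenize(text)
        print(text, tokens, is_odata_identifier(tokens))
```

In this sketch c9 becomes an Alpha token followed by a Digit token and is accepted, while c10 becomes an Alpha token followed by a single TEN_TO_NINETYNINE token, which the identifier check rejects. A common way around this kind of clash is to keep only single-character tokens such as Digit in the lexer and express multi-digit ranges in parser rules; whether that fits the full grammar is an assumption that would need checking.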