Antlr4 unexpectedly stops parsing expression
Problem description
I'm developing a simple calculator with the formula grammar:
grammar Formula ;
expr : <assoc=right> expr POW expr # pow
| MINUS expr # unaryMinus
| PLUS expr # unaryPlus
| expr PERCENT # percent
| expr op=(MULTIPLICATION|DIVISION) expr # multiplyDivide
| expr op=(PLUS|MINUS) expr # addSubtract
| ABS '(' expr ')' # abs
| '|' expr '|' # absParenthesis
| MAX '(' expr ( ',' expr )* ')' # max
| MIN '(' expr ( ',' expr )* ')' # min
| '(' expr ')' # parenthesis
| NUMBER # number
| '"' COLUMN '"' # column
;
MULTIPLICATION: '*' ;
DIVISION: '/' ;
PLUS: '+' ;
MINUS: '-' ;
PERCENT: '%' ;
POW: '^' ;
ABS: [aA][bB][sS] ;
MAX: [mM][aA][xX] ;
MIN: [mM][iI][nN] ;
NUMBER: [0-9]+('.'[0-9]+)? ;
COLUMN: (~[\r\n"])+ ;
WS : [ \t\r\n]+ -> skip ;
"column a"*"column b"
input gives me the following tree, as expected:
But "column a" * "column b" input unexpectedly stops parsing:
What am I missing?
Recommended answer
Your WS rule is broken by the COLUMN rule, which takes precedence over it. More precisely, the issue is that ~[\r\n"] matches space characters too, so COLUMN can swallow whitespace before WS gets a chance to skip it.
"column a"*"column b"
lexes as follows: '"' COLUMN '"' MULTIPLICATION '"' COLUMN '"'
"column a" * "column b"
lexes as follows: '"' COLUMN '"' COLUMN '"' COLUMN '"'
Yes, "space star space" got lexed as a single COLUMN token, because that's how ANTLR lexer rules work: the longest match wins. COLUMN matches all three characters, while WS matches only the first space and MULTIPLICATION cannot match at that position at all.
As you can see, this token stream does not match the expr rule as a whole, so expr matches as much as it can, which is '"' COLUMN '"', and the parser stops there.
Declaring a lexer rule that consists only of a negated character set, like you did, is always a bad idea, because it matches almost anything, including text meant for other rules. And having separate '"' tokens doesn't feel right to me either.
What you should do instead is include the quotes in the COLUMN rule, as they're logically part of the token:
COLUMN: '"' (~["\r\n])* '"';
Then remove the standalone quotes from your parser rule. You can either unquote the text later, when you process the parse tree, or change the token emission logic in the lexer to change the underlying value of the token.
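For the first option, unquoting while processing the parse tree, the logic is just stripping the first and last character. Here is a minimal Python sketch; unquote_column and COLUMN_TOKEN are hypothetical names, and the regular expression is a hand translation of the quoted COLUMN rule, used only to validate the input. How you obtain the token text depends on your ANTLR target and is not shown.

```python
import re

# Hand translation of the quoted COLUMN rule: '"' (~["\r\n])* '"'
COLUMN_TOKEN = re.compile(r'"[^"\r\n]*"')


def unquote_column(token_text: str) -> str:
    """Strip the surrounding double quotes from a COLUMN token's text."""
    if COLUMN_TOKEN.fullmatch(token_text) is None:
        raise ValueError(f"not a quoted COLUMN token: {token_text!r}")
    # The rule forbids '"' inside the token, so there are no escapes to handle.
    return token_text[1:-1]


print(unquote_column('"column a"'))
```

Because the quotes are now part of the token, text outside the quotes, such as the spaces around *, can no longer be matched by COLUMN.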
To avoid silently ignoring trailing input, add another rule that makes sure the whole input has been consumed:
formula: expr EOF;
Then use this rule as your entry rule instead of expr when calling your parser.