Antlr4 unexpectedly stops parsing expression
Question
I'm developing a simple calculator with the formula grammar:
grammar Formula ;
expr : <assoc=right> expr POW expr # pow
| MINUS expr # unaryMinus
| PLUS expr # unaryPlus
| expr PERCENT # percent
| expr op=(MULTIPLICATION|DIVISION) expr # multiplyDivide
| expr op=(PLUS|MINUS) expr # addSubtract
| ABS '(' expr ')' # abs
| '|' expr '|' # absParenthesis
| MAX '(' expr ( ',' expr )* ')' # max
| MIN '(' expr ( ',' expr )* ')' # min
| '(' expr ')' # parenthesis
| NUMBER # number
| '"' COLUMN '"' # column
;
MULTIPLICATION: '*' ;
DIVISION: '/' ;
PLUS: '+' ;
MINUS: '-' ;
PERCENT: '%' ;
POW: '^' ;
ABS: [aA][bB][sS] ;
MAX: [mM][aA][xX] ;
MIN: [mM][iI][nN] ;
NUMBER: [0-9]+('.'[0-9]+)? ;
COLUMN: (~[\r\n"])+ ;
WS : [ \t\r\n]+ -> skip ;
The input "column a"*"column b" gives me the following tree, as expected:
But the input "column a" * "column b" unexpectedly stops parsing:
What am I missing?
Recommended answer
Your WS rule is broken by the COLUMN rule, which has a higher precedence. More precisely, the issue is that ~[\r\n"] matches space characters too.
"column a"*"column b" lexes as follows: '"' COLUMN '"' MULTIPLICATION '"' COLUMN '"'
"column a" * "column b" lexes as follows: '"' COLUMN '"' COLUMN '"' COLUMN '"'
Yes, "space star space" got lexed as a COLUMN token, because that's how ANTLR lexer rules work: the longest token match gets priority. WS can only match the single leading space, while COLUMN matches all three characters.
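To make the longest-match behaviour concrete, here is a minimal Python sketch that imitates it with regular expressions. It is not ANTLR itself: the rule list is a simplified subset of the grammar, QUOTE is a made-up name standing in for the implicit '"' literal token, and ties are resolved in favour of the rule listed first.

```python
import re

# Simplified subset of the question's lexer rules, in definition order.
# QUOTE is a hypothetical name for the implicit '"' literal token.
RULES = [
    ("QUOTE", re.compile(r'"')),
    ("MULTIPLICATION", re.compile(r"\*")),
    ("NUMBER", re.compile(r"[0-9]+(\.[0-9]+)?")),
    ("COLUMN", re.compile(r'[^\r\n"]+')),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def lex(text):
    """Return token names using longest match; the earlier rule wins a tie."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on equal length the rule defined first is kept.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            tokens.append(best_name)
        pos += best_len
    return tokens


print(lex('"column a"*"column b"'))
print(lex('"column a" * "column b"'))
```

In the second input the text between the two quoted names is " * ", which the COLUMN pattern matches in full, so neither WS nor MULTIPLICATION ever gets a chance.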
As you can see, this token stream does not match the expr rule as a whole, so expr matches as much as it can, which is '"' COLUMN '"'.
Declaring a lexer rule that consists only of a negated set, as you did, is always a bad idea: it swallows nearly any text, whitespace included. Having separate '"' tokens doesn't feel right to me either.
What you should have done is to include the quotes in the COLUMN rule, as they're logically part of the token:
COLUMN: '"' (~["\r\n])* '"';
Then remove the standalone quotes from your parser rule. You can either unquote the text later, when you process the parse tree, or change the token emission logic in the lexer to change the underlying value of the token.
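The effect of the revised rule can be checked with the same kind of regex-based imitation. This is again only a sketch under assumptions: the rule list is a reduced subset of the grammar, and unquote is a hypothetical helper showing the "unquote later" option, not part of any ANTLR API.

```python
import re

# Reduced rule set with the revised COLUMN rule, quotes included.
RULES = [
    ("MULTIPLICATION", re.compile(r"\*")),
    ("NUMBER", re.compile(r"[0-9]+(\.[0-9]+)?")),
    ("COLUMN", re.compile(r'"[^"\r\n]*"')),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def lex(text):
    """Return (name, lexeme) pairs using longest match, skipping WS."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


def unquote(lexeme):
    """Hypothetical helper: drop the surrounding quotes of a COLUMN lexeme."""
    return lexeme[1:-1]


tokens = lex('"column a" * "column b"')
print([name for name, _ in tokens])
print([unquote(lexeme) for name, lexeme in tokens if name == "COLUMN"])
```

With the quotes inside the token, whitespace outside a column name can no longer be absorbed by COLUMN, so both spellings of the input produce the same token sequence.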
And in order not to ignore trailing input, add another rule that makes sure you've consumed the whole input:
formula: expr EOF;
Then use this rule as your entry rule instead of expr when calling your parser.