Antlr4 unexpectedly stops parsing expression

Problem description

I'm developing a simple calculator with the following formula grammar:

grammar Formula ;
expr : <assoc=right> expr POW expr             # pow
     | MINUS expr                              # unaryMinus
     | PLUS expr                               # unaryPlus
     | expr PERCENT                            # percent
     | expr op=(MULTIPLICATION|DIVISION) expr  # multiplyDivide
     | expr op=(PLUS|MINUS) expr               # addSubtract
     | ABS '(' expr ')'                        # abs
     | '|' expr '|'                            # absParenthesis
     | MAX '(' expr ( ',' expr )* ')'          # max
     | MIN '(' expr ( ',' expr )* ')'          # min
     | '(' expr  ')'                           # parenthesis
     | NUMBER                                  # number
     | '"' COLUMN '"'                          # column
     ;

MULTIPLICATION: '*' ;
DIVISION: '/' ;
PLUS: '+' ;
MINUS: '-' ;
PERCENT: '%' ;
POW: '^' ;
ABS: [aA][bB][sS] ;
MAX: [mM][aA][xX] ;
MIN: [mM][iI][nN] ;
NUMBER: [0-9]+('.'[0-9]+)? ;
COLUMN: (~[\r\n"])+ ;
WS : [ \t\r\n]+ -> skip ;

The input "column a"*"column b" gives me the expected parse tree.

But the input "column a" * "column b", with spaces around the operator, unexpectedly stops parsing after the first quoted column.

What am I missing?

Recommended answer

Your WS rule is broken by the COLUMN rule, which has a higher precedence. More precisely, the issue is that ~[\r\n"] matches space characters too.

"column a"*"column b" lexes as follows: '"' COLUMN '"' MULTIPLICATION '"' COLUMN '"'

"column a" * "column b" lexes as follows: '"' COLUMN '"' COLUMN '"' COLUMN '"'

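Yes, " * " (space, star, space) got lexed as a single COLUMN token, because that's how ANTLR lexer rules work: the longest match wins, and only when two rules match the same length does the rule defined first win. That is also why the input without spaces works: there, * alone matches both MULTIPLICATION and COLUMN with length 1, and MULTIPLICATION is defined first.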
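To make the rule selection concrete, here is a small Python sketch that imitates how an ANTLR lexer picks a rule: at each position the longest match wins, and ties go to the rule declared first. It is not the ANTLR runtime. The helper `tokenize` and the token name `QUOTE` (standing in for the implicit '"' literal) are made up for illustration, and only the rules that can match these two inputs are modeled.

```python
import re

# Lexer rules in declaration order, as (name, compiled pattern) pairs.
# QUOTE is a made-up name for the implicit '"' literal used in the parser rule.
# The grammar's other lexer rules are left out: none of them can match at
# any token start in the two sample inputs.
ORIGINAL_RULES = [
    ("QUOTE", re.compile(r'"')),
    ("MULTIPLICATION", re.compile(r"\*")),
    ("COLUMN", re.compile(r'[^\r\n"]+')),  # COLUMN: (~[\r\n"])+ ;
    ("WS", re.compile(r"[ \t\r\n]+")),     # WS : [ \t\r\n]+ -> skip ;
]


def tokenize(text, rules, skip=("WS",)):
    """Return (name, lexeme) pairs, imitating ANTLR's rule selection:
    the longest match wins; on a tie the earlier rule wins, because a later
    rule only replaces the current best when its match is strictly longer."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name not in skip:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


# No spaces: '*' is matched by MULTIPLICATION and COLUMN with length 1,
# and MULTIPLICATION wins the tie because it is declared first.
print(tokenize('"column a"*"column b"', ORIGINAL_RULES))
# With spaces: COLUMN matches ' * ' (3 chars) while WS matches only ' ' (1 char),
# so the whole ' * ' run becomes one COLUMN token.
print(tokenize('"column a" * "column b"', ORIGINAL_RULES))
```

The second call yields QUOTE COLUMN QUOTE COLUMN QUOTE COLUMN QUOTE, the same stream as described above, with ' * ' as the middle COLUMN lexeme.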

As you can see, this token stream does not match the expr rule as a whole, so expr matches as much as it can, which is the leading '"' COLUMN '"'; the remaining tokens are silently left unconsumed.

Declaring a lexer rule that consists only of a negated set, as you did, is always a bad idea. And having separate '"' tokens doesn't feel right to me either.

What you should have done is include the quotes in the COLUMN rule, since they're logically part of the token:

COLUMN: '"' (~["\r\n])* '"';

Then remove the standalone quotes from your parser rule. You can either unquote the text later, when you process the parse tree, or change the token emission logic in the lexer to change the underlying value of the token.
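With the quotes folded into COLUMN, the spaces around * fall back to WS and are skipped. The Python sketch below reuses the same longest-match imitation (again a made-up `tokenize` helper, not the ANTLR runtime) with the fixed rule, and adds a hypothetical `unquote` helper to illustrate the "unquote the text later" option.

```python
import re

# Lexer rules after the fix, in declaration order. COLUMN now owns its quotes,
# so the parser no longer needs standalone '"' tokens.
FIXED_RULES = [
    ("MULTIPLICATION", re.compile(r"\*")),
    ("COLUMN", re.compile(r'"[^"\r\n]*"')),  # COLUMN: '"' (~["\r\n])* '"';
    ("WS", re.compile(r"[ \t\r\n]+")),       # WS : [ \t\r\n]+ -> skip ;
]


def tokenize(text, rules, skip=("WS",)):
    """Longest match wins; on a tie the rule declared first wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name not in skip:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


def unquote(lexeme):
    """Hypothetical helper: strip the surrounding quotes from a COLUMN lexeme."""
    return lexeme[1:-1]


tokens = tokenize('"column a" * "column b"', FIXED_RULES)
print(tokens)  # COLUMN, MULTIPLICATION, COLUMN; the spaces were skipped as WS
print([unquote(lexeme) for name, lexeme in tokens if name == "COLUMN"])
```

Both inputs, with or without spaces, now produce the same three tokens.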

And to avoid ignoring trailing input, add another rule that makes sure you've consumed the whole input:

formula: expr EOF;

Then use this rule as your entry rule instead of expr when calling your parser.
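The difference the EOF anchor makes can be illustrated with a deliberately tiny, hypothetical recursive-descent parser in Python. The names `parse_column`, `parse_expr`, `parse_formula` and `ParseError` are made up; the parser only handles quoted columns joined by *, and it is not how ANTLR generates its parser. Like the expr rule in the question, `parse_expr` simply stops at the first token that cannot continue the expression, while `parse_formula` turns leftover tokens into an error.

```python
class ParseError(Exception):
    """Raised when the token list cannot be parsed."""


def parse_column(tokens, pos):
    # column : '"' COLUMN '"' ;  (QUOTE stands in for the '"' literal)
    if tokens[pos:pos + 3] != ["QUOTE", "COLUMN", "QUOTE"]:
        raise ParseError(f"expected QUOTE COLUMN QUOTE at token {pos}")
    return pos + 3


def parse_expr(tokens, pos=0):
    # expr : column (MULTIPLICATION column)* ;
    # Returns the index of the first token it did not consume.
    pos = parse_column(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "MULTIPLICATION":
        pos = parse_column(tokens, pos + 1)
    return pos  # stops quietly when the next token cannot continue expr


def parse_formula(tokens):
    # formula : expr EOF ;  every token must be consumed
    pos = parse_expr(tokens)
    if pos != len(tokens):
        raise ParseError(f"extraneous input at token {pos}: {tokens[pos]}")
    return pos


good = ["QUOTE", "COLUMN", "QUOTE", "MULTIPLICATION", "QUOTE", "COLUMN", "QUOTE"]
broken = ["QUOTE", "COLUMN", "QUOTE", "COLUMN", "QUOTE", "COLUMN", "QUOTE"]

print(parse_expr(broken))   # 3: only the first column is consumed, no error
print(parse_formula(good))  # 7: the whole input is consumed
try:
    parse_formula(broken)
except ParseError as err:
    print("formula rejects it:", err)
```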
