Antlr4 unexpectedly stops parsing expression


Question

I'm developing a simple calculator with the following formula grammar:

grammar Formula ;
expr : <assoc=right> expr POW expr             # pow
     | MINUS expr                              # unaryMinus
     | PLUS expr                               # unaryPlus
     | expr PERCENT                            # percent
     | expr op=(MULTIPLICATION|DIVISION) expr  # multiplyDivide
     | expr op=(PLUS|MINUS) expr               # addSubtract
     | ABS '(' expr ')'                        # abs
     | '|' expr '|'                            # absParenthesis
     | MAX '(' expr ( ',' expr )* ')'          # max
     | MIN '(' expr ( ',' expr )* ')'          # min
     | '(' expr  ')'                           # parenthesis
     | NUMBER                                  # number
     | '"' COLUMN '"'                          # column
     ;

MULTIPLICATION: '*' ;
DIVISION: '/' ;
PLUS: '+' ;
MINUS: '-' ;
PERCENT: '%' ;
POW: '^' ;
ABS: [aA][bB][sS] ;
MAX: [mM][aA][xX] ;
MIN: [mM][iI][nN] ;
NUMBER: [0-9]+('.'[0-9]+)? ;
COLUMN: (~[\r\n"])+ ;
WS : [ \t\r\n]+ -> skip ;

The input "column a"*"column b" gives me the following tree, as expected:

But the input "column a" * "column b" unexpectedly stops parsing:

What am I missing?

Recommended answer

Your WS rule is broken by the COLUMN rule, which is declared first and therefore takes precedence. More precisely, the issue is that ~[\r\n"] matches space characters too.

"column a"*"column b" lexes as follows: '"' COLUMN '"' MULTIPLICATION '"' COLUMN '"'

"column a" * "column b" lexes as follows: '"' COLUMN '"' COLUMN '"' COLUMN '"'

Yes, "space star space" got lexed as a single COLUMN token, because that's how ANTLR lexer rules work: the longest token match gets priority.
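The longest-match behavior described above can be reproduced with a small simulation. This is only a sketch in Python, not ANTLR itself: it models a subset of the lexer rules with regular expressions, picks the longest match at each position, and breaks ties in favor of the rule listed first. The names RULES, QUOTE and tokenize are made up for this illustration (QUOTE stands in for the implicit '"' token).

```python
import re

# A subset of the question's lexer rules, in declaration order.
# QUOTE is a hypothetical name for the implicit '"' literal token.
RULES = [
    ("QUOTE", r'"'),
    ("MULTIPLICATION", r"\*"),
    ("NUMBER", r"[0-9]+(?:\.[0-9]+)?"),
    ("COLUMN", r'[^\r\n"]+'),   # ~[\r\n"]+ also matches spaces and '*'
    ("WS", r"[ \t\r\n]+"),      # skipped, like "-> skip"
]


def tokenize(text):
    """Return token names using longest match, ties going to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule declared first is kept.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":
            tokens.append(best_name)
        pos += best_len
    return tokens


print(tokenize('"column a"*"column b"'))
print(tokenize('"column a" * "column b"'))
```

In the first input the lone * is matched by both MULTIPLICATION and COLUMN with the same length, so the earlier rule wins. In the second input COLUMN matches three characters (space, star, space) while WS matches only one, so COLUMN wins.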

As you can see, this token stream does not match the expr rule as a whole, so expr matches as much as it can, which is '"' COLUMN '"'.

Declaring a lexer rule that consists only of a negated set, as you did, is almost always a bad idea, because such a rule matches nearly any run of characters. Having separate '"' tokens doesn't feel right to me either.

What you should do instead is include the quotes in the COLUMN rule, as they're logically part of the token:

COLUMN: '"' (~["\r\n])* '"';

Then remove the standalone quotes from your parser rule. You can either unquote the text later, when you process the parse tree, or change the token emission logic in the lexer to change the underlying value of the token.
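To see the effect of the corrected COLUMN rule, here is a rough Python simulation of the same longest-match idea, again not real ANTLR code. FIXED_RULES, tokenize_fixed and unquote are hypothetical names used only for this sketch; unquote shows the "strip the quotes later" option in its simplest form.

```python
import re

# Subset of the lexer rules with the corrected COLUMN definition.
FIXED_RULES = [
    ("MULTIPLICATION", r"\*"),
    ("COLUMN", r'"[^"\r\n]*"'),  # quotes are now part of the token
    ("WS", r"[ \t\r\n]+"),       # skipped
]


def tokenize_fixed(text):
    """Return (name, text) pairs using longest match, ties to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in FIXED_RULES:
            m = re.compile(pattern).match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, m = best
        if name != "WS":
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens


def unquote(column_text):
    """Drop the surrounding quotes from a COLUMN token's text."""
    return column_text[1:-1]


for name, text in tokenize_fixed('"column a" * "column b"'):
    print(name, unquote(text) if name == "COLUMN" else text)
```

With the quotes inside the token, whitespace outside a column name can only be matched by WS, so the spaced input now yields COLUMN, MULTIPLICATION, COLUMN.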

To avoid silently ignoring trailing input, add another rule that makes sure the whole input has been consumed:

formula: expr EOF;

Then use this rule as your entry rule instead of expr when calling your parser.
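The difference between the two entry rules can be illustrated with a toy matcher. This is a heavily simplified Python sketch, not how ANTLR's parser is implemented: match_expr only understands quoted columns joined by MULTIPLICATION and stops at the first token it cannot use, while match_formula additionally insists that nothing is left over, which is the role EOF plays. All names here are hypothetical, and the token list is the one the original grammar produces for the spaced input.

```python
# Token sequence for one operand under the original grammar: '"' COLUMN '"'
OPERAND = ["QUOTE", "COLUMN", "QUOTE"]


def match_expr(tokens):
    """Consume operand (MULTIPLICATION operand)* and return the number of tokens used."""
    if tokens[:3] != OPERAND:
        raise SyntaxError("expected a quoted column")
    pos = 3
    while tokens[pos:pos + 4] == ["MULTIPLICATION"] + OPERAND:
        pos += 4
    return pos  # anything after pos is silently left unread


def match_formula(tokens):
    """Like match_expr, but fail if any input remains (the EOF check)."""
    consumed = match_expr(tokens)
    if consumed != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[consumed]} at index {consumed}")
    return consumed


# Tokens the original lexer produces for: "column a" * "column b"
bad_stream = ["QUOTE", "COLUMN", "QUOTE", "COLUMN", "QUOTE", "COLUMN", "QUOTE"]

print(match_expr(bad_stream))  # stops after the first operand without complaining
try:
    match_formula(bad_stream)
except SyntaxError as err:
    print("formula rejected:", err)
```

Starting from expr, the leftover tokens go unnoticed; starting from the rule that ends in EOF, the same input is reported as an error instead.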
