Generated Antlr Parser in Java: Not all inputs are read
Problem description
I am working on an Antlr grammar to parse polynomial functions in multiple variables using Java. Examples of legal input are
42; X; +42X; Y^42; 1337HelloWorld; 13,37X^42;
The following grammar compiles without warnings or errors:
grammar Function;
parseFunction returns [java.util.List<java.util.List<Object>> list] :
{ list = new java.util.ArrayList(); } ( f=functionPart { list.add($f.list); } )+
| { list = new java.util.ArrayList(); } ( fb=functionBegin ) { list.add($fb.list); } ( f=functionPart { list.add($f.list); } )*
;
functionBegin returns [java.util.List<Object> list]:
m=NUMBER v=VARIABLE e=exponent { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); list.add($v.text); list.add($e.value); }
| m=NUMBER v=VARIABLE { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); list.add($v.text); }
| v=VARIABLE e=exponent { list = new java.util.ArrayList(); list.add("+"); list.add("1"); list.add($v.text); list.add($e.value); }
| v=VARIABLE { list = new java.util.ArrayList(); list.add("+"); list.add("1"); list.add($v.text); }
| m=NUMBER { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); }
;
functionPart returns [java.util.List<Object> list] :
s=SIGN m=NUMBER v=VARIABLE e=exponent { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); list.add($v.text); list.add($e.value); }
| s=SIGN m=NUMBER v=VARIABLE { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); list.add($v.text); }
| s=SIGN v=VARIABLE e=exponent { list = new java.util.ArrayList(); list.add($s.text); list.add("1"); list.add($v.text); list.add($e.value); }
| s=SIGN v=VARIABLE { list = new java.util.ArrayList(); list.add($s.text); list.add("1"); list.add($v.text); }
| s=SIGN m=NUMBER { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); }
;
exponent returns [int value]: ('^' n=INTEGER) { $value = 1; if ( $n != null && $n.text.length() > 0) $value = Integer.parseInt($n.text); }
;
VARIABLE : ('a'..'z'|'A'..'Z')+
;
INTEGER : ('0'..'9')+
;
NUMBER : ('0'..'9')+ (','('0'..'9')+)?
;
SIGN : ('+'|'-')
;
WS : (' ' | '\t' | '\r'| '\n')+ {skip();}
;
This grammar, once compiled and used from Java, accepts most input values, but apparently not all valid ones. As soon as a number without a comma appears, as in the inputs
+42; 42; 42X^1337;
an error is reported (this one is for the input "+42"):
line 1:1 no viable alternative at input '+'
The error does not occur if I change the inputs to
+42,0; 42,0; 42,0X^1337
Can anyone explain why this happens and how to fix it?
Recommended answer
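The lexer rule with the longest match wins, and when several rules match the same length, the one defined first wins. Thus 42 is an INTEGER, and NUMBER in fact only matches when the comma part is present, i.e. when NUMBER has a longer match than INTEGER. Since functionBegin and functionPart only accept NUMBER, the input "+42" is tokenized as SIGN INTEGER, which fits none of their alternatives; hence "no viable alternative at input '+'".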
This can be fixed by adding a parser rule
number : NUMBER | INTEGER ;
and using that instead of NUMBER in the other parser rules.
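
To make the tie-breaking concrete, here is a minimal Java sketch of the selection rule as this answer states it: the longest match wins, and on equal length the rule listed first wins. It is not how Antlr's generated lexer is implemented. The class LongestMatchDemo, the methods tokenTypeAtStart and isNumber, and the regular expressions standing in for the lexer rules are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchDemo {
    // Hypothetical regex stand-ins for the lexer rules, kept in grammar order
    // (INTEGER is listed before NUMBER, as in the grammar from the question).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("VARIABLE", Pattern.compile("[a-zA-Z]+"));
        RULES.put("INTEGER", Pattern.compile("[0-9]+"));
        RULES.put("NUMBER", Pattern.compile("[0-9]+(,[0-9]+)?"));
        RULES.put("SIGN", Pattern.compile("[+-]"));
    }

    /** Returns the name of the rule that wins at the start of the input, or null if none matches. */
    static String tokenTypeAtStart(String input) {
        String winner = null;
        int longest = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // lookingAt() anchors the match at the start of the input.
            // The strict '>' keeps the earlier rule when two matches have equal length.
            if (m.lookingAt() && m.end() > longest) {
                longest = m.end();
                winner = rule.getKey();
            }
        }
        return winner;
    }

    /** Mirrors the suggested parser rule: number : NUMBER | INTEGER ; */
    static boolean isNumber(String tokenType) {
        return "NUMBER".equals(tokenType) || "INTEGER".equals(tokenType);
    }

    public static void main(String[] args) {
        System.out.println(tokenTypeAtStart("42"));            // prints INTEGER
        System.out.println(tokenTypeAtStart("42,0"));          // prints NUMBER
        System.out.println(isNumber(tokenTypeAtStart("42")));  // prints true
    }
}
```

In this sketch "42" is matched by both INTEGER and NUMBER with the same length, so the earlier rule INTEGER is chosen, while "42,0" gives NUMBER the longer match. Accepting either token type, as isNumber does, corresponds to what the number parser rule achieves in the grammar.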