ANTLR4 lexer rules don't work as expected
Question
I want to write a lexer rule for a month and a year. Expressed as a regular expression, the rule is:
"hello"[0-9]{1,2}"ever"([0-9]{2}([0-9]{2})?)?
The "hello" and "ever" literals are just for debugging.
That is, one or two digits for the month and two or four digits for the year. In addition, the year part is optional.
For example: Aug 2015 -> hello08ever2015, hello8ever2015, hello8ever15, hello8ever, or hello08ever; Oct 2015 -> hello10ever2015, hello10ever15, or hello10ever.
My grammar is as follows (ANTLR4):
grammar Hello;
r : 'hello' TimeDate 'ever' TimeYear? ;
TimeDate : Digit Digit?;
TimeYear : TwoDigit TwoDigit?;
TwoDigit : Digit Digit;
Digit : [0-9] ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
But it does not seem to work. Here are some logs from my testing:
C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello20ever2014
^Z
(r hello 20 ever 2014)
C:\antlr\workspace\demo>grun Hello r -tree -gui
C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello2ever20
^Z
(r hello 2 ever)
C:\antlr\workspace\demo>grun Hello r -tree -gui
C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello20ever14
^Z
(r hello 20 ever)
C:\antlr\workspace\demo>grun Hello r -tree -gui
C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello2ever2014
^Z
(r hello 2 ever 2014)
For the input hello2ever20, the year part '20' is not recognized; for the input hello20ever14, the year part '14' is not recognized.
Could anyone help with this?
Thanks!
Recommended answer
You must realise that ANTLR's lexer rules are matched according to their position in the grammar file. The lexer runs independently of the parser: it does not "listen" to what the parser might need at a certain position in a parser rule. The lexer tries to match as many characters as possible, and when two (or more) rules match the same number of characters, the rule defined first wins.
In your case, that means 15 will always be tokenized as a TimeDate and never as a TimeYear, because both rules match 15 but TimeDate is defined first. 2015 will be tokenized as a TimeYear because no other rule matches 4 digits.
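To make the two rules above concrete (longest match first, then earliest rule on a tie), here is a minimal sketch in Python. It is only a simulation under stated assumptions, not ANTLR's actual implementation: the rule names mirror the grammar in the question, while the regular expressions, the RULES list, and the tokenize function are hypothetical stand-ins written for illustration.

```python
import re

# Lexer rules in grammar order. The literals 'hello' and 'ever' are listed
# first because ANTLR creates implicit tokens for literals used in parser
# rules and places them ahead of the named lexer rules.
# The regexes are assumed equivalents of the rules in the question.
RULES = [
    ("'hello'", r"hello"),
    ("'ever'", r"ever"),
    ("TimeDate", r"[0-9][0-9]?"),
    ("TimeYear", r"[0-9]{2}(?:[0-9]{2})?"),
    ("TwoDigit", r"[0-9]{2}"),
    ("Digit", r"[0-9]"),
    ("WS", r"[ \t\r\n]+"),
]


def tokenize(text):
    """Split text into (rule, lexeme) pairs using longest match, first rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule defined earlier is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("hello20ever14"))
# [("'hello'", 'hello'), ('TimeDate', '20'), ("'ever'", 'ever'), ('TimeDate', '14')]
print(tokenize("hello2ever2014"))
# [("'hello'", 'hello'), ('TimeDate', '2'), ("'ever'", 'ever'), ('TimeYear', '2014')]
```

The first result shows why the parser never sees a TimeYear for a two-digit year: the token after 'ever' is a TimeDate, so the optional TimeYear in r matches nothing.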
A solution would be to remove the TimeYear lexer rule and turn it into a parser rule (which r then references in place of TimeYear):
timeYear
: TimeDate TimeDate?
;
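As a rough check of this approach, the following Python sketch simulates the revised grammar. It assumes that the TimeYear lexer rule has been removed and that r has been changed to 'hello' TimeDate 'ever' timeYear?; the TOKEN pattern and the tokenize and parse_r functions are hypothetical helpers written for illustration, not something ANTLR generates.

```python
import re

# Assumed revised lexer: with TimeYear gone, every run of digits is cut
# into TimeDate tokens of at most two digits. No two alternatives can
# match at the same position, so plain regex alternation is sufficient.
TOKEN = re.compile(
    r"(?P<HELLO>hello)|(?P<EVER>ever)|(?P<TimeDate>[0-9][0-9]?)|(?P<WS>[ \t\r\n]+)"
)


def tokenize(text):
    """Return (kind, lexeme) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"no rule matches at position {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_r(text):
    """Simulate r : 'hello' TimeDate 'ever' timeYear? ; timeYear : TimeDate TimeDate? ;"""
    tokens = tokenize(text)
    kinds = [kind for kind, _ in tokens]
    if kinds[:3] != ["HELLO", "TimeDate", "EVER"]:
        raise ValueError("expected 'hello' TimeDate 'ever'")
    year_tokens = tokens[3:]
    # timeYear consists of one or two TimeDate tokens; it may also be absent.
    if len(year_tokens) > 2 or any(kind != "TimeDate" for kind, _ in year_tokens):
        raise ValueError("unexpected input after 'ever'")
    month = tokens[1][1]
    year = "".join(lexeme for _, lexeme in year_tokens) or None
    return month, year


print(parse_r("hello2ever20"))    # ('2', '20')
print(parse_r("hello20ever14"))   # ('20', '14')
print(parse_r("hello2ever2014"))  # ('2', '2014')
print(parse_r("hello8ever"))      # ('8', None)
```

Note that this parser rule is looser than the original regular expression: an input such as hello8ever215 is tokenized as 21 followed by 5 and is accepted as a three-digit year, so a length check on the year may still be needed after parsing.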