ANTLR4 lexer rules don't work as expected
Problem description
I want to write a lexer rule for the month and the year. The rule is (as a regular expression):
"hello"[0-9]{1,2}"ever"([0-9]{2}([0-9]{2})?)?
The "hello" and "ever" literals are just for debugging.
That is, one or two digits for the month and two or four digits for the year. What's more, the year part can be omitted.
For example: Aug 2015 -> hello08ever2015, hello8ever2015, hello8ever15, hello8ever or hello08ever; Oct 2015 -> hello10ever2015, hello10ever15 or hello10ever.
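As a quick sanity check, here is a minimal sketch. The class name `MonthYearRegex` is made up for illustration. It shows that the intended pattern translates directly to `java.util.regex` and accepts every example above. That includes hello2ever20 and hello20ever14, the two inputs that fail in the ANTLR logs below, which suggests the pattern itself is fine and the problem lies in how the ANTLR lexer splits the digits into tokens.

```java
import java.util.regex.Pattern;

public class MonthYearRegex {
    // The question's pattern in java.util.regex syntax:
    // "hello", 1-2 month digits, "ever", then an optional 2- or 4-digit year.
    static final Pattern MONTH_YEAR =
            Pattern.compile("hello[0-9]{1,2}ever([0-9]{2}([0-9]{2})?)?");

    static boolean matches(String input) {
        // matches() succeeds only if the whole input fits the pattern
        return MONTH_YEAR.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "hello08ever2015", "hello8ever2015", "hello8ever15", "hello8ever", "hello08ever",
            "hello10ever2015", "hello10ever15", "hello10ever",
            "hello2ever20", "hello20ever14", // the inputs that fail in ANTLR
            "hello8ever201"                  // 3-digit year: rejected by the pattern
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```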
My lexer rules are as follows (ANTLR4):
grammar Hello;
r : 'hello' TimeDate 'ever' TimeYear? ;
TimeDate : Digit Digit?;
TimeYear : TwoDigit TwoDigit?;
TwoDigit : Digit Digit;
Digit : [0-9] ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
But it doesn't seem to work. Here are some logs from my testing:
C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello20ever2014
^Z
(r hello 20 ever 2014)
C:\antlr\workspace\demo>grun Hello r -tree -gui
C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello2ever20
^Z
(r hello 2 ever)
C:\antlr\workspace\demo>grun Hello r -tree -gui
C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello20ever14
^Z
(r hello 20 ever)
C:\antlr\workspace\demo>grun Hello r -tree -gui
C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello2ever2014
^Z
(r hello 2 ever 2014)
For the input hello2ever20, it can't identify the year part '20'; for the input hello20ever14, it can't identify the year part '14'.
Can anyone help with this?
Thanks!
Recommended answer
You must realise that ANTLR's lexer rules are matched according to their position in the grammar file. The lexer does not "listen" to what the parser might need at a certain position in a parser rule. The lexer tries to match as many characters as possible, and when 2 (or more) rules match the same number of characters, the rule defined first wins.
In your case that means that 15 will always be tokenized as a TimeDate and never as a TimeYear, because both rules match 15 but TimeDate is defined first. 2015 will be tokenized as a TimeYear because no other rule matches 4 digits.
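The tie-breaking behaviour can be imitated outside ANTLR. The following is a hypothetical Java sketch, not ANTLR's actual implementation. ANTLR builds its own ATN/DFA rather than calling `java.util.regex`, and the class name `MaxMunchLexer` is made up. At each position the sketch tries every rule of the question's grammar in order, keeps the longest match, and on a tie keeps the rule defined first. It also assumes that the literals 'hello' and 'ever' used in rule r act as implicit tokens placed before the explicit lexer rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaxMunchLexer {
    // The question's lexer rules, in grammar order. Assumption: the literals
    // 'hello' and 'ever' used in rule r are implicit tokens placed first.
    static final String[] TYPES = {"'hello'", "'ever'", "TimeDate", "TimeYear", "TwoDigit", "Digit", "WS"};
    static final Pattern[] PATTERNS = {
        Pattern.compile("hello"),
        Pattern.compile("ever"),
        Pattern.compile("[0-9][0-9]?"),          // TimeDate : Digit Digit? ;
        Pattern.compile("[0-9]{2}([0-9]{2})?"),  // TimeYear : TwoDigit TwoDigit? ;
        Pattern.compile("[0-9]{2}"),             // TwoDigit : Digit Digit ;
        Pattern.compile("[0-9]"),                // Digit : [0-9] ;
        Pattern.compile("[ \\t\\r\\n]+")         // WS : ... -> skip ;
    };

    /** Returns tokens as "Type:text". For these simple rules, greedy regexes yield the longest match. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int offset = 0;
        while (offset < input.length()) {
            int best = -1;
            int bestEnd = offset;
            for (int i = 0; i < PATTERNS.length; i++) {
                Matcher m = PATTERNS[i].matcher(input).region(offset, input.length());
                // Only a strictly longer match replaces the current best,
                // so when two rules match the same length the earlier rule wins.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = i;
                    bestEnd = m.end();
                }
            }
            if (best < 0) {
                throw new IllegalArgumentException("no lexer rule matches at offset " + offset);
            }
            if (!TYPES[best].equals("WS")) { // WS is skipped, as in the grammar
                tokens.add(TYPES[best] + ":" + input.substring(offset, bestEnd));
            }
            offset = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hello20ever2014", "hello2ever20", "hello20ever14"}) {
            System.out.println(s + " -> " + tokenize(s));
        }
    }
}
```

For hello2ever20, the trailing 20 comes out as TimeDate:20, and for hello20ever14 the 14 comes out as TimeDate:14. Only the 4-digit 2014 in hello20ever2014 becomes TimeYear:2014, which matches the three parse trees in the logs.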
A solution would be to change TimeYear into a parser rule:
timeYear
: TimeDate TimeDate?
;
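To see why the parser rule helps, here is another hypothetical sketch. The class and method names are made up, and this is again not ANTLR's own machinery. Once TimeYear is no longer a lexer rule, a 4-digit year such as 2015 is lexed as two TimeDate tokens (20 and 15), and timeYear reassembles them. The lexer part assumes the same ANTLR behaviour as above: the longest match wins, ties go to the rule defined first, and 'hello'/'ever' are implicit tokens placed first. The parser is a hand-written recursive-descent recognizer, assuming r is updated to `r : 'hello' TimeDate 'ever' timeYear? ;`. Unlike the grammar as written, it also requires all input to be consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedMonthYear {
    // Lexer rules after the fix (TimeYear removed), in grammar order.
    // Assumption: the literals 'hello' and 'ever' are implicit tokens placed first.
    static final String[] TYPES = {"'hello'", "'ever'", "TimeDate", "TwoDigit", "Digit", "WS"};
    static final Pattern[] PATTERNS = {
        Pattern.compile("hello"), Pattern.compile("ever"),
        Pattern.compile("[0-9][0-9]?"), Pattern.compile("[0-9]{2}"),
        Pattern.compile("[0-9]"), Pattern.compile("[ \\t\\r\\n]+")
    };

    /** Longest match wins; on a tie the earlier rule wins. Tokens are {type, text}; WS is skipped. */
    static List<String[]> tokenize(String input) {
        List<String[]> tokens = new ArrayList<>();
        int offset = 0;
        while (offset < input.length()) {
            int best = -1;
            int bestEnd = offset;
            for (int i = 0; i < PATTERNS.length; i++) {
                Matcher m = PATTERNS[i].matcher(input).region(offset, input.length());
                if (m.lookingAt() && m.end() > bestEnd) { // strictly longer only
                    best = i;
                    bestEnd = m.end();
                }
            }
            if (best < 0) {
                throw new IllegalArgumentException("no lexer rule matches at offset " + offset);
            }
            if (!TYPES[best].equals("WS")) {
                tokens.add(new String[] {TYPES[best], input.substring(offset, bestEnd)});
            }
            offset = bestEnd;
        }
        return tokens;
    }

    private final List<String[]> tokens;
    private int pos = 0;

    private FixedMonthYear(List<String[]> tokens) {
        this.tokens = tokens;
    }

    private boolean at(String type) {
        return pos < tokens.size() && tokens.get(pos)[0].equals(type);
    }

    private String expect(String type) {
        if (!at(type)) {
            throw new IllegalArgumentException("expected " + type + " at token " + pos);
        }
        return tokens.get(pos++)[1];
    }

    // r : 'hello' TimeDate 'ever' timeYear? ;   (this sketch also requires end of input)
    private String r() {
        expect("'hello'");
        String month = expect("TimeDate");
        expect("'ever'");
        String year = at("TimeDate") ? timeYear() : "";
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected extra input at token " + pos);
        }
        return month + "/" + year;
    }

    // timeYear : TimeDate TimeDate? ;
    private String timeYear() {
        String year = expect("TimeDate");
        if (at("TimeDate")) {
            year += expect("TimeDate");
        }
        return year;
    }

    /** Returns "month/year" (year empty when omitted); throws on a syntax error. */
    static String parse(String input) {
        return new FixedMonthYear(tokenize(input)).r();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hello2ever20", "hello20ever14", "hello8ever2015", "hello08ever"}) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

One trade-off is worth noting. Because timeYear : TimeDate TimeDate? ; accepts anywhere from one to four digits, this sketch also accepts years like hello8ever201 (a 3-digit year), which the original regular expression rejects. If that matters, the digit count would have to be checked after parsing.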