ANTLR4 lexer rules don't work as expected


Question

I want to write a lexer rule for a month and a year. Expressed as a regular expression, the rule is:

"hello"[0-9]{1,2}"ever"([0-9]{2}([0-9]{2})?)?

The "hello" and "ever" literals are just for debugging.

That is, one or two digits for the month and two or four digits for the year. The year part is also optional.

For example: Aug 2015 -> hello08ever2015, hello8ever2015, hello8ever15, hello8ever, or hello08ever; Oct 2015 -> hello10ever2015, hello10ever15, or hello10ever.

My rules are as follows (ANTLR4):

grammar Hello;
r  : 'hello' TimeDate 'ever' TimeYear? ;        

TimeDate : Digit Digit?;

TimeYear : TwoDigit TwoDigit?;

TwoDigit : Digit Digit;

Digit : [0-9] ;             

WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

But it doesn't seem to work. Here are some logs from my testing:

C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello20ever2014
^Z
(r hello 20 ever 2014)

C:\antlr\workspace\demo>grun Hello r -tree -gui

C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello2ever20
^Z
(r hello 2 ever)

C:\antlr\workspace\demo>grun Hello r -tree -gui

C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello20ever14
^Z
(r hello 20 ever)

C:\antlr\workspace\demo>grun Hello r -tree -gui

C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello2ever2014
^Z
(r hello 2 ever 2014)

For the input hello2ever20, the year part '20' is not recognized; for the input hello20ever14, the year part '14' is not recognized.

Could anyone help with this?

Thanks!

Recommended answer

You have to realise that ANTLR's lexer rules are matched according to their position in the grammar file. The lexer does not "listen" to what the parser might need at a certain position in a parser rule. The lexer tries to match as many characters as possible, and when two (or more) rules match the same number of characters, the rule defined first wins.

In your case, that means 15 will always be tokenized as a TimeDate and never as a TimeYear, because both rules match 15 but TimeDate is defined first. 2015 will be tokenized as a TimeYear because no other rule matches four digits.
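The two matching principles (longest match first, then earliest-defined rule on a tie) can be illustrated with a small Python simulation. This is only a sketch of the behaviour described here, not ANTLR's actual implementation: the regular expressions are hand-translated from the grammar in the question, the `RULES` list and `tokenize` function are hypothetical names, and whitespace skipping is left out because the sample inputs contain none.

```python
import re

# Rules in definition order. The literals used in the parser rule
# ('hello', 'ever') become implicit tokens that come before the named
# lexer rules. The patterns are a hand translation of the question's grammar.
RULES = [
    ("'hello'", re.compile(r"hello")),
    ("'ever'", re.compile(r"ever")),
    ("TimeDate", re.compile(r"[0-9][0-9]?")),
    ("TimeYear", re.compile(r"[0-9]{2}(?:[0-9]{2})?")),
    ("TwoDigit", re.compile(r"[0-9]{2}")),
    ("Digit", re.compile(r"[0-9]")),
]


def tokenize(text):
    """Split text into (rule name, matched text) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule defined first is kept.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


for sample in ("hello2ever20", "hello20ever14", "hello20ever2014"):
    print(sample, tokenize(sample))
```

In this simulation the two-digit years come out as TimeDate tokens and only the four-digit year becomes a TimeYear, which is consistent with the parse trees in the question: after 'ever' the parser rule is looking for an optional TimeYear, finds a TimeDate instead, and stops.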

One solution is to change TimeYear into a parser rule:

timeYear
 : TimeDate TimeDate?
 ;
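To see why this helps, the change can be checked against the sample inputs with a short Python sketch. It makes several assumptions that are not stated in the answer: the parser rule r is assumed to reference timeYear instead of the removed TimeYear token, the other lexer rules are assumed to stay as they are, and the names `HELLO`, `EVER`, `SHAPE`, `tokenize`, and `parse_r` are hypothetical. Unlike rule r in the question, the sketch also requires the whole input to fit the rule.

```python
import re

# Lexer rules after the change: TimeYear is no longer a token type.
# HELLO and EVER stand in for the implicit 'hello' and 'ever' literals.
RULES = [
    ("HELLO", re.compile(r"hello")),
    ("EVER", re.compile(r"ever")),
    ("TimeDate", re.compile(r"[0-9][0-9]?")),
    ("TwoDigit", re.compile(r"[0-9]{2}")),
    ("Digit", re.compile(r"[0-9]")),
]

# r        : 'hello' TimeDate 'ever' timeYear? ;
# timeYear : TimeDate TimeDate? ;
SHAPE = re.compile(r"HELLO TimeDate EVER( TimeDate( TimeDate)?)?")


def tokenize(text):
    """Longest match wins; on a tie the earlier rule is kept."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


def parse_r(text):
    """Return (month, year) if the token sequence fits rule r, else None."""
    tokens = tokenize(text)
    if not SHAPE.fullmatch(" ".join(name for name, _ in tokens)):
        return None
    month = tokens[1][1]
    year = "".join(value for _, value in tokens[3:]) or None
    return month, year


for sample in ("hello2ever20", "hello20ever14", "hello20ever2014", "hello8ever"):
    print(sample, parse_r(sample))
```

Because every run of one or two digits is now a TimeDate, both two-digit and four-digit years reach the parser as one or two TimeDate tokens. One caveat that follows from the rule as written: it also accepts one-digit and three-digit years (for example hello8ever201 yields the tokens 20 and 1), so a length check on the matched text may still be needed if those inputs should be rejected.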
