ANTLR4 lexer rules don't work as expected

Problem description

I want to write a lexer rule for a month and a year. As a regular expression, the rule is:

"hello"[0-9]{1,2}"ever"([0-9]{2}([0-9]{2})?)?

The "hello" and "ever" literals are only there for debugging.

That is, the month is one or two digits and the year is two or four digits. The year part is optional.

For example, Aug 2015 can be written as hello08ever2015, hello8ever2015, hello8ever15, hello8ever, or hello08ever; Oct 2015 can be written as hello10ever2015, hello10ever15, or hello10ever.
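As a quick sanity check of the intended rule, the regular expression can be tried against the sample inputs. This is only a sketch in Python: the pattern is re-spelled in Python's re syntax (the quotes around the literals are dropped), and the names PATTERN and SAMPLES are made up for illustration.

```python
import re

# Python spelling of the rule from the question (assumption: the quoted
# literals "hello" and "ever" are plain text in Python's re syntax).
PATTERN = re.compile(r"hello[0-9]{1,2}ever([0-9]{2}([0-9]{2})?)?")

# The sample inputs listed in the question.
SAMPLES = [
    "hello08ever2015", "hello8ever2015", "hello8ever15",
    "hello8ever", "hello08ever",
    "hello10ever2015", "hello10ever15", "hello10ever",
]

for sample in SAMPLES:
    # fullmatch requires the whole string to fit the rule.
    print(sample, bool(PATTERN.fullmatch(sample)))
```

Every sample should be accepted, while a three-digit year such as hello8ever201 should be rejected.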

My lexer rules (ANTLR4) are as follows:

grammar Hello;
r  : 'hello' TimeDate 'ever' TimeYear? ;        

TimeDate : Digit Digit?;

TimeYear : TwoDigit TwoDigit?;

TwoDigit : Digit Digit;

Digit : [0-9] ;             

WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

But it does not seem to work. Here are some logs from my testing:

C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello20ever2014
^Z
(r hello 20 ever 2014)

C:\antlr\workspace\demo>grun Hello r -tree -gui

C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello2ever20
^Z
(r hello 2 ever)

C:\antlr\workspace\demo>grun Hello r -tree -gui

C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello20ever14
^Z
(r hello 20 ever)

C:\antlr\workspace\demo>grun Hello r -tree -gui

C:\antlr\workspace\demo>java org.antlr.v4.runtime.misc.TestRig Hello r -tree -gui
hello2ever2014
^Z
(r hello 2 ever 2014)

For the input hello2ever20, the year part '20' is not recognized; for the input hello20ever14, the year part '14' is not recognized.

Can anyone help with this?

Thanks!

Recommended answer

You must realise that ANTLR's lexer rules are matched according to their position in the grammar file. The lexer does not "listen" to what the parser might need at a certain position in a parser rule. The lexer tries to match as many characters as possible, and when two (or more) rules match the same number of characters, the rule defined first wins.

In your case, that means 15 will always be tokenized as a TimeDate and never as a TimeYear, because both rules match 15 but TimeDate is defined first. 2015 will be tokenized as a TimeYear because no other rule matches four digits. Since r does not end with EOF, the parser can stop after 'ever' and leave the trailing TimeDate token unconsumed, which would explain why the tree simply omits the year instead of reporting an error.
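The two tie-breaking rules (longest match first, then earliest definition) can be imitated with a small Python sketch. This is not ANTLR itself, only a hand-written model of the behaviour described above: the RULES table and the tokenize function are illustrative, the WS rule is left out, and placing the 'hello' and 'ever' literals ahead of the named rules is an assumption about how the implicit tokens are ordered.

```python
import re

# Lexer rules from the question, in definition order. The literals from
# parser rule r are listed first (assumption); WS is omitted for brevity.
RULES = [
    ("'hello'", re.compile(r"hello")),
    ("'ever'", re.compile(r"ever")),
    ("TimeDate", re.compile(r"[0-9][0-9]?")),
    ("TimeYear", re.compile(r"[0-9]{2}(?:[0-9]{2})?")),
    ("TwoDigit", re.compile(r"[0-9]{2}")),
    ("Digit", re.compile(r"[0-9]")),
]

def tokenize(text):
    """Split text into (rule name, lexeme) pairs without consulting a parser."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            length = match.end() - pos if match else 0
            # A strictly longer match wins; on a tie the earlier rule is kept.
            if length > best_len:
                best_name, best_len = name, length
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

for sample in ("hello2ever20", "hello20ever14", "hello20ever2014"):
    print(sample, tokenize(sample))
```

In this model the trailing 20 and 14 come out as TimeDate, and only the four-digit 2014 comes out as TimeYear, which matches the logs in the question.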

One solution is to change TimeYear into a parser rule:

timeYear
 : TimeDate TimeDate?
 ;
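To see what the suggested change does to the token stream, here is a rough Python model. It assumes the TimeYear lexer rule has been removed and that r now refers to timeYear; token_types and matches_r are hypothetical helpers, not part of ANTLR.

```python
import re

# Stand-in for the revised lexer: only the two literals and TimeDate remain.
TOKEN = re.compile(r"(?P<hello>hello)|(?P<ever>ever)|(?P<TimeDate>[0-9][0-9]?)")

def token_types(text):
    """Return the sequence of token type names for text."""
    types = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        types.append(match.lastgroup)
        pos = match.end()
    return types

def matches_r(text):
    """Model of: r : 'hello' TimeDate 'ever' timeYear? ;
                 timeYear : TimeDate TimeDate? ;"""
    types = token_types(text)
    head, year = types[:3], types[3:]
    return (head == ["hello", "TimeDate", "ever"]
            and len(year) <= 2
            and all(t == "TimeDate" for t in year))

for sample in ("hello2ever20", "hello20ever14", "hello20ever2014", "hello8ever"):
    print(sample, token_types(sample), matches_r(sample))
```

With this change a two-digit year is one TimeDate token and a four-digit year is two of them, so both forms fit timeYear. Note that the rule as written would also accept a one-digit or three-digit year (for example hello8ever201), so a stricter check on the digit count may still be needed after parsing.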
