ANTLR: How to use lexer rules that start with the same characters?


Problem description


How can I use lexer rules that start with the same characters?

I am trying to use two similar lexer rules that begin with the same characters:

TIMECONSTANT: ('0'..'9')+ ':' ('0'..'9')+;
INTEGER     : ('0'..'9')+;
COLON       : ':';

Here is my sample grammar:

grammar TestTime;

text      : (timeexpr | caseblock)*;

timeexpr  : TIME;
caseblock : INT COLON ID;

TIME      : ('0'..'9')+ ':' ('0'..'9')+;
INT       : ('0'..'9')+;
COLON     : ':';
ID        : ('a'..'z')+;

WS        : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

When I try to parse this text:

12:44
123 : abc
123: abc

The first two lines are parsed correctly, but the third produces an error. For some reason, ANTLR tries to lex '123:' as a TIME token, which it is not.

So, is it possible to write a grammar with such lexemes?

My language needs rules like these because it has both case blocks and date/time constants. For example, this is valid in my language:

case MyInt of
  1: a := 01.01.2012;
  2: b := 12:44;
  3: ....
end;

Solution

As soon as DIGIT+ ':' is matched, the lexer expects another DIGIT to follow so that it can complete a TIMECONSTANT. If none follows, it cannot fall back on another lexer rule that matches DIGIT+ ':', and it will not give up the ':' it has already consumed in order to match an INTEGER.
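To make the failure mode concrete, here is a minimal hand-written Java sketch of a lexer that commits in the way described above. It is a simplified model for illustration only, not code generated by ANTLR; the class name `CommittedLexerSketch` and the token strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model of the behaviour described above; this is NOT ANTLR-generated code.
class CommittedLexerSketch {

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isLetter(char c) { return c >= 'a' && c <= 'z'; }

    // Tokenizes TIME, INT, COLON and ID; whitespace is dropped.
    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            int start = i;
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++; // whitespace never reaches the parser
            } else if (isDigit(c)) {
                while (i < in.length() && isDigit(in.charAt(i))) i++;
                if (i < in.length() && in.charAt(i) == ':') {
                    // Seeing DIGIT+ ':' commits to TIME; there is no path back to INT.
                    i++;
                    if (i >= in.length() || !isDigit(in.charAt(i))) {
                        throw new IllegalStateException(
                            "expected a digit after '" + in.substring(start, i) + "'");
                    }
                    while (i < in.length() && isDigit(in.charAt(i))) i++;
                    out.add("TIME(" + in.substring(start, i) + ")");
                } else {
                    out.add("INT(" + in.substring(start, i) + ")");
                }
            } else if (c == ':') {
                i++;
                out.add("COLON");
            } else if (isLetter(c)) {
                while (i < in.length() && isLetter(in.charAt(i))) i++;
                out.add("ID(" + in.substring(start, i) + ")");
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12:44"));
        System.out.println(tokenize("123 : abc"));
        try {
            tokenize("123: abc");
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

In this model, "123 : abc" works only because the space ends the digit run before any ':' is seen, whereas "123: abc" fails once the ':' has been consumed.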

A possible solution would be to optionally match ':' DIGIT+ at the end of the INTEGER rule and change the type of the token if this gets matched:

grammar T;  

parse
 : (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
 ;

INTEGER      : DIGIT+ ((':' DIGIT)=> ':' DIGIT+ {$type=TIMECONSTANT;})?;
COLON        : ':';
SPACE        : ' ' {skip();};

fragment DIGIT : '0'..'9';
fragment TIMECONSTANT : ;

When parsing the input:

11: 12:13 : 14

the following will be printed:

INTEGER         '11'
COLON           ':'
TIMECONSTANT    '12:13'
COLON           ':'
INTEGER         '14'
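The same idea can be mimicked outside ANTLR. The following Java sketch is a hand-written approximation of the INTEGER rule above: it matches the digits first, peeks at the next two characters (standing in for the `(':' DIGIT)=>` predicate), and only then consumes the ':' and relabels the token. The class name `RetypingLexerSketch` and the output format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written approximation of the INTEGER rule above; not ANTLR-generated code.
class RetypingLexerSketch {

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == ' ') {
                i++; // SPACE is skipped
            } else if (c == ':') {
                i++;
                out.add("COLON ':'");
            } else if (isDigit(c)) {
                int start = i;
                String type = "INTEGER";
                while (i < in.length() && isDigit(in.charAt(i))) i++;
                // Look ahead without consuming: is the next input ':' followed by a digit?
                if (i + 1 < in.length() && in.charAt(i) == ':' && isDigit(in.charAt(i + 1))) {
                    i++; // consume ':' only now that a digit is known to follow
                    while (i < in.length() && isDigit(in.charAt(i))) i++;
                    type = "TIMECONSTANT"; // same effect as {$type=TIMECONSTANT;}
                }
                out.add(type + " '" + in.substring(start, i) + "'");
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokenize("11: 12:13 : 14")) {
            System.out.println(token);
        }
    }
}
```

Because the ':' is consumed only after the lookahead succeeds, a lone "11:" still yields INTEGER followed by COLON.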

EDIT

"Not too nice, but works..."

True. However, this is not an ANTLR shortcoming: most lexer generators I know of would have trouble tokenizing such a TIMECONSTANT properly when INTEGER and COLON tokens are also present. ANTLR at least provides a way to handle it in the lexer :)

You could also let this be handled by the parser instead of the lexer:

time_const : INTEGER COLON INTEGER;
INTEGER    : '0'..'9'+;
COLON      : ':';
SPACE      : ' ' {skip();};

However, if your language's lexer ignores whitespace, then input like:

12 :    34

would, of course, also be matched by the time_const rule.
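If that matters, one conceivable workaround (not part of the original answer) is to accept INTEGER COLON INTEGER as a time constant only when the three tokens touch each other. The Java sketch below illustrates the check under the assumption that every token records its start and stop character offsets; the `Tok` class and `isTimeConst` helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a parser-side adjacency check, assuming tokens carry offsets.
class AdjacentTimeConstSketch {

    static final class Tok {
        final String type;
        final int start; // offset of the first character
        final int stop;  // offset of the last character (inclusive)
        Tok(String type, int start, int stop) {
            this.type = type;
            this.start = start;
            this.stop = stop;
        }
    }

    // Produces INTEGER and COLON tokens and skips spaces, like the grammar above.
    static List<Tok> tokenize(String in) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == ' ') {
                i++;
            } else if (c == ':') {
                out.add(new Tok("COLON", i, i));
                i++;
            } else if (c >= '0' && c <= '9') {
                int start = i;
                while (i < in.length() && in.charAt(i) >= '0' && in.charAt(i) <= '9') i++;
                out.add(new Tok("INTEGER", start, i - 1));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    // True if tokens i..i+2 form INTEGER COLON INTEGER with no characters between them.
    static boolean isTimeConst(List<Tok> t, int i) {
        if (i + 2 >= t.size()) return false;
        Tok a = t.get(i), b = t.get(i + 1), c = t.get(i + 2);
        return a.type.equals("INTEGER") && b.type.equals("COLON") && c.type.equals("INTEGER")
                && a.stop + 1 == b.start && b.stop + 1 == c.start;
    }

    public static void main(String[] args) {
        System.out.println(isTimeConst(tokenize("12:34"), 0));
        System.out.println(isTimeConst(tokenize("12 :    34"), 0));
    }
}
```

In a real grammar, a check like this would probably live in a semantic predicate on the time_const rule; how to wire that up depends on the ANTLR version in use.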
