Float literal and range parameter in ANTLR
Problem description
I'm working on a parser for the language D, and I ran into trouble when I tried to add the "slice" operator rule. You can find the ANTLR grammar for it here. Basically, the problem is that when the lexer encounters a string like "1..2", it gets completely lost and ends up producing a single float value. As a result, the postfixExpression rule for a string like "a[10..11]" ends up as an ExpArrIndex object with an ExpLiteralReal argument. Can somebody explain what exactly is wrong with the numeric literals? (As far as I understand, it fails somewhere around these tokens.)
Recommended answer
You can do that by emitting two tokens (an IntegerLiteral and a Range token) when you encounter a ".." inside the Float rule. To accomplish this, you need to override two methods in your lexer: emit(Token) and nextToken().
A demo with a small part of your Dee grammar:
grammar Dee;
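@lexer::members {
  java.util.Queue<Token> tokens = new java.util.LinkedList<Token>();

  // Creates a token with the given type and text and queues it via emit().
  public void offer(int ttype, String ttext) {
    this.emit(new CommonToken(ttype, ttext));
  }

  // Every emitted token (including the lexer's default one) goes into the queue.
  // Setting state.token also stops the lexer from auto-emitting another token.
  @Override
  public void emit(Token t) {
    state.token = t;
    tokens.offer(t);
  }

  // Lex one more rule (which may queue several tokens), then hand out the oldest queued token.
  @Override
  public Token nextToken() {
    super.nextToken();
    return tokens.isEmpty() ? Token.EOF_TOKEN : tokens.poll();
  }
}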
parse
: (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
;
Range
: '..'
;
IntegerLiteral
: Integer IntSuffix?
;
FloatLiteral
: Float ImaginarySuffix?
;
// skipping
Space
: ' ' {skip();}
;
// fragments
fragment Float
  :  d=DecimalDigits ( options {greedy = true; } : FloatTypeSuffix
                     | '..' {offer(IntegerLiteral, $d.text); offer(Range, "..");}
                     | '.' DecimalDigits DecimalExponent?
                     )
  |  '.' DecimalDigits DecimalExponent?
  ;
fragment DecimalExponent : ('e' | 'E' | 'e+' | 'E+' | 'e-' | 'E-') DecimalDigits;
fragment DecimalDigits : ('0'..'9'|'_')+ ;
fragment FloatTypeSuffix : 'f' | 'F' | 'L';
fragment ImaginarySuffix : 'i';
fragment IntSuffix : 'L'|'u'|'U'|'Lu'|'LU'|'uL'|'UL' ;
fragment Integer : Decimal| Binary| Octal| Hexadecimal ;
fragment Decimal : '0' | '1'..'9' (DecimalDigit | '_')* ;
fragment Binary : ('0b' | '0B') ('0' | '1' | '_')+ ;
fragment Octal : '0' (OctalDigit | '_')+ ;
fragment Hexadecimal : ('0x' | '0X') (HexDigit | '_')+;
fragment DecimalDigit : '0'..'9' ;
fragment OctalDigit : '0'..'7' ;
fragment HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
Test the grammar with the following class:
import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    DeeLexer lexer = new DeeLexer(new ANTLRStringStream("1..2 .. 33.33 ..21.0"));
    DeeParser parser = new DeeParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}
And when you run Main, the following output is produced:
IntegerLiteral '1'
Range '..'
IntegerLiteral '2'
Range '..'
FloatLiteral '33.33'
Range '..'
FloatLiteral '21.0'
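To see the buffering idea without the ANTLR machinery, here is a minimal sketch of the same pattern in a hand-written Java lexer. Everything in it (QueueLexer, its Tok record, scan and tokenize) is hypothetical and not part of ANTLR's API. It only handles plain decimal integers, simple floats, ".." and spaces, with no suffixes, exponents, underscores or hex/binary/octal forms. A single scan step may queue two tokens, and nextToken() hands them out one at a time. Unlike the grammar's nextToken(), this version scans only when the queue is empty, which is a common variant of the pattern.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical hand-written lexer (not ANTLR-generated) illustrating the token-queue trick. */
public class QueueLexer {
    enum Type { INT, FLOAT, RANGE, EOF }

    record Tok(Type type, String text) {
        @Override public String toString() { return type + " '" + text + "'"; }
    }

    private final String input;
    private int pos = 0;
    private final Queue<Tok> pending = new ArrayDeque<>();

    QueueLexer(String input) { this.input = input; }

    /** Character at pos + offset, or '\0' past the end of the input. */
    private char peek(int offset) {
        int i = pos + offset;
        return i < input.length() ? input.charAt(i) : '\0';
    }

    /** Counterpart of the overridden emit(): buffer a token. */
    private void emit(Type type, String text) { pending.offer(new Tok(type, text)); }

    /** Counterpart of the overridden nextToken(): scan only when the buffer is empty. */
    Tok nextToken() {
        if (pending.isEmpty()) scan();
        Tok t = pending.poll();
        return t != null ? t : new Tok(Type.EOF, "<EOF>");
    }

    /** Scans one lexeme; may emit zero (end of input), one or two tokens. */
    private void scan() {
        while (peek(0) == ' ') pos++;            // skip spaces
        if (pos >= input.length()) return;       // end of input: nothing emitted
        int start = pos;
        if (peek(0) == '.' && peek(1) == '.') {
            pos += 2;
            emit(Type.RANGE, "..");
            return;
        }
        while (Character.isDigit(peek(0))) pos++;
        String digits = input.substring(start, pos);
        if (!digits.isEmpty() && peek(0) == '.' && peek(1) == '.') {
            // Like the '..' alternative in the Float fragment: consume "1.." and emit two tokens.
            pos += 2;
            emit(Type.INT, digits);
            emit(Type.RANGE, "..");
        } else if (peek(0) == '.' && Character.isDigit(peek(1))) {
            pos++;                                // consume '.'
            while (Character.isDigit(peek(0))) pos++;
            emit(Type.FLOAT, input.substring(start, pos));
        } else if (!digits.isEmpty()) {
            emit(Type.INT, digits);
        } else {
            throw new IllegalStateException("unexpected char '" + peek(0) + "' at " + pos);
        }
    }

    /** Collects all tokens (without EOF) as "TYPE 'text'" strings. */
    public static List<String> tokenize(String src) {
        QueueLexer lexer = new QueueLexer(src);
        List<String> out = new ArrayList<>();
        for (Tok t = lexer.nextToken(); t.type() != Type.EOF; t = lexer.nextToken()) {
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        tokenize("1..2 .. 33.33 ..21.0").forEach(System.out::println);
    }
}
```

For the demo input, main prints the same token sequence as the ANTLR demo, using the shorter names INT, FLOAT and RANGE.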
EDIT
Yes, as you pointed out in the comments, a lexer rule can only emit a single token. But, as you already tried yourself, predicates can indeed be used to force the lexer to look ahead in the character stream. This makes sure there actually is a ".." after an IntegerLiteral token before the lexer tries to match a FloatLiteral. The Number rule below does this with ANTLR 3 syntactic predicates, written (...)=>.
The following grammar would produce the same tokens as the first demo.
grammar Dee;
parse
: (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
;
Range
: '..'
;
Number
: (IntegerLiteral Range)=> IntegerLiteral {$type=IntegerLiteral;}
| (FloatLiteral)=> FloatLiteral {$type=FloatLiteral;}
| IntegerLiteral {$type=IntegerLiteral;}
;
// skipping
Space
: ' ' {skip();}
;
// fragments
fragment DecimalExponent : ('e' | 'E' | 'e+' | 'E+' | 'e-' | 'E-') DecimalDigits;
fragment DecimalDigits : ('0'..'9'|'_')+ ;
fragment FloatLiteral : Float ImaginarySuffix?;
fragment IntegerLiteral : Integer IntSuffix?;
fragment FloatTypeSuffix : 'f' | 'F' | 'L';
fragment ImaginarySuffix : 'i';
fragment IntSuffix : 'L'|'u'|'U'|'Lu'|'LU'|'uL'|'UL' ;
fragment Integer : Decimal| Binary| Octal| Hexadecimal ;
fragment Decimal : '0' | '1'..'9' (DecimalDigit | '_')* ;
fragment Binary : ('0b' | '0B') ('0' | '1' | '_')+ ;
fragment Octal : '0' (OctalDigit | '_')+ ;
fragment Hexadecimal : ('0x' | '0X') (HexDigit | '_')+;
fragment DecimalDigit : '0'..'9' ;
fragment OctalDigit : '0'..'7' ;
fragment HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment Float
  :  d=DecimalDigits ( options {greedy = true; } : FloatTypeSuffix
                     | '.' DecimalDigits DecimalExponent?
                     )
  |  '.' DecimalDigits DecimalExponent?
  ;
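The decision the Number rule makes can also be sketched by hand. The lexer peeks at the characters after the digits without consuming them, and only then chooses the token type. The sketch below assumes a hypothetical LookaheadLexer class (not generated by ANTLR). It covers only decimal digits, a single '.', ".." and spaces, and ignores suffixes, exponents and underscores. Each call to next() returns exactly one token, which is the one-token-per-rule constraint the edit works within.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the predicate-based Number rule: one token per call, decided by lookahead. */
public class LookaheadLexer {
    private final String in;
    private int pos = 0;

    LookaheadLexer(String in) { this.in = in; }

    /** Character at index i, or '\0' past the end of the input. */
    private char at(int i) { return i < in.length() ? in.charAt(i) : '\0'; }

    /** Index just past a run of digits starting at i (i itself if there are none). */
    private int digitsEnd(int i) {
        while (Character.isDigit(at(i))) i++;
        return i;
    }

    /** Returns the next token as "Type 'text'", or null at end of input. */
    String next() {
        while (at(pos) == ' ') pos++;            // skip spaces
        if (pos >= in.length()) return null;
        int start = pos;
        if (at(pos) == '.' && at(pos + 1) == '.') {
            pos += 2;
            return "Range '..'";
        }
        int intEnd = digitsEnd(pos);             // look ahead without moving pos yet
        // Alternative 1, like (IntegerLiteral Range)=>: digits followed by "..", consume only the digits.
        if (intEnd > pos && at(intEnd) == '.' && at(intEnd + 1) == '.') {
            pos = intEnd;
            return "IntegerLiteral '" + in.substring(start, pos) + "'";
        }
        // Alternative 2, like (FloatLiteral)=>: optional digits, '.', then at least one digit.
        if (at(intEnd) == '.' && Character.isDigit(at(intEnd + 1))) {
            pos = digitsEnd(intEnd + 1);
            return "FloatLiteral '" + in.substring(start, pos) + "'";
        }
        // Alternative 3: a plain integer.
        if (intEnd > pos) {
            pos = intEnd;
            return "IntegerLiteral '" + in.substring(start, pos) + "'";
        }
        throw new IllegalStateException("unexpected char '" + at(pos) + "' at " + pos);
    }

    public static List<String> tokenize(String src) {
        LookaheadLexer lexer = new LookaheadLexer(src);
        List<String> out = new ArrayList<>();
        for (String t = lexer.next(); t != null; t = lexer.next()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        tokenize("1..2 .. 33.33 ..21.0").forEach(System.out::println);
    }
}
```

Because the integer stops just before "..", the range operator is lexed as its own token on the next call. No token queue is needed in this approach.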