ANTLR lexer rule consumes characters even if not matched?
Problem description
I've run into a strange side effect of an ANTLR lexer rule, and I've created an (almost) minimal working example to demonstrate it. In this example I want to match a string such as [0..1]. But when I debug the grammar, the token stream that reaches the parser contains only [..1]. The first integer, no matter how many digits it contains, is always consumed, and I have no clue how that happens. If I remove the FLOAT rule everything is fine, so I guess the mistake lies somewhere in that rule. But since it shouldn't match anything in [0..1] at all, I'm quite puzzled.
I'd be happy for any pointers as to where I might have gone wrong. This is my example:
grammar min;
options{
language = Java;
output = AST;
ASTLabelType=CommonTree;
backtrack = true;
}
tokens {
DECLARATION;
}
declaration : LBRACEVAR a=INTEGER DDOTS b=INTEGER RBRACEVAR -> ^(DECLARATION $a $b);
EXP : 'e' | 'E';
LBRACEVAR: '[';
RBRACEVAR: ']';
DOT: '.';
DDOTS: '..';
FLOAT
: INTEGER DOT POS_INTEGER
| INTEGER DOT POS_INTEGER EXP INTEGER
| INTEGER EXP INTEGER
;
INTEGER : POS_INTEGER | NEG_INTEGER;
fragment NEG_INTEGER : ('-') POS_INTEGER;
fragment POS_INTEGER : NUMBER+;
fragment NUMBER: ('0'..'9');
Recommended answer
The '0' is discarded by the lexer and the following errors are produced:
line 1:3 no viable alternative at character '.'
line 1:2 extraneous input '..' expecting INTEGER
This is because when the lexer encounters '0.', it tries to create a FLOAT token, but can't. And since there is no other rule to fall back on that matches '0.', it reports the errors, discards the '0', and creates a DOT token.
This is simply how ANTLR's lexer works: it will not backtrack to match an INTEGER followed by a DDOTS (note that backtrack=true only applies to parser rules!).
Inside the FLOAT rule, you must make sure that when a double '.' is ahead, you produce an INTEGER token instead. You can do that by adding a syntactic predicate (the ('..')=> part), and by producing FLOAT tokens only when a single '.' is followed by a digit (the ('.' DIGIT)=> part). See the following demo:
declaration
: LBRACEVAR INTEGER DDOTS INTEGER RBRACEVAR
;
LBRACEVAR : '[';
RBRACEVAR : ']';
DOT : '.';
DDOTS : '..';
INTEGER
: DIGIT+
;
FLOAT
: DIGIT+ ( ('.' DIGIT)=> '.' DIGIT+ EXP?
| ('..')=> {$type=INTEGER;} // change the token here
| EXP
)
;
fragment EXP : ('e' | 'E') DIGIT+;
fragment DIGIT : ('0'..'9');
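To make the lookahead decision in the FLOAT rule easier to follow, here is a minimal hand-written sketch of the same idea in plain Java. It is not the code ANTLR generates, and the class name RangeLexerSketch and the method tokenize are made up for this illustration. After reading a run of digits it peeks ahead: a '.' followed by a digit continues into a FLOAT, while anything else (including '..') leaves the digits as an INTEGER.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hand-rolled lexer mimicking the predicates in the
// FLOAT rule above. Names are hypothetical, not part of ANTLR.
class RangeLexerSketch {

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the index of the first non-digit at or after position i.
    private static int skipDigits(String s, int i) {
        while (i < s.length() && isDigit(s.charAt(i))) {
            i++;
        }
        return i;
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '[') {
                tokens.add("LBRACEVAR");
                i++;
            } else if (c == ']') {
                tokens.add("RBRACEVAR");
                i++;
            } else if (c == '.') {
                // Prefer the longer match '..' over a single '.'.
                if (i + 1 < input.length() && input.charAt(i + 1) == '.') {
                    tokens.add("DDOTS");
                    i += 2;
                } else {
                    tokens.add("DOT");
                    i++;
                }
            } else if (isDigit(c)) {
                int start = i;
                i = skipDigits(input, i);
                boolean isFloat = false;
                // Counterpart of ('.' DIGIT)=> : consume the '.' only when
                // a digit follows it, so "0.." keeps both dots intact.
                if (i + 1 < input.length() && input.charAt(i) == '.'
                        && isDigit(input.charAt(i + 1))) {
                    i = skipDigits(input, i + 1);
                    isFloat = true;
                }
                // Counterpart of EXP : ('e' | 'E') DIGIT+
                if (i + 1 < input.length()
                        && (input.charAt(i) == 'e' || input.charAt(i) == 'E')
                        && isDigit(input.charAt(i + 1))) {
                    i = skipDigits(input, i + 1);
                    isFloat = true;
                }
                // Counterpart of ('..')=> {$type=INTEGER;} : with no
                // fraction and no exponent the digits stay an INTEGER.
                String type = isFloat ? "FLOAT" : "INTEGER";
                tokens.add(type + ":" + input.substring(start, i));
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("[0..1]"));
    }
}
```

Running main prints [LBRACEVAR, INTEGER:0, DDOTS, INTEGER:1, RBRACEVAR], so the leading 0 is no longer lost. The key design point is that the decision to commit to a FLOAT is made before the '.' is consumed, which is what the syntactic predicates achieve in the grammar.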