ANTLR lexer rule consumes characters even if not matched?

Posted: 2021/11/11 3:43:46 · Tags: antlr antlr3 antlrworks

Question

I'm seeing a strange side effect of an ANTLR lexer rule, and I've created an (almost) minimal working example to demonstrate it. In this example I want to match a string such as [0..1]. But when I debug the grammar, the token stream that reaches the parser only contains [..1]. The first integer, no matter how many digits it contains, is always consumed, and I have no clue how that happens. If I remove the FLOAT rule everything is fine, so I guess the mistake lies somewhere in that rule. But since it shouldn't match anything in [0..1] at all, I'm quite puzzled.

I'd be happy for any pointers as to where I might have gone wrong. This is my example:

grammar min;
options{
language = Java;
output = AST;
ASTLabelType=CommonTree;
backtrack = true;
}
tokens {
  DECLARATION;
}

declaration : LBRACEVAR a=INTEGER DDOTS b=INTEGER RBRACEVAR -> ^(DECLARATION $a $b);

EXP : 'e' | 'E';
LBRACEVAR: '[';
RBRACEVAR: ']';
DOT: '.';
DDOTS: '..';

FLOAT
    : INTEGER DOT POS_INTEGER
    | INTEGER DOT POS_INTEGER EXP INTEGER
    | INTEGER EXP INTEGER
    ;

INTEGER : POS_INTEGER | NEG_INTEGER;
fragment NEG_INTEGER : ('-') POS_INTEGER;
fragment POS_INTEGER : NUMBER+;
fragment NUMBER: ('0'..'9');

Recommended answer

The '0' is discarded by the lexer and the following errors are produced:

line 1:3 no viable alternative at character '.'
line 1:2 extraneous input '..' expecting INTEGER

This is because when the lexer encounters '0.', it tries to create a FLOAT token, but can't: the next character is a second '.', not a digit. And since there is no other rule to fall back on that matches '0.', it reports the first error, discards the '0' and resumes lexing at the dots, which is why the parser then sees '..' where it expects an INTEGER.

This is simply how ANTLR's lexer works: it will not backtrack to match an INTEGER followed by a DDOTS (note that backtrack=true only applies to parser rules!).

Inside the FLOAT rule, you must make sure that when a double '.' is ahead, you produce an INTEGER token instead. You can do that by adding a syntactic predicate (the ('..')=> part) and by producing FLOAT tokens only when a single '.' is followed by a digit (the ('.' DIGIT)=> part). See the following demo:

declaration
 : LBRACEVAR INTEGER DDOTS INTEGER RBRACEVAR
 ;

LBRACEVAR : '[';
RBRACEVAR : ']';
DOT       : '.';
DDOTS     : '..';

INTEGER
 : DIGIT+
 ;

FLOAT
 : DIGIT+ ( ('.' DIGIT)=> '.' DIGIT+ EXP? 
          | ('..')=>      {$type=INTEGER;} // change the token here
          |               EXP
          )
 ;

fragment EXP   : ('e' | 'E') DIGIT+;
fragment DIGIT : ('0'..'9');
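
To make the lookahead decision easier to follow, here is a small hand-written Python sketch of the same idea. It is not the code ANTLR generates, and the names lex_number and tokenize are made up for this illustration. It also folds the INTEGER and FLOAT rules into one function, which the grammar above keeps separate. The point is only to show the order of the checks after the leading digits: a '.' followed by a digit means FLOAT, a '..' means stop and emit INTEGER, and an exponent means FLOAT.

```python
import re

DIGITS = re.compile(r"[0-9]+")
FRACTION = re.compile(r"\.[0-9]+")   # mirrors ('.' DIGIT)=> '.' DIGIT+
EXP = re.compile(r"[eE][0-9]+")      # mirrors the EXP fragment

PUNCT = {"[": "LBRACEVAR", "]": "RBRACEVAR"}


def lex_number(text, pos):
    """Lex one number at text[pos]; return (token_type, lexeme, next_pos)."""
    digits = DIGITS.match(text, pos)
    if digits is None:
        raise ValueError(f"expected a digit at index {pos}")
    end = digits.end()

    # A single '.' followed by a digit: this is a fraction, so emit FLOAT.
    frac = FRACTION.match(text, end)
    if frac:
        end = frac.end()
        exp = EXP.match(text, end)
        if exp:
            end = exp.end()
        return "FLOAT", text[pos:end], end

    # A '..' is ahead: leave it for DDOTS and emit INTEGER instead.
    if text.startswith("..", end):
        return "INTEGER", text[pos:end], end

    # Digits followed directly by an exponent are still a FLOAT.
    exp = EXP.match(text, end)
    if exp:
        return "FLOAT", text[pos:exp.end()], exp.end()

    # Plain digits (what the separate INTEGER rule matches in the grammar).
    return "INTEGER", text[pos:end], end


def tokenize(text):
    """Split text into (token_type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if "0" <= ch <= "9":
            kind, lexeme, pos = lex_number(text, pos)
            tokens.append((kind, lexeme))
        elif text.startswith("..", pos):
            tokens.append(("DDOTS", ".."))
            pos += 2
        elif ch == ".":
            tokens.append(("DOT", "."))
            pos += 1
        elif ch in PUNCT:
            tokens.append((PUNCT[ch], ch))
            pos += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at index {pos}")
    return tokens


print(tokenize("[0..1]"))
```

With this ordering, the '0' in [0..1] is kept as an INTEGER instead of being swallowed by a failed FLOAT attempt, while inputs such as 1.5e3 or 2e10 still come out as a single FLOAT.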

