What is wrong with this simple ANTLR grammar?


Question

I am writing an ANTLR grammar to parse log files and have run into a problem. I have simplified my grammar to reproduce the problem, as follows:

stmt1:
  '[ ' elapse ': ' stmt2
  ;

stmt2:
  '[xxx'
  ;

stmt3:
  ': [yyy'
  ;

elapse :
  FLOAT;

FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* 
    ;

When I test the grammar with the following string:

[ 98.9: [xxx

I get these errors:

E:\work\antlr\output\__Test___input.txt line 1:9 mismatched character 'x' expecting 'y'
E:\work\antlr\output\__Test___input.txt line 1:10 no viable alternative at character 'x'
E:\work\antlr\output\__Test___input.txt line 1:11 no viable alternative at character 'x'
E:\work\antlr\output\__Test___input.txt line 1:12 mismatched input '<EOF>' expecting ': '

But if I remove the rule 'stmt3', the same string is accepted.

I am not sure what is happening...

Thanks for any suggestions!

Leon

Thanks to Bart for the help. I have tried to correct the grammar. I think the bottom line is that I have to disambiguate all tokens. I also added a WS token to simplify the rules.

stmt1:
  '[' elapse ':' stmt2
  ;

stmt2:
  '[' 'xxx'
  ;

stmt3:
  ':' '[' 'yyy'
  ;

elapse :
  FLOAT;

FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* 
    ;

WS : (' ' |'\t' |'\n' |'\r' )+ {skip();} ;   

Recommended answer

ANTLR has a strict separation between lexer rules (tokens) and parser rules. Although you defined some literals inside parser rules, they are still tokens. This means the following grammar is, in practice, equivalent to your example grammar:

stmt1  : T1 elapse T2 stmt2 ;
stmt2  : T3 ;
stmt3  : T4 ;
elapse : FLOAT;

T1     : '[ ' ;
T2     : ': ' ;
T3     : '[xxx' ;
T4     : ': [yyy' ;
FLOAT  : ('0'..'9')+ '.' ('0'..'9')* ;

Now, when the lexer tries to construct tokens from the input "[ 98.9: [xxx", it successfully creates the tokens T1 and FLOAT. But when it sees ": [", it tries to construct a T4 token. Since the next character in the stream is an "x" instead of a "y", that match fails, and the lexer looks for another token that starts with ": [". Because there is no such token, the lexer emits the error:

[...] mismatched character 'x' expecting 'y'

And no, the lexer will not backtrack and "give up" the character "[" from ": [" in order to match the token T2, nor will it look further ahead in the character stream to check whether a T4 token can actually be constructed. ANTLR's LL(*) applies only to parser rules, not to lexer rules!
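
To make the "commit without backtracking" behaviour concrete, here is a small Python sketch. It is only a toy model and not ANTLR's actual lexer algorithm: it assumes the lexer picks whichever literal shares the longest prefix with the remaining input and then insists on matching that literal completely. The names `tokenize`, `shared_prefix_len`, `ORIGINAL` and `WITHOUT_T4` are made up for this illustration.

```python
import re

# Same shape as the FLOAT rule: ('0'..'9')+ '.' ('0'..'9')*
FLOAT = re.compile(r"\d+\.\d*")


def shared_prefix_len(literal, text, pos):
    """Count how many leading characters of `literal` match `text` at `pos`."""
    n = 0
    while n < len(literal) and pos + n < len(text) and literal[n] == text[pos + n]:
        n += 1
    return n


def tokenize(text, literals):
    """Toy lexer: choose one literal by lookahead, then commit to it."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = FLOAT.match(text, pos)
        if m:
            tokens.append(("FLOAT", m.group()))
            pos = m.end()
            continue
        # Prediction step (an assumption of this model): the literal that
        # shares the longest prefix with the input is selected.
        name, literal = max(
            literals.items(),
            key=lambda kv: shared_prefix_len(kv[1], text, pos),
        )
        n = shared_prefix_len(literal, text, pos)
        if n == 0:
            raise ValueError(f"col {pos}: no viable alternative at {text[pos]!r}")
        if n < len(literal):
            # Committed to this literal: no fallback to a shorter token.
            found = text[pos + n] if pos + n < len(text) else "<EOF>"
            raise ValueError(
                f"col {pos + n}: mismatched character {found!r} "
                f"expecting {literal[n]!r}"
            )
        tokens.append((name, literal))
        pos += len(literal)
    return tokens


# Token sets mirroring T1..T4 from the equivalent grammar above.
ORIGINAL = {"T1": "[ ", "T2": ": ", "T3": "[xxx", "T4": ": [yyy"}
WITHOUT_T4 = {k: v for k, v in ORIGINAL.items() if k != "T4"}

if __name__ == "__main__":
    print(tokenize("[ 98.9: [xxx", WITHOUT_T4))
    try:
        tokenize("[ 98.9: [xxx", ORIGINAL)
    except ValueError as err:
        print(err)  # col 9: mismatched character 'x' expecting 'y'
```

With T4 present, the model commits to ": [yyy" as soon as it sees ": [" and fails at column 9, which is the same position as the first error in the question. Without T4, the same input splits cleanly into T1, FLOAT, T2 and T3.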
