ANTLR4 lexer not resolving ambiguity in grammar order

Problem description

Using ANTLR 4.2, I'm trying a very simple parse of this test data:

RRV0#ABC

Using a minimal grammar:

grammar Tiny;

thing : RRV N HASH ID ;

RRV : 'RRV' ;
N : [0-9]+ ;
HASH : '#' ;
ID : [a-zA-Z0-9]+ ;
WS : [\t\r\n]+ -> skip ; // match 1-or-more whitespace but discard

I expect the lexer rule RRV to match before ID, based on the excerpt below from Terence Parr's The Definitive ANTLR 4 Reference:

BEGIN : 'begin' ; // match b-e-g-i-n sequence; ambiguity resolves to BEGIN
ID : [a-z]+ ; // match one or more of any lowercase letter

Running the ANTLR4 test rig with the test data above, the output is:

[@0,0:3='RRV0',<4>,1:0]
[@1,4:4='#',<3>,1:4]
[@2,5:7='ABC',<4>,1:5]
[@3,10:9='<EOF>',<-1>,2:0]
line 1:0 mismatched input 'RRV0' expecting 'RRV'

I can see that the first token has type <4>, which is ID, with the value 'RRV0'.

I have tried rearranging the lexer rule order. I have also tried using implicit lexer rules, by matching the literal directly in the parser rule rather than through an explicit lexer rule. I tried making the matches non-greedy too. None of these worked.

If I change the ID lexer rule so that it does not match upper case, then the RRV rule does match and the parse gets further.

I started in ANTLR 4.1 with the same issue.

I checked in ANTLRWorks and from the command line, with the same result both ways.

How can I change the grammar to match the lexer rule RRV in preference to ID?

Recommended answer

The grammar order resolution policy only applies when two different lexer rules match the same length of token. When the length differs, the longest one always wins. In your case, the ID rule matches a token with length 4, which is longer than the RRV token that only matches 3 characters.
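
The rule described above can be illustrated with a small Python simulation. This is only a sketch of the longest-match policy with a grammar-order tie-break, not ANTLR's actual implementation; the `RULES` table and the `tokenize` function are hypothetical names, and the regular expressions are hand-translated from the Tiny grammar in the question.

```python
import re

# Lexer rules in grammar order, hand-translated from the Tiny grammar.
# (Hypothetical table for illustration; ANTLR does not work from regexes.)
RULES = [
    ("RRV", re.compile(r"RRV")),
    ("N", re.compile(r"[0-9]+")),
    ("HASH", re.compile(r"#")),
    ("ID", re.compile(r"[a-zA-Z0-9]+")),
]


def tokenize(text):
    """Longest match wins; grammar order only breaks ties of equal length."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly greater: on a tie, the earlier rule keeps the win.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# ID matches 4 characters, RRV only 3, so ID wins for the first token.
print(tokenize("RRV0#ABC"))
# Here RRV and ID both match 3 characters, so grammar order picks RRV.
print(tokenize("RRV#ABC"))
```

With the input from the question, the first token comes out as ID with the text 'RRV0', which is consistent with the test rig output shown earlier. Only when the digit is removed do RRV and ID tie on length, and only then does rule order matter.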

This strategy is especially important in languages like Java. Consider the following input:

String className = "";

Along with the following two grammar rules (slightly simplified):

CLASS : 'class';
ID : [a-zA-Z_] [a-zA-Z0-9_]*;

If we only considered grammar order, then the input className would produce a keyword followed by the identifier Name. Rearranging the rules wouldn't solve the problem because then there would be no way to ever create a CLASS token, even for the input class.
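
To make the contrast concrete, here is a rough Python sketch comparing a hypothetical "grammar order only" lexer with a longest-match lexer on the CLASS and ID rules above. Both functions (`first_match_tokenize` and `longest_match_tokenize`) are invented for illustration and are not part of ANTLR; the regular expressions are hand-translated from the two rules.

```python
import re

CLASS = ("CLASS", re.compile(r"class"))
ID = ("ID", re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*"))


def first_match_tokenize(text, rules):
    """Hypothetical policy: the first rule in grammar order that matches wins."""
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}")
    return tokens


def longest_match_tokenize(text, rules):
    """Longest match wins; grammar order only breaks ties of equal length."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strictly longer matches replace the current best; ties keep the earlier rule.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


# Grammar order only: className is split into a keyword and an identifier.
print(first_match_tokenize("className", [CLASS, ID]))
# Grammar order only, rules swapped: CLASS can never be produced.
print(first_match_tokenize("class", [ID, CLASS]))
# Longest match with order as tie-break handles both inputs as intended.
print(longest_match_tokenize("className", [CLASS, ID]))
print(longest_match_tokenize("class", [CLASS, ID]))
```

Under the first-match policy neither rule ordering gives sensible results, whereas the longest-match policy yields a single ID for className and a CLASS token for class.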
