ANTLR4 ambiguous grammar

Question

I want to achieve the following behavior: User:class should be parsed as Object - User; Type - class, and Us:er:class should be parsed as Object - Us:er; Type - class. I can't make the second part work: as soon as I add : as a legal symbol for WORD, the whole input is parsed as a single object, Object - Us:er:class. My grammar:

grammar Sketch;

/*
 * Parser Rules
 */
input               : (object)+ EOF ;
object              : objectName objectType? NEWLINE ;
objectType          : ':' TYPE ;
objectName          : WORD ;

/*
 * Lexer Rules
 */ 
fragment LOWERCASE  : [a-z] ;
fragment UPPERCASE  : [A-Z] ;
fragment NUMBER     : [0-9] ;
fragment WHITESPACE : (' ') ;
fragment SYMBOLS    : [!-/:-@[-`] ;
fragment C          : [cC] ;
fragment L          : [lL] ;
fragment A          : [aA] ;
fragment S          : [sS] ;
fragment T          : [tT] ;
fragment U          : [uU] ;
fragment R          : [rR] ;

TYPE                : ((C L A S S) | (S T R U C T));

NEWLINE             : ('\r'? '\n' | '\r')+ ;

WORD                : (LOWERCASE | UPPERCASE | NUMBER | WHITESPACE | SYMBOLS)+ ;

The fragments for each letter are there for case-insensitive parsing. As I understand it, the lexer gives priority to rules top to bottom, so TYPE should be picked over WORD, but I can't get that to work. I'm new to ANTLR4, so maybe I'm missing something obvious.

Recommended answer

If you just need to parse something this simple, you do not need to write a parser with ANTLR. This is one of the very few cases where I would suggest just using a simple regex. If you want to solve it with ANTLR, I see two options:

1) The ugly solution: use predicates or actions to trick and force the parser into doing what you want.
2) Define just two tokens: one to match identifiers and one to match the colon. Then do the composition later, in the code that uses your parser.
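
The regex route mentioned above could look roughly like the following. This is only a sketch under assumptions: the names `LINE_RE` and `parse_line` are hypothetical, the input is assumed to arrive one object per line, and the only type keywords are `class` and `struct` (case-insensitive), as in the TYPE rule of the question.

```python
import re

# Hypothetical pattern: a lazily matched object name, followed by an
# optional ":class" or ":struct" suffix (case-insensitive).
LINE_RE = re.compile(r"(?P<name>.+?)(?::(?P<type>class|struct))?", re.IGNORECASE)


def parse_line(line):
    """Return (object_name, object_type); object_type is None when absent."""
    # Drop only the line terminator; spaces are legal inside a name.
    match = LINE_RE.fullmatch(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"cannot parse line: {line!r}")
    obj_type = match.group("type")
    return match.group("name"), obj_type.lower() if obj_type else None


print(parse_line("User:class"))
print(parse_line("Us:er:class"))
```

Because the name group is lazy and the whole line must match, the name absorbs every colon except the one that introduces a trailing type keyword.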

For example, for User:class you would get [[ID:"User"], [ID:"class"]], while for Us:er:class you would get [[ID:"Us"], [ID:"er"], [ID:"class"]]. Then, in your code, you know that the last ID represents the type and the sequence of all the other IDs represents the object.
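
The composition step of the second option can be sketched as follows. The `tokenize` function here is only a regex stand-in for the ANTLR lexer, not generated ANTLR code, and `TOKEN_RE`, `TYPE_NAMES`, and `compose` are hypothetical names. One assumption goes beyond the answer: since the type is optional in the question's grammar, the last ID is treated as a type only when it is `class` or `struct`. With no token able to contain a colon, ANTLR's longest-match rule can no longer swallow the whole line into one token.

```python
import re

# Stand-in for the two-token lexer: COLON is ':', ID is any run of
# characters that contains neither a colon nor a line break.
TOKEN_RE = re.compile(r"(?P<COLON>:)|(?P<ID>[^:\r\n]+)")
TYPE_NAMES = {"class", "struct"}


def tokenize(line):
    """Return a list of (token_kind, text) pairs for one line."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(line)]


def compose(tokens):
    """Split a token list into (object_name, object_type or None)."""
    ends_with_type = (
        len(tokens) >= 3
        and tokens[-2][0] == "COLON"
        and tokens[-1][0] == "ID"
        and tokens[-1][1].lower() in TYPE_NAMES
    )
    if ends_with_type:
        # Everything before the final COLON, colons included, is the name.
        name = "".join(text for _, text in tokens[:-2])
        return name, tokens[-1][1].lower()
    return "".join(text for _, text in tokens), None


print(compose(tokenize("User:class")))
print(compose(tokenize("Us:er:class")))
```

Rebuilding the name from the token texts rather than from the IDs alone keeps inner colons intact, so nothing from the input is lost.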

Neither is a great solution, but I think ANTLR is not the right tool for what you are trying to do.
