ANTLR4 ambiguous grammar

Question

I want to achieve the following behavior: User:class should be parsed to Object - User; Type - class, and Us:er:class should also result in Object - Us:er; Type - class. I can't make the second part work: as soon as I add : as a legal symbol for WORD, it parses the whole input as an object, Object - Us:er:class. My grammar:

grammar Sketch;

/*
 * Parser Rules
 */
input               : (object)+ EOF ;
object              : objectName objectType? NEWLINE ;
objectType          : ':' TYPE ;
objectName          : WORD ;

/*
 * Lexer Rules
 */ 
fragment LOWERCASE  : [a-z] ;
fragment UPPERCASE  : [A-Z] ;
fragment NUMBER     : [0-9] ;
fragment WHITESPACE : (' ') ;
fragment SYMBOLS    : [!-/:-@[-`] ;
fragment C          : [cC] ;
fragment L          : [lL] ;
fragment A          : [aA] ;
fragment S          : [sS] ;
fragment T          : [tT] ;
fragment U          : [uU] ;
fragment R          : [rR] ;

TYPE                : ((C L A S S) | (S T R U C T));

NEWLINE             : ('\r'? '\n' | '\r')+ ;

WORD                : (LOWERCASE | UPPERCASE | NUMBER | WHITESPACE | SYMBOLS)+ ;

The fragments for each letter are there for case-insensitive parsing. As I understand it, the lexer gives priority to rules from top to bottom, so TYPE should be picked over WORD, but I can't achieve that. I'm new to ANTLR4, so maybe I'm missing something obvious.

Recommended answer

If you just need to parse something this simple, you don't need to write a parser with ANTLR. It is one of the very few cases where I would suggest just using a simple regex. If you want to solve it with ANTLR, I would do it like this:
1) Ugly solution: use predicates or actions to trick and force the parser into doing what you want.
2) Simply define two tokens: one to get identifiers and one to get the colon. Then do the composition later, in the code that uses your parser.
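To illustrate the regex route, here is a minimal sketch in Python (an assumption; the question does not name a host language). The pattern and the parse_object helper are hypothetical names for illustration. The object name is matched lazily, so the engine only accepts a type from the last colon-separated segment, and only if that segment is class or struct. Case-insensitivity comes from the re.IGNORECASE flag rather than per-letter fragments.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: an object name, optionally followed by ":class" or
# ":struct" at the very end of the line (case-insensitive). The lazy ".+?"
# tries the shortest object first, so with fullmatch() the optional type
# can only be taken from the final colon-separated segment.
OBJECT_PATTERN = re.compile(r"(?P<object>.+?)(?::(?P<type>class|struct))?",
                            re.IGNORECASE)


def parse_object(line: str) -> Tuple[str, Optional[str]]:
    """Split one input line into (object name, type or None)."""
    match = OBJECT_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not an object line: {line!r}")
    return match.group("object"), match.group("type")


for text in ["User:class", "Us:er:class", "Us:er:STRUCT", "User"]:
    print(text, "->", parse_object(text))
```

Because the type suffix is optional, a line such as User (no type) still matches and yields None for the type, mirroring objectType? in the grammar.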

For example, for User:class you would get [[ID:"User"], [ID:"class"]], while for Us:er:class you would get [[ID:"Us"], [ID:"er"], [ID:"class"]]. Then, in your code, you know that the last ID represents the type and the sequence of all the other IDs represents the object.
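A minimal sketch of the two-token approach, assuming Python as the host language. A small regex tokenizer stands in for the ANTLR-generated lexer, and the helpers tokenize and compose (plus the token names ID and COLON) are hypothetical. The lexer only distinguishes identifiers from colons; the composition step treats the last ID as the type and joins the others back with colons to form the object name.

```python
import re
from typing import List, Optional, Tuple

# Stand-in for the ANTLR lexer: only two token kinds, ID (a run of
# characters that are not colons or line breaks) and COLON.
TOKEN_RE = re.compile(r"(?P<ID>[^:\r\n]+)|(?P<COLON>:)")


def tokenize(line: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs, e.g. ("ID", "Us"), ("COLON", ":")."""
    tokens = []
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {line[pos]!r}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def compose(tokens: List[Tuple[str, str]]) -> Tuple[str, Optional[str]]:
    """Last ID is the type; all earlier IDs, re-joined with ':', are the object."""
    ids = [text for kind, text in tokens if kind == "ID"]
    if not ids:
        raise ValueError("no identifiers found")
    if len(ids) == 1:
        return ids[0], None  # no type given
    return ":".join(ids[:-1]), ids[-1]


print(tokenize("Us:er:class"))
print(compose(tokenize("Us:er:class")))
```

Re-joining with a single ':' assumes exactly one colon between segments; an input like Us::er would lose the empty segment, so a real implementation might keep the COLON tokens or validate the type name.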

Neither is a great solution, but I think ANTLR is not the right tool for what you are trying to do.
