Custom ANTLR grammar not working for every input


Question

I am trying to write a grammar for our custom rule engine, which uses ANTLR (for parsing) and Pentaho Kettle (for executing the rules).

Valid inputs for the parser have the form:
(<Attribute_name> <Relational_Operator> <Value>) AND/OR (<Attribute_name> <Relational_Operator> <Value>)
e.g. PERSON_PHONE = 123456789

This is my grammar:

grammar RuleGrammar;
options{
language=Java;
}

prog                : condition;

condition
                                :  LHSOPERAND RELATIONOPERATOR RHSOPERAND
                                ;

LHSOPERAND
                                :  STRINGVALUE
                                ;

RHSOPERAND
                                :  NUMBERVALUE    |
                                   STRINGVALUE
                                ;


RELATIONOPERATOR
                                :   '>'    |
                                     '=>'  |
                                     '<'   |
                                     '<='  |
                                     '='   |
                                     '<>'
                                ;

fragment NUMBERVALUE
                              : '0'..'9'+
                              ;

fragment STRINGVALUE
                              :  ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_')*
                              ;


fragment LOGICALOPERATOR
                              :  'AND' |
                                 'OR'  |
                                 'NOT'
                              ;

The issue I am facing is with comparisons against a string value: PERSON_NAME=1 is accepted by the grammar, but PERSON_NAME=BATMAN is not. I am using ANTLRWorks, and when debugging PERSON_NAME=BATMAN I get a MismatchTokenException for the RHS value.

Can anyone please point out where I am going wrong?

Recommended answer

BATMAN is being tokenized as an LHSOPERAND token. You must realize that the lexer does not take into account what the parser "needs" at a particular time. The lexer simply tries to match as much as possible, and when two (or more) rules match the same number of characters (LHSOPERAND and RHSOPERAND in your case), the rule defined first "wins", which here is the LHSOPERAND rule. The parser rule condition then expects an RHSOPERAND after the operator but receives a second LHSOPERAND, hence the MismatchTokenException.
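The tie-breaking behaviour described above can be illustrated with a small stand-alone sketch. This is not ANTLR's actual implementation; it is a simplified model that assumes "longest match, and on a tie the rule defined first wins", with the question's three lexer rules approximated by regular expressions. The class name TieBreakDemo and the method tokenize are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TieBreakDemo {
    // Token rules in definition order, approximating the question's grammar.
    // LinkedHashMap preserves insertion order, so "defined first" is meaningful.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LHSOPERAND", Pattern.compile("[a-zA-Z_]+"));
        RULES.put("RHSOPERAND", Pattern.compile("[0-9]+|[a-zA-Z_]+"));
        RULES.put("RELATIONOPERATOR", Pattern.compile("=>|<=|<>|>|<|="));
    }

    // Splits the input into tokens without any knowledge of what a parser expects.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestRule = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            tokens.add(bestRule + "(" + input.substring(pos, pos + bestLen) + ")");
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [LHSOPERAND(PERSON_NAME), RELATIONOPERATOR(=), RHSOPERAND(1)]
        System.out.println(tokenize("PERSON_NAME=1"));
        // prints [LHSOPERAND(PERSON_NAME), RELATIONOPERATOR(=), LHSOPERAND(BATMAN)]
        System.out.println(tokenize("PERSON_NAME=BATMAN"));
    }
}
```

In this model the digit 1 can only be matched by RHSOPERAND, whereas BATMAN is matched equally well by both operand rules and therefore ends up as LHSOPERAND, which is the token the parser rule does not expect on the right-hand side.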

EDIT

Look at it like this: first the lexer receives the character stream, which it converts into a stream of tokens. After all tokens have been created, the parser receives these tokens and tries to make sense of them. Tokens are not created during parsing (in parser rules), but before it.

A quick demo of how you could do it:

grammar RuleGrammar;

prog
 : condition EOF
 ;

condition
 : logical
 ;

logical
 : relational ((AND | OR) relational)*
 ;

relational
 : STRINGVALUE ((GT | GTEQ | LT | LTEQ | EQ | NEQ) term)?
 ;

term
 : STRINGVALUE
 | NUMBERVALUE
 | '(' condition ')'
 ;

GT          : '>';
GTEQ        : '>=';
LT          : '<';
LTEQ        : '<=';
EQ          : '=';
NEQ         : '<>';
NUMBERVALUE : '0'..'9'+;
AND         : 'AND';
OR          : 'OR';
STRINGVALUE : ('a'..'z' | 'A'..'Z' | '_')+;
SPACE       : ' ' {skip();};

(Note that EQ and NEQ aren't really relational operators; strictly speaking they are equality operators.)

Parsing input like:

PERSON_NAME = BATMAN OR age <> 42

will now result in the following parse:

[Figure: parse tree, image not preserved]
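To make the expected shape of the parse concrete without the ANTLR tooling, here is a hand-written sketch that mirrors the demo grammar rule by rule. It is not code generated by ANTLR and not the answer author's code. The class name RuleParseSketch, the bracketed tree format, and the single OP token type (standing in for GT, GTEQ, LT, LTEQ, EQ and NEQ) are simplifications assumed for this illustration. Note that all tokens are created first and only then handed to the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleParseSketch {
    static final class Token {
        final String type, text;
        Token(String type, String text) { this.type = type; this.text = text; }
    }

    // Group 1: operators, 2: numbers, 3: words, 4: parentheses.
    private static final Pattern TOKEN =
        Pattern.compile("(>=|<=|<>|>|<|=)|([0-9]+)|([a-zA-Z_]+)|([()])");

    // Phase 1: turn the whole character stream into tokens before any parsing.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            if (input.charAt(pos) == ' ') { pos++; continue; } // like SPACE with skip()
            m.region(pos, input.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("Unexpected character at " + pos);
            String text = m.group();
            String type;
            if (m.group(1) != null) type = "OP";
            else if (m.group(2) != null) type = "NUMBERVALUE";
            else if (m.group(4) != null) type = text;
            else if (text.equals("AND") || text.equals("OR")) type = text; // keyword wins the tie
            else type = "STRINGVALUE";
            tokens.add(new Token(type, text));
            pos = m.end();
        }
        tokens.add(new Token("EOF", "<EOF>"));
        return tokens;
    }

    // Phase 2: a recursive-descent parser that only ever sees finished tokens.
    private final List<Token> tokens;
    private int index = 0;
    RuleParseSketch(List<Token> tokens) { this.tokens = tokens; }

    private Token peek() { return tokens.get(index); }

    private Token consume(String type) {
        Token t = tokens.get(index);
        if (!t.type.equals(type)) {
            throw new IllegalStateException("Expected " + type + " but found " + t.text);
        }
        index++;
        return t;
    }

    // prog : condition EOF ;
    String prog() { String tree = condition(); consume("EOF"); return "(prog " + tree + " <EOF>)"; }

    // condition : logical ;
    String condition() { return "(condition " + logical() + ")"; }

    // logical : relational ((AND | OR) relational)* ;
    String logical() {
        StringBuilder sb = new StringBuilder("(logical " + relational());
        while (peek().type.equals("AND") || peek().type.equals("OR")) {
            sb.append(' ').append(consume(peek().type).text).append(' ').append(relational());
        }
        return sb.append(')').toString();
    }

    // relational : STRINGVALUE ((GT | GTEQ | LT | LTEQ | EQ | NEQ) term)? ;
    String relational() {
        String tree = "(relational " + consume("STRINGVALUE").text;
        if (peek().type.equals("OP")) tree += " " + consume("OP").text + " " + term();
        return tree + ")";
    }

    // term : STRINGVALUE | NUMBERVALUE | '(' condition ')' ;
    String term() {
        String type = peek().type;
        if (type.equals("(")) {
            consume("(");
            String inner = condition();
            consume(")");
            return "(term ( " + inner + " ))";
        }
        if (type.equals("STRINGVALUE") || type.equals("NUMBERVALUE")) {
            return "(term " + consume(type).text + ")";
        }
        throw new IllegalStateException("Unexpected token " + peek().text);
    }

    static String parse(String input) { return new RuleParseSketch(lex(input)).prog(); }

    public static void main(String[] args) {
        System.out.println(parse("PERSON_NAME = BATMAN OR age <> 42"));
    }
}
```

For the sample input, the sketch produces one logical node containing two relational nodes joined by OR: the first compares PERSON_NAME with the string term BATMAN, the second compares age with the number term 42.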
