Custom ANTLR grammar not working for every input
Question
I am trying to write a grammar for our custom rule engine which uses ANTLR (for parsing) and Pentaho Kettle (for executing the rules)
Valid inputs for the parser would be of the type:
(<Attribute_name> <Relational_Operator> <Value>) AND/OR (<Attribute_name> <Relational_Operator> <Value>)
e.g. PERSON_PHONE = 123456789
Here is my grammar:
grammar RuleGrammar;
options{
language=Java;
}
prog : condition;
condition
: LHSOPERAND RELATIONOPERATOR RHSOPERAND
;
LHSOPERAND
: STRINGVALUE
;
RHSOPERAND
: NUMBERVALUE |
STRINGVALUE
;
RELATIONOPERATOR
: '>' |
'=>' |
'<' |
'<=' |
'=' |
'<>'
;
fragment NUMBERVALUE
: '0'..'9'+
;
fragment STRINGVALUE
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_')*
;
fragment LOGICALOPERATOR
: 'AND' |
'OR' |
'NOT'
;
The issue I am facing is with comparing against a string value: PERSON_NAME=1 passes the grammar, but PERSON_NAME=BATMAN does not. I am using ANTLRWorks, and when debugging PERSON_NAME=BATMAN I get a MismatchedTokenException for the RHS value.
Can anyone please guide me where I am going wrong?
Recommended answer
BATMAN is being tokenized as an LHSOPERAND token. You must realize that the lexer does not take into account what the parser "needs" at a particular time. The lexer simply tries to match as much as possible, and if 2 (or more) rules match the same number of characters (LHSOPERAND and RHSOPERAND in your case), the rule defined first will "win", which is the LHSOPERAND rule.
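The tie-break described above can be illustrated with a small Java sketch. This is only a simplified model of "longest match, first-defined rule wins on ties", not ANTLR's actual lexer implementation. The class name LexerTieBreakDemo and the regular expressions are assumptions that approximate the three non-fragment lexer rules from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: a simplified model of the lexer behaviour, not ANTLR itself.
class LexerTieBreakDemo {

    // Lexer rules in definition order (LinkedHashMap preserves insertion order).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LHSOPERAND", Pattern.compile("[A-Za-z_]+"));
        RULES.put("RHSOPERAND", Pattern.compile("[0-9]+|[A-Za-z_]+"));
        // Two-character operators are listed first so the regex tries them first.
        RULES.put("RELATIONOPERATOR", Pattern.compile("=>|<=|<>|>|<|="));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLength = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly greater: on a tie, the rule defined earlier keeps the win.
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    bestRule = rule.getKey();
                    bestLength = m.end() - pos;
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            tokens.add(bestRule + ":" + input.substring(pos, pos + bestLength));
            pos += bestLength;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [LHSOPERAND:PERSON_NAME, RELATIONOPERATOR:=, RHSOPERAND:1]
        System.out.println(tokenize("PERSON_NAME=1"));
        // prints [LHSOPERAND:PERSON_NAME, RELATIONOPERATOR:=, LHSOPERAND:BATMAN]
        System.out.println(tokenize("PERSON_NAME=BATMAN"));
    }
}
```

In this model the third token of PERSON_NAME=BATMAN comes out as LHSOPERAND, so a parser rule that expects RHSOPERAND in that position has nothing it can match, which is consistent with the exception reported in the question.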
EDIT
Look at it like this: first the lexer receives the character stream, which it converts into a stream of tokens. After all tokens have been created, the parser receives these tokens, which it then tries to make sense of. Tokens are not created during parsing (in parser rules), but before it.
A quick demo of how you could do it:
grammar RuleGrammar;
prog
: condition EOF
;
condition
: logical
;
logical
: relational ((AND | OR) relational)*
;
relational
: STRINGVALUE ((GT | GTEQ | LT | LTEQ | EQ | NEQ) term)?
;
term
: STRINGVALUE
| NUMBERVALUE
| '(' condition ')'
;
GT : '>';
GTEQ : '>=';
LT : '<';
LTEQ : '<=';
EQ : '=';
NEQ : '<>';
NUMBERVALUE : '0'..'9'+;
AND : 'AND';
OR : 'OR';
STRINGVALUE : ('a'..'z' | 'A'..'Z' | '_')+;
SPACE : ' ' {skip();};
(Note that EQ and NEQ aren't really relational operators...)
Parsing input like:
PERSON_NAME = BATMAN OR age <> 42
will now result in the following parse (the parse tree image from the original answer is not available):
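To make the expected structure concrete, here is a rough hand-written Java sketch that mirrors the parser rules of the demo grammar (prog, logical, relational, term) and prints the parse as a nested prefix expression. It is not the parser ANTLR would generate. The class name RuleGrammarSketch, the single token regex and the output format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical recursive-descent sketch of the demo grammar; not ANTLR-generated code.
class RuleGrammarSketch {
    // Spaces are skipped (like the SPACE rule); two-character operators are tried first.
    private static final Pattern TOKEN =
            Pattern.compile(" *(>=|<=|<>|>|<|=|\\(|\\)|[0-9]+|[A-Za-z_]+)");
    private static final List<String> OPERATORS =
            Arrays.asList(">", ">=", "<", "<=", "=", "<>");

    private final List<String> tokens;
    private int pos = 0;

    private RuleGrammarSketch(List<String> tokens) { this.tokens = tokens; }

    // Phase 1: turn the whole input into tokens before any parsing happens.
    static List<String> lex(String input) {
        List<String> result = new ArrayList<>();
        String text = input.trim();
        Matcher m = TOKEN.matcher(text);
        int at = 0;
        while (at < text.length()) {
            m.region(at, text.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("Unexpected character at " + at);
            result.add(m.group(1));
            at = m.end();
        }
        return result;
    }

    // prog : condition EOF ;  condition : logical ;
    static String parse(String input) {
        RuleGrammarSketch p = new RuleGrammarSketch(lex(input));
        String tree = p.logical();
        if (p.peek() != null) throw new IllegalArgumentException("Unexpected token: " + p.peek());
        return tree;
    }

    // logical : relational ((AND | OR) relational)* ;
    private String logical() {
        String left = relational();
        while ("AND".equals(peek()) || "OR".equals(peek())) {
            String op = next();
            left = "(" + op + " " + left + " " + relational() + ")";
        }
        return left;
    }

    // relational : STRINGVALUE ((GT | GTEQ | LT | LTEQ | EQ | NEQ) term)? ;
    private String relational() {
        String name = next();
        if (!isStringValue(name)) throw new IllegalArgumentException("Expected STRINGVALUE, got: " + name);
        if (peek() != null && OPERATORS.contains(peek())) {
            String op = next();
            return "(" + op + " " + name + " " + term() + ")";
        }
        return name;
    }

    // term : STRINGVALUE | NUMBERVALUE | '(' condition ')' ;
    private String term() {
        String t = next();
        if ("(".equals(t)) {
            String inner = logical();
            if (!")".equals(next())) throw new IllegalArgumentException("Expected ')'");
            return inner;
        }
        if (isStringValue(t) || t.matches("[0-9]+")) return t;
        throw new IllegalArgumentException("Expected a term, got: " + t);
    }

    // AND and OR are separate tokens in the grammar, so they do not count as STRINGVALUE.
    private static boolean isStringValue(String t) {
        return t.matches("[A-Za-z_]+") && !t.equals("AND") && !t.equals("OR");
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        return tokens.get(pos++);
    }

    public static void main(String[] args) {
        // prints (OR (= PERSON_NAME BATMAN) (<> age 42))
        System.out.println(parse("PERSON_NAME = BATMAN OR age <> 42"));
    }
}
```

The printed expression shows OR at the root with the two comparisons as its children. Because STRINGVALUE is now a single token type that may appear on either side of an operator, BATMAN is accepted on the right-hand side.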