Custom ANTLR grammar not working for every input
Question
I am trying to write a grammar for our custom rule engine, which uses ANTLR (for parsing) and Pentaho Kettle (for executing the rules).
Valid inputs for the parser would be of the type:
(<Attribute_name> <Relational_Operator> <Value>) AND/OR (<Attribute_name> <Relational_Operator> <Value>)
e.g. PERSON_PHONE = 123456789
Here is my grammar:
grammar RuleGrammar;
options{
language=Java;
}
prog : condition;
condition
: LHSOPERAND RELATIONOPERATOR RHSOPERAND
;
LHSOPERAND
: STRINGVALUE
;
RHSOPERAND
: NUMBERVALUE |
STRINGVALUE
;
RELATIONOPERATOR
: '>' |
'=>' |
'<' |
'<=' |
'=' |
'<>'
;
fragment NUMBERVALUE
: '0'..'9'+
;
fragment STRINGVALUE
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_')*
;
fragment LOGICALOPERATOR
: 'AND' |
'OR' |
'NOT'
;
The issue I am facing is comparing against a string value, i.e. PERSON_NAME=1 passes the grammar, but PERSON_NAME=BATMAN does not work. I am using ANTLRWorks, and when debugging PERSON_NAME=BATMAN I get a MismatchTokenException for the RHS value.
Can anyone please guide me where I am going wrong?
Recommended answer
BATMAN is being tokenized as a LHSOPERAND token. Keep in mind that the lexer does not take into account what the parser "needs" at a particular time. The lexer simply tries to match as many characters as possible, and when two (or more) rules match the same number of characters (LHSOPERAND and RHSOPERAND in your case), the rule defined first "wins", which here is the LHSOPERAND rule.
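The tie-breaking behaviour described above can be imitated in plain Java. The following is only a rough sketch: the class name `FirstRuleWinsDemo`, the `tokenize` helper and the regular expressions are assumptions made for illustration and are not how ANTLR implements its lexer. It tries every rule at the current position, keeps the longest match, and lets the earlier rule win a tie.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a generated lexer; not ANTLR's actual implementation.
final class FirstRuleWinsDemo {

    // Lexer rules in definition order, approximating the question's grammar.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LHSOPERAND", Pattern.compile("[a-zA-Z_]+"));
        RULES.put("RHSOPERAND", Pattern.compile("[0-9]+|[a-zA-Z_]+"));
        RULES.put("RELATIONOPERATOR", Pattern.compile("=>|<=|<>|>|<|="));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLength = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the rule defined first stays the winner.
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    bestRule = rule.getKey();
                    bestLength = m.end() - pos;
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            tokens.add(bestRule + "(" + input.substring(pos, pos + bestLength) + ")");
            pos += bestLength;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("PERSON_NAME=1"));
        System.out.println(tokenize("PERSON_NAME=BATMAN"));
    }
}
```

In this sketch, `1` can only be matched by RHSOPERAND, while `BATMAN` is matched equally well by both operand rules and therefore ends up as LHSOPERAND.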
EDIT
Look at it like this: first the lexer receives the character stream, which it converts into a stream of tokens. Only after all tokens have been created does the parser receive them and try to make sense of them. Tokens are not created during parsing (in parser rules), but before it.
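To make the second phase concrete, here is a minimal sketch of what the parser side sees. It assumes token types are represented as plain strings, and the class `ParserSeesOnlyTokensDemo` and its `parseCondition` method are hypothetical names rather than anything ANTLR generates. The parser only compares the already fixed token types against the rule `condition : LHSOPERAND RELATIONOPERATOR RHSOPERAND`.

```java
import java.util.List;

// Hypothetical sketch of the parser side only; token types are plain strings here.
final class ParserSeesOnlyTokensDemo {

    // The question's parser rule: condition : LHSOPERAND RELATIONOPERATOR RHSOPERAND ;
    private static final List<String> CONDITION =
            List.of("LHSOPERAND", "RELATIONOPERATOR", "RHSOPERAND");

    /** Returns "ok", or a description of the first token type that does not fit the rule. */
    static String parseCondition(List<String> tokenTypes) {
        for (int i = 0; i < CONDITION.size(); i++) {
            String expected = CONDITION.get(i);
            String actual = i < tokenTypes.size() ? tokenTypes.get(i) : "EOF";
            if (!expected.equals(actual)) {
                return "mismatch at token " + i + ": expected " + expected + " but found " + actual;
            }
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Token types as the lexer would already have fixed them for PERSON_NAME=1 ...
        System.out.println(parseCondition(List.of("LHSOPERAND", "RELATIONOPERATOR", "RHSOPERAND")));
        // ... and for PERSON_NAME=BATMAN, where the last token arrives as LHSOPERAND.
        System.out.println(parseCondition(List.of("LHSOPERAND", "RELATIONOPERATOR", "LHSOPERAND")));
    }
}
```

The second call reports a mismatch on the third token, which corresponds to the MismatchTokenException from the question: by the time the parser runs, it can no longer reinterpret `BATMAN` as an RHSOPERAND.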
A quick demo of how you could do it:
grammar RuleGrammar;
prog
: condition EOF
;
condition
: logical
;
logical
: relational ((AND | OR) relational)*
;
relational
: STRINGVALUE ((GT | GTEQ | LT | LTEQ | EQ | NEQ) term)?
;
term
: STRINGVALUE
| NUMBERVALUE
| '(' condition ')'
;
GT : '>';
GTEQ : '>=';
LT : '<';
LTEQ : '<=';
EQ : '=';
NEQ : '<>';
NUMBERVALUE : '0'..'9'+;
AND : 'AND';
OR : 'OR';
STRINGVALUE : ('a'..'z' | 'A'..'Z' | '_')+;
SPACE : ' ' {skip();};
(note that EQ and NEQ aren't really relational operators...)
Parsing input such as:
PERSON_NAME = BATMAN OR age <> 42
will now result in the following parse:
[Figure: parse tree for the input above]
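The shape of that parse can be approximated without ANTLR by a small hand-written recursive-descent sketch of the same rules. Everything in it is an assumption made for illustration: the class `RuleGrammarSketch`, the bracketed tree notation and the regex-based tokenizer are not what ANTLR generates, and ANTLR's own error reporting differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written approximation of the demo grammar; NOT the code ANTLR generates.
final class RuleGrammarSketch {
    // Longer operators come first so ">=" is never split into ">" and "=".
    private static final Pattern TOKEN =
            Pattern.compile(" *(>=|<=|<>|>|<|=|\\(|\\)|[0-9]+|[A-Za-z_]+)");
    private static final List<String> OPERATORS = List.of(">", ">=", "<", "<=", "=", "<>");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    // Phase 1: tokenize the whole input before any parsing starts.
    private RuleGrammarSketch(String input) {
        Matcher m = TOKEN.matcher(input);
        int end = 0;
        while (end < input.length() && m.region(end, input.length()).lookingAt()) {
            tokens.add(m.group(1));
            end = m.end();
        }
        if (!input.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("cannot tokenize at index " + end);
        }
    }

    // Phase 2: prog : condition EOF ;
    static String parse(String input) {
        RuleGrammarSketch p = new RuleGrammarSketch(input);
        String tree = "(prog " + p.condition() + " EOF)";
        if (p.peek() != null) {
            throw new IllegalArgumentException("expected EOF but found " + p.peek());
        }
        return tree;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    // AND and OR are defined before STRINGVALUE, so they never count as STRINGVALUE.
    private static boolean isStringValue(String t) {
        return t.matches("[A-Za-z_]+") && !t.equals("AND") && !t.equals("OR");
    }

    // condition : logical ;
    private String condition() { return "(condition " + logical() + ")"; }

    // logical : relational ((AND | OR) relational)* ;
    private String logical() {
        StringBuilder sb = new StringBuilder("(logical ").append(relational());
        while ("AND".equals(peek()) || "OR".equals(peek())) {
            sb.append(' ').append(next()).append(' ').append(relational());
        }
        return sb.append(')').toString();
    }

    // relational : STRINGVALUE ((GT | GTEQ | LT | LTEQ | EQ | NEQ) term)? ;
    private String relational() {
        String lhs = next();
        if (!isStringValue(lhs)) {
            throw new IllegalArgumentException("expected STRINGVALUE but found " + lhs);
        }
        StringBuilder sb = new StringBuilder("(relational ").append(lhs);
        if (peek() != null && OPERATORS.contains(peek())) {
            sb.append(' ').append(next()).append(' ').append(term());
        }
        return sb.append(')').toString();
    }

    // term : STRINGVALUE | NUMBERVALUE | '(' condition ')' ;
    private String term() {
        String t = next();
        if (t.equals("(")) {
            String inner = condition();
            String close = next();
            if (!close.equals(")")) {
                throw new IllegalArgumentException("expected ) but found " + close);
            }
            return "(term ( " + inner + " ))";
        }
        if (isStringValue(t) || t.matches("[0-9]+")) return "(term " + t + ")";
        throw new IllegalArgumentException("expected a term but found " + t);
    }

    public static void main(String[] args) {
        System.out.println(parse("PERSON_NAME = BATMAN OR age <> 42"));
    }
}
```

For the input above, this sketch prints `(prog (condition (logical (relational PERSON_NAME = (term BATMAN)) OR (relational age <> (term 42)))) EOF)`: one `relational` node per comparison, joined by `OR` under a single `logical` node. Because `BATMAN` is now an ordinary STRINGVALUE accepted by `term`, the string comparison from the question goes through.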