ANTLR3 grammar does not match rule with predicate
Question
I have a combined grammar in which I need two identifier lexer rules. Both identifiers can be in use at the same time. Identifier1 comes before Identifier2 in the grammar.
The first identifier is static, whereas the second identifier rule changes depending on a flag (using a predicate).
I want the second identifier to match in parser rules. But because both identifiers can match some of the same input, that input is never recognized as identifier2.
I have created a small grammar to illustrate the problem:
@lexer::members
{
    private boolean flag;

    public void setFlag(boolean flag)
    {
        this.flag = flag;
    }
}

identifier1 : ID1 ;

identifier2 : ID2 ;

ID1 : (CHARS)* ;
ID2 : (CHARS | ({flag}? '_'))* ;

fragment CHARS : ('a' .. 'z') ;
If I try to match the identifier2 rule like this:
ANTLRStringStream in = new ANTLRStringStream("abcabde");
IdTestLexer lexer = new IdTestLexer(in);
lexer.setFlag(true);
CommonTokenStream tokens = new CommonTokenStream(lexer);
IdTestParser parser = new IdTestParser(tokens);
parser.identifier2();
It shows the error: line 1:0 missing ID2 at 'abcabde'
Recommended answer
ID1 : (CHARS) *;
ID2 : (CHARS | ({flag}? '_'))* ;
For ANTLR these two rules mean:
- If the input is just characters, it's ID1.
- If the input mixes characters and _ and flag == true, it's ID2.
Note that if flag == false, ID2 will never be matched.
The two basic rules the lexer follows are:
- It matches the token that covers the longest input subsequence.
- If several tokens can match the same input, it uses the one that appears first in the grammar.
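The two rules above can be sketched in plain Java. This is only a simplified model of the behavior described in this answer, not the code ANTLR generates: the class LexerRuleDemo and the helper pickToken are hypothetical names, and the regular expressions are hand-written stand-ins for ID1 and ID2 (with the predicate folded into the choice of pattern).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerRuleDemo {
    // Hypothetical helper: returns the name of the rule that a
    // "longest match, first rule wins on a tie" lexer would choose.
    static String pickToken(String input, boolean flag) {
        // LinkedHashMap keeps the rules in grammar order: ID1 before ID2.
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("ID1", Pattern.compile("[a-z]*"));
        // Assumption: with flag == false the '_' alternative is simply unavailable.
        rules.put("ID2", Pattern.compile(flag ? "[a-z_]*" : "[a-z]*"));

        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strictly greater: a later rule only wins if it matches MORE input.
            if (m.lookingAt() && m.end() > bestLength) {
                bestLength = m.end();
                best = rule.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(pickToken("abcabde", true));   // ID1: both match 7 chars, ID1 is first
        System.out.println(pickToken("abc_abde", true));  // ID2: only ID2 can consume the '_'
        System.out.println(pickToken("abc_abde", false)); // ID1: both stop after "abc"
    }
}
```

The first call is the situation from the question: the input contains only letters, so both rules match the same seven characters and the tie goes to ID1, regardless of the flag.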
I believe your core issue is a misunderstanding of the difference between the lexer and the parser and how each is used. The question you should ask yourself is: when should 'abcabde' be matched as ID1 and when as ID2?
- Always ID1: then your grammar is correct as it is now.
- Always ID2: then you should switch the two rules, but note that in that case ID1 will never be matched.
- It depends on flag: then you need to modify the predicate according to your logic; just toggling the underscore isn't enough.
- It depends on where in the input the identifier is used: then this is not something the lexer can decide, and you need to tell the two kinds of identifiers apart in the parser rather than in the lexer. Formally, the lexer uses a regular language, while you need a context-free language to decide about identifiers like that.