Antlr get Sub-Tokens
Question
Forgive me if my terminology is off.
Let's say I have this bit of simplified grammar:
// parser
expr : COMPARATIVE;
// lexer
WS : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+;
COMPARATOR
: 'vs'
| 'versus'
;
ITEM
: 'boy'
| 'girl'
;
COMPARATIVE :ITEM WS* COMPARATOR WS* ITEM;
So this will of course match 'boy vs girl' or 'girl vs boy', etc.
My question is about what happens when I create a lexer, i.e.
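// Requires: import org.antlr.v4.runtime.*;
CharStream stream = new ANTLRInputStream("boy vs girl");
SearchLexer lex = new SearchLexer(stream);
CommonTokenStream tokens = new CommonTokenStream(lex);
tokens.fill();
// CommonTokenStream is not Iterable in ANTLR 4, so iterate over its buffered token list.
for (Token token : tokens.getTokens()) {
    System.out.print(token.getType() + " [" + token.getText() + "] ");
}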
This prints out something like this: 9 [boy vs girl], i.e. it matches my query fine, but now I want to be able to get the sub-tokens of this current token.
My intuition tells me I need to use trees, but I really don't know how to do this in ANTLR 4 for my example. Thanks.
Recommended answer
Currently, COMPARATIVE is a lexer rule, which means the lexer will try to make a single token from all the text that matches the rule. You should instead replace it with a parser rule, comparative:
comparative : ITEM WS* COMPARATOR WS* ITEM;
Since COMPARATIVE will no longer be considered a single token, you'll instead get individual tokens for ITEM, WS, and COMPARATOR.
Two notes:
- If whitespace is not significant, you can hide it from the parser rules like this:
WS : ('\t' | ' ' | '\r' | '\n'| '\u000C')+ -> channel(HIDDEN);
You can then simplify your comparative parser rule to:
comparative : ITEM COMPARATOR ITEM;
- In ANTLR 4, you can simplify character sets using a new syntax:
WS : [ \t\r\n\u000C]+ -> channel(HIDDEN);
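Putting the pieces above together, the whole grammar might look roughly like the sketch below. The grammar name Search is an assumption inferred from the SearchLexer class in the question, and pointing expr at the new comparative parser rule is likewise an assumed adjustment, since the original expr referred to the COMPARATIVE token that no longer exists.

```
grammar Search;  // name assumed from the SearchLexer class in the question

// parser
expr        : comparative ;             // assumed update: was 'expr : COMPARATIVE;'
comparative : ITEM COMPARATOR ITEM ;    // parser rule, so each part stays its own token

// lexer
COMPARATOR : 'vs' | 'versus' ;
ITEM       : 'boy' | 'girl' ;
WS         : [ \t\r\n\u000C]+ -> channel(HIDDEN) ;  // still tokenized, but hidden from the parser
```

After regenerating the lexer and parser from a grammar like this, the token loop in the question should report boy, vs, and girl as separate tokens. Whitespace is still tokenized, but on the hidden channel, so the parser never sees it. The context object that ANTLR 4 generates for the comparative rule should also expose accessors such as ITEM(0), ITEM(1), and COMPARATOR(), which is the parse-tree route to the "sub-tokens" the question asks about.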