ANTLR 4 lexer tokens inside other tokens
Problem description
I have the following grammar for ANTLR 4:
grammar Pattern;
//parser rules
parse : string LBRACK CHAR DASH CHAR RBRACK ;
string : (CHAR | DASH)+ ;
//lexer rules
DASH : '-' ;
LBRACK : '[' ;
RBRACK : ']' ;
CHAR : [A-Za-z0-9] ;
And I'm trying to parse the following string:
ab-cd[0-9]
The code parses out the ab-cd on the left, which will be treated as a literal string in my application. It then parses out [0-9] as a character set, which in this case will translate to any digit. My grammar works for me, except I don't like to have (CHAR | DASH)+ as a parser rule when it's simply being treated as a token. I would rather the lexer create a STRING token and give me the following tokens:
"ab-cd" "[" "0" "-" "9" "]"
instead of these:
"ab" "-" "cd" "[" "0" "-" "9" "]"
I have looked at other examples, but haven't been able to figure it out. Usually other examples have quotes around such string literals or they have whitespace to help delimit the input. I'd like to avoid both. Can this be accomplished with lexer rules or do I need to continue to handle it in the parser rules like I'm doing?
Recommended answer
In ANTLR 4, you can use lexer modes for this.
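// Lexer rules; modes are only allowed in a separate lexer grammar.
// Default mode: a run of letters and dashes becomes one STRING token.
STRING : [a-z-]+ ;
LBRACK : '[' -> pushMode(CharSet) ;
// The rules below apply only between '[' and ']'.
mode CharSet;
DASH : '-' ;
NUMBER : [0-9]+ ;
RBRACK : ']' -> popMode ;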
After parsing a [ character, the lexer will operate in mode CharSet until a ] character is reached and the popMode command is executed.
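To make the mode switching concrete, here is a minimal Python sketch that simulates the two modes by hand. It is not ANTLR-generated code and does not use the ANTLR runtime. The names MODES and tokenize are hypothetical helpers invented for this illustration, and the sketch takes the first rule that matches rather than applying ANTLR's longest-match rule (the two agree here because the rules within each mode never compete for the same character).

```python
import re

# Token rules per lexer mode, mirroring the grammar in the answer.
# Each rule is (token name, regex, action); the action is
# "push:<mode>", "pop", or None. This table is an illustrative
# assumption, not something ANTLR generates.
MODES = {
    "DEFAULT": [
        ("STRING", re.compile(r"[a-z-]+"), None),
        ("LBRACK", re.compile(r"\["), "push:CharSet"),
    ],
    "CharSet": [
        ("DASH", re.compile(r"-"), None),
        ("NUMBER", re.compile(r"[0-9]+"), None),
        ("RBRACK", re.compile(r"\]"), "pop"),
    ],
}


def tokenize(text):
    """Return (name, lexeme) pairs using a stack of lexer modes."""
    mode_stack = ["DEFAULT"]
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the rules of the current mode in order; first match wins.
        for name, pattern, action in MODES[mode_stack[-1]]:
            match = pattern.match(text, pos)
            if match:
                break
        else:
            raise ValueError(
                f"no rule matches {text[pos]!r} at {pos} "
                f"in mode {mode_stack[-1]}"
            )
        tokens.append((name, match.group()))
        pos = match.end()
        if action == "pop":
            mode_stack.pop()  # like "-> popMode"
        elif action:
            mode_stack.append(action.split(":", 1)[1])  # like "-> pushMode(...)"
    return tokens


# Prints ['ab-cd', '[', '0', '-', '9', ']']
print([lexeme for _, lexeme in tokenize("ab-cd[0-9]")])
```

The dash in ab-cd is absorbed into STRING because the DASH rule does not exist in the default mode, while the dash in 0-9 becomes its own DASH token because only the CharSet rules are active between the brackets.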