ANTLR 4 lexer tokens inside other tokens


Question

I have the following grammar for ANTLR 4:

grammar Pattern;

//parser rules
parse   : string LBRACK CHAR DASH CHAR RBRACK ;
string  : (CHAR | DASH)+ ;

//lexer rules
DASH    : '-' ;
LBRACK  : '[' ;
RBRACK  : ']' ;
CHAR    : [A-Za-z0-9] ;

And I'm trying to parse the following string

ab-cd[0-9]

The code parses out the ab-cd on the left, which my application treats as a literal string. It then parses out [0-9] as a character set, which in this case translates to any digit. My grammar works, except that I don't like having (CHAR | DASH)+ as a parser rule when it is really just a token. I would rather the lexer create a STRING token and give me the following tokens:

"ab-cd" "[" "0" "-" "9" "]"

instead of these:

"ab" "-" "cd" "[" "0" "-" "9" "]"

I have looked at other examples, but haven't been able to figure it out. Usually other examples have quotes around such string literals or they have whitespace to help delimit the input. I'd like to avoid both. Can this be accomplished with lexer rules or do I need to continue to handle it in the parser rules like I'm doing?

Answer

In ANTLR 4, you can use lexer modes for this.

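lexer grammar PatternLexer;

// default mode: a run of letters, digits and dashes becomes one STRING token
STRING : [A-Za-z0-9-]+ ;
LBRACK : '[' -> pushMode(CharSet) ;

mode CharSet;

// inside [...] each piece of the character set is a separate token
DASH   : '-' ;
NUMBER : [0-9]+ ;
RBRACK : ']' -> popMode ;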

After matching a [ character, the lexer operates in the CharSet mode until it matches a ] character, whose popMode command returns it to the default mode.
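One practical detail: ANTLR 4 only allows lexer modes in a pure lexer grammar and reports an error if a combined grammar such as the question's grammar Pattern; declares a mode. The grammar therefore has to be split into a lexer grammar and a parser grammar. The sketch below assumes the lexer rules above live in a file named PatternLexer.g4 that starts with lexer grammar PatternLexer; (the names PatternLexer and PatternParser are illustrative, not part of the original answer); the parser then imports the token types through the tokenVocab option.

```antlr
parser grammar PatternParser;

// reuse the token types defined by the (illustrative) PatternLexer.g4
options { tokenVocab = PatternLexer; }

// one literal STRING followed by a bracketed range, e.g. ab-cd[0-9]
parse : STRING LBRACK NUMBER DASH NUMBER RBRACK EOF ;
```

For the input ab-cd[0-9], the lexer emits STRING (ab-cd), LBRACK, NUMBER (0), DASH, NUMBER (9) and RBRACK, which is the "ab-cd" "[" "0" "-" "9" "]" stream the question asks for, so the former (CHAR | DASH)+ parser rule is no longer needed.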
