Handling scope for single and double quote strings in ANTLR4


Question

I am working with ANTLR4 and writing a grammar to handle single- and double-quoted strings. I am trying to use lexer modes to scope the strings, but that is not working for me; my grammar is listed below. Is this the right approach, or how can I properly parse these as tokens rather than as parser rules with context? Any insight?

An example:

'single quote that contain "a double quote 'that has another single quote'"'

Lexer grammar

lexer grammar StringLexer;

fragment SQUOTE: '\'';

fragment QUOTE:  '"';

SQSTR_START: SQUOTE     -> pushMode(SQSTR_MODE);

DQSTR_START: QUOTE      -> pushMode(DQSTR_MODE);

CONTENTS: ~["\']+;

mode SQSTR_MODE;

SQSTR_END: (CONTENTS | DQSTR_START)+ SQUOTE -> popMode;

mode DQSTR_MODE;

DQSTR_END:(CONTENTS | SQSTR_START)+ QUOTE -> popMode;

Parser

parser grammar StringParser;
options { tokenVocab=StringLexer; }

start:
    dqstr | sqstr
;

dqstr:
 DQSTR_START DQSTR_END
 ;  

sqstr:
 SQSTR_START SQSTR_END
;

ADDENDUM: Thanks to @Lucas Trzesniewski for the answer.

This is part of a grammar I am writing to parse a shell-like language. A script can span multiple lines, each of which may contain SQSTR and DQSTR strings. With the lexer rules provided in the answer, multiple lines of script get lumped together.

Happy-case example (parsed correctly using the answer):

cmd 'single quote string'
cmd2 "double quote"
cmd3 'another single quote' 

This is recognized as three commands and three strings (single- and double-quoted).

Unparsed example (note the double quote inside the single-quoted strings):

cmd 'single "quote string'
cmd2 "double quote"
cmd3 'another "single quote' 

In this case the lexer incorrectly lumps all three strings into a single string token of type SQSTR.

Any ideas on how to address this problem?

Recommended answer

If you want to parse your example string as a single token, you don't necessarily have to use lexer modes; you can use mutually recursive lexer rules instead:

SQSTR : '\'' (~['"] | DQSTR)* '\'';
DQSTR : '"'  (~['"] | SQSTR)* '"';

Then, in the parser, use something like:

str : SQSTR | DQSTR;
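
To see how the two recursive rules consume input, here is a rough Python sketch that imitates them by hand. It is not ANTLR-generated code and not how the ANTLR runtime works internally; the names `match_sqstr`, `match_dqstr`, `_match_quoted`, `NESTED` and `SCRIPT` are made up for illustration. It relies on one observation about the rules: inside a string, the same quote character can only close it, and the other quote character can only open a nested string of the other kind.

```python
def _match_quoted(text, start, quote, nested):
    """Imitate: QUOTE ( ~['"] | NESTED )* QUOTE.

    Returns the index just past the closing quote, or None if no match.
    """
    if start >= len(text) or text[start] != quote:
        return None
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == quote:
            # Our own quote character can only close the string.
            return i + 1
        if ch in "'\"":
            # The other quote character can only start a nested string.
            end = nested(text, i)
            if end is None:
                return None
            i = end
        else:
            # Any non-quote character (newlines included) is plain content.
            i += 1
    return None  # ran out of input before the closing quote


def match_sqstr(text, start=0):
    # Hypothetical stand-in for the SQSTR rule.
    return _match_quoted(text, start, "'", match_dqstr)


def match_dqstr(text, start=0):
    # Hypothetical stand-in for the DQSTR rule.
    return _match_quoted(text, start, '"', match_sqstr)


# The nested example from the question.
NESTED = "'single quote that contain \"a double quote 'that has another single quote'\"'"

# The three-line script from the addendum.
SCRIPT = (
    "cmd 'single \"quote string'\n"
    "cmd2 \"double quote\"\n"
    "cmd3 'another \"single quote'"
)

if __name__ == "__main__":
    # True when the whole nested example is consumed as one SQSTR.
    print(match_sqstr(NESTED) == len(NESTED))
    # True when the match starting at the first quote runs to the end of the script.
    first_quote = SCRIPT.index("'")
    print(match_sqstr(SCRIPT, first_quote) == len(SCRIPT))
```

Under this sketch the nested example is consumed as one SQSTR, which is the intent of the answer. The three-line script is also consumed as one SQSTR starting at the first single quote, because each `"` inside a single-quoted string opens a nested DQSTR and newlines count as ordinary content. That is the lumping described in the addendum.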
