Handling scope for single- and double-quoted strings in ANTLR4

Problem description

I am working with ANTLR4 and am writing a grammar to handle single- and double-quoted strings. I am trying to use lexer modes to scope the strings, but that is not working for me; my grammar is listed below. Is this the right approach, or how can I properly recognize these strings as tokens instead of as parser rules with context? Any insight?

Example:

'single quote that contain "a double quote 'that has another single quote'"'

Lexer grammar

lexer grammar StringLexer;

fragment SQUOTE: '\'';
fragment QUOTE:  '"';

SQSTR_START: SQUOTE -> pushMode(SQSTR_MODE);
DQSTR_START: QUOTE  -> pushMode(DQSTR_MODE);

CONTENTS: ~["\']+;

mode SQSTR_MODE;

SQSTR_END: (CONTENTS | DQSTR_START)+ SQUOTE -> popMode;

mode DQSTR_MODE;

DQSTR_END: (CONTENTS | SQSTR_START)+ QUOTE -> popMode;

Parser

parser grammar StringParser;
options { tokenVocab=StringLexer; }

start:
    dqstr | sqstr
;

dqstr:
    DQSTR_START DQSTR_END
;

sqstr:
    SQSTR_START SQSTR_END
;

ADDENDUM: Thanks to @Lucas Trzesniewski for the answer.

This is part of a grammar I am writing to parse a shell-like language, so a script can have multiple lines, some of which contain SQSTR and DQSTR strings. With the lexer rules provided in the answer, multiple lines of script can get lumped together into one token.

Happy case (parsed correctly with the answer's rules):

cmd 'single quote string'
cmd2 "double quote"
cmd3 'another single quote' 

This is recognized as three commands and three strings (single- and double-quoted).

Example that is not parsed correctly (note the double quote inside each single-quoted string):

cmd 'single "quote string'
cmd2 "double quote"
cmd3 'another "single quote' 

In this case, everything from the first single quote to the end of the last line, including cmd2 and cmd3, is incorrectly recognized as a single string token of type SQSTR.

Any ideas on how to address this problem?

Recommended answer

If you want to match your example string as a single token, you don't necessarily need lexer modes; you can use mutually recursive lexer rules instead:

SQSTR : '\'' (~['"] | DQSTR)* '\'';
DQSTR : '"'  (~['"] | SQSTR)* '"';

Then, in the parser, use something like:

str : SQSTR | DQSTR;
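To see what the two SQSTR/DQSTR rules actually accept, here is a minimal Python sketch (not ANTLR-generated code) that hand-codes the same mutual recursion; the helper name match_quoted is hypothetical. Because the set ~['"] excludes both quote characters, a quote of the current kind can only close the string and a quote of the other kind can only open a nested string, so no backtracking is needed. The same trace also explains the addendum: ~['"] matches newlines too, so in the unparsed example the " inside 'single "quote string' opens a nested DQSTR, whose next ' opens yet another SQSTR, and the nesting only unwinds at the last ' of the third line.

```python
from typing import Optional


# Hypothetical helper mirroring the answer's two lexer rules:
#   SQSTR : '\'' (~['"] | DQSTR)* '\'';
#   DQSTR : '"'  (~['"] | SQSTR)* '"';
def match_quoted(text: str, pos: int, quote: str) -> Optional[int]:
    """Match SQSTR (quote="'") or DQSTR (quote='"') starting at text[pos].

    Returns the index just past the closing quote, or None on failure.
    """
    other = '"' if quote == "'" else "'"
    if pos >= len(text) or text[pos] != quote:
        return None
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == quote:
            return i + 1  # this quote can only close the current string
        if ch == other:
            # the other quote can only open a nested string of the other kind
            end = match_quoted(text, i, other)
            if end is None:
                return None
            i = end
        else:
            i += 1  # ~['"]: any other character, newlines included
    return None  # input ended before the closing quote


example = "'single quote that contain \"a double quote 'that has another single quote'\"'"
happy = "cmd 'single quote string'\ncmd2 \"double quote\"\ncmd3 'another single quote'"
unparsed = "cmd 'single \"quote string'\ncmd2 \"double quote\"\ncmd3 'another \"single quote'"

print(match_quoted(example, 0, "'") == len(example))  # True: one token
end = match_quoted(happy, 4, "'")
print(happy[4:end])  # 'single quote string'
end = match_quoted(unparsed, 4, "'")
print(unparsed[4:end].count("\n"))  # 2: the token spans all three lines
```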
