Antlr4: single quote rule fails when there are escape chars plus carriage return, new line

Question

I have a grammar like this:

grammar Testquote;
program : (Line ';')+ ;
Line: L_S_STRING ;
L_S_STRING  : '\'' (('\'' '\'') | ('\\' '\'') | ~('\''))* '\''; // Single quoted string literal
L_WS        : L_BLANK+ -> skip ;   // Whitespace
fragment L_BLANK : (' ' | '\t' | '\r' | '\n') ;

This grammar, and L_S_STRING in particular, seems to work fine with vanilla inputs like:

'ab';
'cd';

However, this input fails:

'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';
'cd';

Yet it works when I change the first line to either 'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z'''; or 'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\' ';

I can sort of see why the parser might choose this failing route. But is there some way I can tell it to choose differently?

Recommended answer

According to the ANTLR4 docs, both lexer and parser rules are greedy, so they match as much input as they can. In your case:

'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';
                               ^^^
'cd';

Your grammar is somewhat ambiguous: the characters I've highlighted can be interpreted either as \' followed by ' or as \ followed by ''. Here is how that plays out.

Without 'cd', the lexer matches a single string, because it is a valid string under your grammar; the highlighted characters are matched as \' followed by the closing '. But since the lexer is greedy, it uses the ambiguity above to match unwanted input at the first opportunity, for example when another unescaped ' appears somewhere later. Here the opening quote of 'cd' is such a quote: the lexer reads the highlighted characters as \ followed by '' and lets the string run on until that quote.
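To make the greedy behaviour concrete, here is a minimal sketch in Python that simulates longest-match for the original L_S_STRING rule. It is not ANTLR itself, only a hand-written exploration of every way the rule's three alternatives can consume the input; the names longest_string_match and sample are made up for this illustration, and the input is copied as it appears in the question.

```python
def longest_string_match(text, start=0):
    """Return the end index of the longest L_S_STRING match at `start`, or None.

    Simulates: '\'' ( '\'' '\'' | '\\' '\'' | ~'\'' )* '\''
    by exploring every position the string body can reach.
    """
    if start >= len(text) or text[start] != "'":
        return None
    reachable = {start + 1}  # body positions still to explore
    seen = set()
    best = None
    while reachable:
        i = reachable.pop()
        if i in seen or i >= len(text):
            continue
        seen.add(i)
        if text.startswith("''", i):   # alternative 1: doubled quote
            reachable.add(i + 2)
        if text.startswith("\\'", i):  # alternative 2: backslash + quote
            reachable.add(i + 2)
        if text[i] != "'":             # alternative 3: any non-quote char
            reachable.add(i + 1)
        else:                          # a quote here may also close the string
            end = i + 1
            best = end if best is None else max(best, end)
    return best


first_line = r"'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';"
sample = first_line + "\n'cd';"

end = longest_string_match(sample)
print(repr(sample[:end]))  # the token the lexer would produce
print(repr(sample[end:]))  # the input left over after that token
```

Under these assumptions, the longest match on the two-line input runs past the first ; and the newline and ends on the opening quote of 'cd', so the leftover input starts with cd, which no rule matches. On the first line alone, the match stops right before the ; as intended.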

This ambiguity arises because a backslash can be either a normal character or an escape character. The common way to remove it is to add a rule for escaping the backslash itself (\\) and to stop matching a lone backslash as a normal character.

Alternatively, you may want to deal with the ambiguity in a different way. If you want to prioritize \' over '', you should write:

L_S_STRING  : '\'' ( ('\'\'') | ('\\'+ ~'\\') | ~('\'' | '\\') )* '\'' ;

It will work for your input.
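As a rough check of that claim, the following Python sketch simulates the revised L_S_STRING rule together with a tiny tokenizer for ; and whitespace. It is an approximation written for this post, not generated ANTLR code; match_fixed_string and tokenize are hypothetical names, and the input is again copied as shown in the question.

```python
def match_fixed_string(text, start=0):
    """Return the end index of the longest match of the revised rule, or None.

    Simulates: '\'' ( '\'\'' | '\\'+ ~'\\' | ~('\'' | '\\') )* '\''
    """
    if start >= len(text) or text[start] != "'":
        return None
    reachable = {start + 1}  # body positions still to explore
    seen = set()
    best = None
    while reachable:
        i = reachable.pop()
        if i in seen or i >= len(text):
            continue
        seen.add(i)
        ch = text[i]
        if text.startswith("''", i):       # doubled quote
            reachable.add(i + 2)
        if ch == "\\":                     # backslash run + one other char
            j = i
            while j < len(text) and text[j] == "\\":
                j += 1
            if j < len(text):
                reachable.add(j + 1)
        elif ch != "'":                    # ordinary character
            reachable.add(i + 1)
        else:                              # closing quote
            end = i + 1
            best = end if best is None else max(best, end)
    return best


def tokenize(text):
    """Split input into string literals and ';', skipping whitespace."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] in " \t\r\n":
            i += 1
        elif text[i] == ";":
            tokens.append(";")
            i += 1
        else:
            end = match_fixed_string(text, i)
            if end is None:
                raise ValueError(f"no token matches at index {i}")
            tokens.append(text[i:end])
            i = end
    return tokens


sample = r"'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';" + "\n'cd';"
for token in tokenize(sample):
    print(repr(token))
```

Because a backslash can no longer be consumed as an ordinary character, the quote that follows a backslash run is always taken as escaped, and the string ends at the first quote that is neither escaped nor doubled. In this simulation the input splits into two string literals, each followed by ;.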

By the way, you can shorten your code for L_WS:

L_WS : [ \t\n\r]+ -> skip ;
