Antlr4: single-quote rule fails when there are escape characters plus a carriage return or newline
Problem description
I have a grammar like this:
grammar Testquote;
program : (Line ';')+ ;
Line: L_S_STRING ;
L_S_STRING : '\'' (('\'' '\'') | ('\\' '\'') | ~('\''))* '\''; // Single quoted string literal
L_WS : L_BLANK+ -> skip ; // Whitespace
fragment L_BLANK : (' ' | '\t' | '\r' | '\n') ;
This grammar, and the L_S_STRING rule in particular, seems to work fine with vanilla inputs like:
'ab';
'cd';
However, this input fails:
'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';
'cd';
Yet it works when I change the first line to either
'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z''';
or
'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\' ';
I can sort of see why the parser may choose this failing route. But is there some way I can tell it to choose differently?
Recommended answer
According to the ANTLR4 docs, both lexer and parser rules are greedy, so they match as much input as they can. In your case:
'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';
                               ^^^
'cd';
Your grammar is somewhat ambiguous: the three characters I've highlighted can be read either as \' followed by ' or as \ followed by ''. Here is how that plays out.
Without 'cd', the lexer matches a single string because it is a valid string for your grammar; the highlighted characters are matched as \' followed by the closing '. But since the lexer is greedy, it will use the aforementioned ambiguity to match unwanted input at the first opportunity, for example when another unescaped ' appears somewhere later. Here the highlighted characters are then read as \ followed by '', and the string runs on until the opening quote of 'cd'.
This ambiguity arises because a backslash can be either an ordinary character or an escape character. The common way to remove it is to add a rule for escaping the backslash itself (\\) and to stop matching a lone backslash as an ordinary character.
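A minimal sketch of that approach, reusing the rule name L_S_STRING from the question. The answer does not spell this rule out, so treat the exact alternatives as an assumption rather than the author's grammar:

```antlr
// Sketch only: a backslash is always an escape, never an ordinary character.
L_S_STRING
    : '\''
      ( '\'' '\''            // doubled quote stands for one literal quote
      | '\\' .               // backslash escapes the next character, including \\ and \'
      | ~('\'' | '\\')       // any other character except quote and backslash
      )*
      '\''
    ;
```

With this rule a backslash can only be consumed together with the character after it, so a trailing \' followed by ' has a single reading: an escaped quote, then the closing quote. Note that it also changes what the input means: a literal backslash must now be written as \\, so a string in which \\ is immediately followed by ' would end at that quote.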
Alternatively, you may want to resolve the ambiguity in a different way. If you want to prioritize \' over '', you should write:
L_S_STRING : '\'' ( ('\'\'') | ('\\'+ ~'\\') | ~('\'' | '\\') )* '\'' ;
It will work for your input.
By the way, you can shorten your code for L_WS:
L_WS : [ \t\n\r]+ -> skip ;