ANTLR grammar avoiding angle brackets
Question
In a previous question I asked about extracting tags from arbitrary text. The solution provided worked well, but there's one edge case I'd like to handle. To recap, I'm parsing arbitrary user-entered text and would like any occurrence of < or > to conform to valid tag syntax. Where an angle bracket isn't part of a valid tag, it should be escaped as &lt; or &gt;. The syntax I'm looking for is <foo#123>, where foo is text from a fixed list of entries and 123 is a number matching [0-9]+. The parser:
parser grammar TagsParser;
options {
tokenVocab = TagsLexer;
}
parse: (tag | text)* EOF;
tag: LANGLE fixedlist GRIDLET ID RANGLE;
text: NOANGLE;
fixedlist: FOO | BAR | BAZ;
The lexer:
lexer grammar TagsLexer;
LANGLE: '<' -> pushMode(tag);
NOANGLE: ~[<>]+;
mode tag;
RANGLE: '>' -> popMode;
GRIDLET: '#';
FOO: 'foo';
BAR: 'bar';
BAZ: 'baz';
ID: [0-9]+;
OTHERTEXT: . ;
This works well and successfully parses text such as:
<foo#123>
Hi <bar#987>!
<baz#1><foo#2>anythinghere<baz#3>
if 1 &lt; 2
It also correctly rejects the following when I use the BailErrorStrategy:
<foo123>
<bar#a>
<foo#123H>
<unsupported#123>
if 1 < 2
The last one is correctly rejected because < enters the tag mode and what follows doesn't match a supported tag format. However, I would also like to reject raw instances of > in the text, so the following should fail as well:
if 2 > 1
That text should be specified as if 2 &gt; 1 instead of having the raw angle bracket.
How can I modify the grammar so that occurrences of > which aren't part of a valid tag fail to parse?
Recommended answer
As your grammar stands now, a > outside of a tag already fails with a token recognition error, because > doesn't appear in the lexer grammar outside of the tag mode. That is a failure in its own right, but it is raised by the lexer, and BailErrorStrategy is a parser error strategy, so it does not act on it. If you insist on failing during the parse, just add the right angle bracket to the lexer's default mode:
lexer grammar TagsLexer;
LANGLE: '<' -> pushMode(tag);
NOANGLE: ~[<>]+;
BADRANGLE: '>';
mode tag;
RANGLE: '>' -> popMode;
...
Then a > outside of a tag becomes a BADRANGLE token, which no parser rule accepts, so it fails during the parse.
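To sanity-check the intended accept/reject behaviour without generating a parser, the rules can be mirrored in a few lines of Python. This is only a minimal sketch built on the standard re module, not ANTLR itself; the names TAG, TEXT and is_valid are made up for illustration, and it assumes the fixed list is exactly foo, bar and baz as in the grammar above.

```python
import re

# Hand-written stand-in for the grammar, not ANTLR output.
# A valid tag is '<', one of the fixed names, '#', one or more digits, '>'.
TAG = re.compile(r"<(?:foo|bar|baz)#[0-9]+>")
# Default-mode text: one or more characters that are not angle brackets.
TEXT = re.compile(r"[^<>]+")


def is_valid(text: str) -> bool:
    """Return True if text consists only of valid tags and bracket-free runs."""
    pos = 0
    while pos < len(text):
        match = TAG.match(text, pos) or TEXT.match(text, pos)
        if match is None:
            # Either a '<' that does not start a valid tag, or a stray '>'.
            return False
        pos = match.end()
    return True


if __name__ == "__main__":
    for sample in ["Hi <bar#987>!", "if 1 &lt; 2", "if 1 < 2", "if 2 > 1"]:
        print(repr(sample), is_valid(sample))
```

The stray > is rejected here for the same reason as in the modified grammar: it is neither part of a tag nor allowed in plain text.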