ANTLR grammar avoiding angle brackets

Question

In this question I asked about extracting tags from arbitrary text. The solution provided worked well, but there's one edge case I'd like to handle. To recap, I'm parsing arbitrary user-entered text and would like any occurrence of < or > to conform to valid tag syntax. Where an angle bracket isn't part of a valid tag, it should be escaped as &lt; or &gt;. The syntax I'm looking for is <foo#123>, where foo is text from a fixed list of entries and 123 is a number [0-9]+. The parser:

parser grammar TagsParser;

options {
    tokenVocab = TagsLexer;
}

parse: (tag | text)* EOF;
tag: LANGLE fixedlist GRIDLET ID RANGLE;
text: NOANGLE;
fixedlist: FOO | BAR | BAZ;

The lexer:

lexer grammar TagsLexer;

LANGLE: '<' -> pushMode(tag);
NOANGLE: ~[<>]+;

mode tag;

RANGLE: '>' -> popMode;
GRIDLET: '#';
FOO: 'foo';
BAR: 'bar';
BAZ: 'baz';
ID: [0-9]+;
OTHERTEXT: . ;

This works well and successfully parses text such as:

<foo#123>
Hi <bar#987>!
<baz#1><foo#2>anythinghere<baz#3>
if 1 &lt; 2

It also correctly rejects the following when I use the BailErrorStrategy:

<foo123>
<bar#a>
<foo#123H>
<unsupported#123>
if 1 < 2

The last one fails as expected because < switches the lexer into tag mode and the text that follows doesn't match a supported tag format. However, I would also like to reject raw > characters in the text, so the following should fail as well:

if 2 > 1

That text should be written as if 2 &gt; 1 instead of using the raw angle bracket.

How can I modify the grammar so that occurrences of > which aren't part of a valid tag fail to parse?

Answer

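As your grammar stands now, a > outside of a tag already fails with a token recognition error, because no rule in the lexer's default mode matches > (NOANGLE excludes it, and RANGLE exists only in tag mode). That is a failure in its own right. Note, however, that by default an ANTLR lexer only reports a token recognition error to its error listeners and skips the offending character, and BailErrorStrategy applies only to the parser, so the parse itself still succeeds. If you want the failure to happen during parsing, add a rule for the right angle bracket to the lexer's default mode: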

lexer grammar TagsLexer;

LANGLE: '<' -> pushMode(tag);
NOANGLE: ~[<>]+;
BADRANGLE: '>';

mode tag;

RANGLE: '>' -> popMode;
...

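Then a > outside of a tag reaches the parser as a BADRANGLE token. No parser rule accepts BADRANGLE, so parsing fails, and with BailErrorStrategy the parser throws a ParseCancellationException.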
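
To make the difference between the two lexers concrete, here is a minimal pure-Python model of the grammars above. It is not ANTLR and not generated code; the functions lex, parse and accepts and the with_badrangle flag are hypothetical names used only for illustration. lex imitates the mode-based lexer rules from the question, including ANTLR's default handling of a token recognition error (report it, skip one character), and parse stops at the first mismatch, roughly the way BailErrorStrategy does.

```python
# A pure-Python model of TagsLexer/TagsParser. This is NOT ANTLR: it only
# imitates the rules from the question to show where a stray '>' ends up.

KEYWORDS = {"foo": "FOO", "bar": "BAR", "baz": "BAZ"}  # fixedlist tokens
DIGITS = "0123456789"                                    # ID: [0-9]+


def lex(text, with_badrangle):
    """Return (tokens, error_positions) for the mode-based lexer.

    with_badrangle=False models the original grammar: '>' in the default
    mode matches no rule, so it is reported and skipped (ANTLR's default).
    with_badrangle=True models the extra rule BADRANGLE: '>'.
    """
    tokens, errors = [], []
    in_tag = False
    i = 0
    while i < len(text):
        c = text[i]
        if not in_tag:
            if c == "<":                        # LANGLE -> pushMode(tag)
                tokens.append(("LANGLE", c))
                in_tag = True
                i += 1
            elif c == ">":
                if with_badrangle:              # BADRANGLE: '>'
                    tokens.append(("BADRANGLE", c))
                else:                           # token recognition error
                    errors.append(i)
                i += 1                          # either way, consume the '>'
            else:                               # NOANGLE: ~[<>]+
                j = i
                while j < len(text) and text[j] not in "<>":
                    j += 1
                tokens.append(("NOANGLE", text[i:j]))
                i = j
        elif c == ">":                          # RANGLE -> popMode
            tokens.append(("RANGLE", c))
            in_tag = False
            i += 1
        elif c == "#":                          # GRIDLET
            tokens.append(("GRIDLET", c))
            i += 1
        elif text[i:i + 3] in KEYWORDS:         # FOO | BAR | BAZ
            tokens.append((KEYWORDS[text[i:i + 3]], text[i:i + 3]))
            i += 3
        elif c in DIGITS:                       # ID: [0-9]+
            j = i
            while j < len(text) and text[j] in DIGITS:
                j += 1
            tokens.append(("ID", text[i:j]))
            i = j
        else:                                   # OTHERTEXT: .
            tokens.append(("OTHERTEXT", c))
            i += 1
    return tokens, errors


def parse(tokens):
    """parse: (tag | text)* EOF; returns False at the first mismatch."""
    i = 0
    while i < len(tokens):
        kind = tokens[i][0]
        if kind == "NOANGLE":                   # text: NOANGLE
            i += 1
        elif kind == "LANGLE":                  # tag: LANGLE fixedlist GRIDLET ID RANGLE
            kinds = [k for k, _ in tokens[i:i + 5]]
            if (len(kinds) == 5 and kinds[1] in KEYWORDS.values()
                    and kinds[2:] == ["GRIDLET", "ID", "RANGLE"]):
                i += 5
            else:
                return False
        else:                                   # BADRANGLE, stray OTHERTEXT, ...
            return False
    return True


def accepts(text, with_badrangle):
    """Return (parse succeeded, lexer error positions)."""
    tokens, errors = lex(text, with_badrangle)
    return parse(tokens), errors


if __name__ == "__main__":
    print(accepts("if 2 > 1", with_badrangle=False))  # (True, [5])
    print(accepts("if 2 > 1", with_badrangle=True))   # (False, [])
```

With the original rules, "if 2 > 1" is lexed as two NOANGLE tokens around a reported error at index 5, and the parse succeeds; with BADRANGLE added, the parser sees an unexpected token and rejects the input, while every other example from the question keeps its original outcome.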