ANTLR4 - How to parse content between same string values


Question

I'm trying to write an ANTLR4 parser rule that matches the content between two identical, arbitrary string values. So far I haven't found a way to do it.

For example, in the input below, I need a rule that extracts Hello and Bye. I'm not interested in extracting xyz, though.

TEXT Hello TEXT

TEXT1 Bye TEXT1

TEXT5 xyz TEXT8


Since this closely resembles an XML element grammar, I tried the XML parser example from the ANTLR4 XML grammar, but it parses input such as <ABC> ... </XYZ> without error, which is not what I want.

I also tried using semantic predicates, without much success.

Could anyone give me a hint on how to match content that is embedded between identical strings?

Thanks!

Satheesh

Recommended answer

I'm not sure how this works out performance-wise, given the many checks the parser has to do, but you could try something like:

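// Java target assumed. Labels renamed from start/end to avoid confusion with
// ANTLR's built-in $start rule attribute; compare token text, not references.
token:
    open = IDENTIFIER WORD* close = IDENTIFIER { $open.text.equals($close.text) }?
;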

The part between the curly braces is a validating semantic predicate. The lexer tokens are self-explanatory, I believe.

The more I think about it, the better it seems to just tokenize the input and write your own parser that processes the tokens and acts accordingly. That depends, of course, on the complexity of the syntax.
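As a rough illustration of the tokenize-and-process idea, here is a minimal Python sketch. It is not part of the original answer and rests on several assumptions: each delimited element sits on its own line (as in the sample input), tokens are separated by whitespace, and the function name `extract_delimited` is made up for this example.

```python
def extract_delimited(text):
    """Return the content of each line whose first and last tokens are identical.

    Assumptions (not from the original post): one element per line,
    whitespace-separated tokens.
    """
    results = []
    for line in text.splitlines():
        tokens = line.split()  # naive whitespace tokenization
        # A line needs an opening and a closing delimiter, and they must match.
        if len(tokens) >= 2 and tokens[0] == tokens[-1]:
            results.append(" ".join(tokens[1:-1]))
    return results


sample = """TEXT Hello TEXT
TEXT1 Bye TEXT1
TEXT5 xyz TEXT8"""

print(extract_delimited(sample))  # ['Hello', 'Bye']
```

The line `TEXT5 xyz TEXT8` is skipped because its delimiters differ. A syntax with nesting or with elements spanning several lines would need a real token stream and more careful matching than this sketch provides.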
