ANTLR4 - How to parse content between same string values


Question

I'm trying to write an ANTLR4 parser rule that matches the content between two identical, arbitrary string values. So far I haven't found a way to do it.

For example, in the input below, I need a rule that extracts Hello and Bye. I'm not interested in extracting xyz, though.

TEXT Hello TEXT

TEXT1 Bye TEXT1

TEXT5 xyz TEXT8

Since this is very similar to an XML element grammar, I tried the XML parser example from the ANTLR4 XML grammar, but it parses input like <ABC> ... </XYZ> without error, which is not what I want.

I also tried using semantic predicates, without much success.

Could anyone give me a hint on how to match content that is embedded between identical strings?

Thanks!

Satish

Recommended answer

I'm not sure how this works out performance-wise, because of the many checks the parser has to do, but you could try something like:

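token:
    // Java target assumed: compare the text of the two labelled tokens, not the Token objects
    open = IDENTIFIER WORD* close = IDENTIFIER { $open.text.equals($close.text) }?
;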

The part between the curly braces is a validating semantic predicate: the rule matches only when it evaluates to true. The lexer tokens are self-explanatory, I believe.

The more I think about it, the more I suspect you would be better off just tokenizing the input and writing your own parser that processes the tokens and acts accordingly. That depends, of course, on the complexity of the syntax.
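
As a rough illustration of that hand-written alternative, here is a minimal Python sketch. It rests on assumptions that are not in the question: each construct sits on its own line, tokens are separated by whitespace, and the first and last token of a line are the delimiters. The function name extract_delimited is made up for this example.

```python
def extract_delimited(text):
    """Return the content of every line whose first and last tokens are identical.

    Assumptions (not from the original question): one construct per line,
    whitespace-separated tokens, delimiters are the first and last token.
    """
    results = []
    for line in text.splitlines():
        tokens = line.split()
        # At least two tokens are needed: an opening and a closing delimiter.
        # The content in between may be empty, mirroring WORD* in the grammar.
        if len(tokens) >= 2 and tokens[0] == tokens[-1]:
            results.append(" ".join(tokens[1:-1]))
    return results


sample = """TEXT Hello TEXT
TEXT1 Bye TEXT1
TEXT5 xyz TEXT8"""

print(extract_delimited(sample))  # ['Hello', 'Bye']
```

The line with TEXT5 and TEXT8 is skipped because its delimiters differ. A real input format with nesting or multi-line content would need a proper token stream and a stack of open delimiters instead of this per-line check.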
