ANTLR4: Invoke a different sub-parser for a specific rule


Question

Consider this very simplified example, where input of the following form should be matched:

mykey -> This is the value

My real case is much more complex, but this is enough to show what I am trying to achieve. mykey is an ID, while on the right-hand side of -> we have a sequence of Words. If I use

grammar Root;

parse
    : ID '->' value
    ;

value
    : Word+
    ;

ID
    : ('a'..'z')+
    ;


Word
    : ('a'..'z' | 'A'..'Z' | '0'..'9')+
    ;

WS
    : ' ' -> skip
    ;

the example is not parsed, because the lexer emits an ID token for is (and likewise for the and value): ID is defined before Word, and when two lexer rules match the same text, ANTLR picks the rule defined first. An ID token is not matched by Word+. In my real example, the value language is vastly different, and I'd like to parse it with a different grammar.
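The tokenization problem described above can be reproduced with a small simulation. This is a toy lexer written for illustration, not ANTLR itself: it assumes the usual "longest match wins, earlier rule wins on a tie" behaviour, and the token name ARROW is a made-up label for the implicit '->' token.

```python
import re

# Lexer rules in the same order as in the grammar above; order decides ties.
# ARROW is a hypothetical name for the implicit '->' token.
RULES = [
    ("ID", re.compile(r"[a-z]+")),
    ("Word", re.compile(r"[a-zA-Z0-9]+")),
    ("ARROW", re.compile(r"->")),
]

def tokenize(text):
    """Toy lexer: longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":  # WS -> skip
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

tokens = tokenize("mykey -> This is the value")
for kind, text in tokens:
    print(f"{text:6} {kind}")
```

Only This becomes a Word, because its capital letter rules out ID. The all-lowercase words is, the and value are claimed by ID, which is why value : Word+ fails.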

I have considered several solutions:

  1. Switching the lexer mode. As far as I know, switching the lexer to a different mode can only happen in a lexer rule. That is a problem both here and in my real case, because there are no unique tokens that start and end the value part. What I would need is something like "tokenize value with different rules", which of course makes no sense, because lexer and parser act independently: by the time the parser starts, everything has already been tokenized.

  2. Using a different grammar for value. If I understand this correctly, importing a grammar won't work, since an import always combines the two grammars, which leads to the same wrong tokenization.

  3. Creating a first, crude parser that accepts the whole language but doesn't create the correct tree for value. I could then use a visitor to reparse the value nodes with a different sub-parser, possibly inserting a new, correct subtree for value. This feels a bit clumsy.
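The third option (a crude first pass, followed by re-parsing the value node) can be sketched roughly as follows. This is a toy model with plain dictionaries standing in for parse-tree nodes; parse_outer, parse_value and reparse_values are hypothetical helper names, and no ANTLR API is involved.

```python
import re

def parse_outer(line):
    """Crude first pass: recognise `ID -> <raw text>` and leave the value unparsed."""
    m = re.fullmatch(r"\s*([a-z]+)\s*->\s*(.*)", line)
    if m is None:
        raise ValueError(f"not of the form 'ID -> value': {line!r}")
    return {
        "rule": "parse",
        "id": m.group(1),
        "value": {"rule": "value", "raw": m.group(2)},
    }

def parse_value(raw):
    """Hypothetical sub-parser for the value language; here it is just Word+."""
    words = raw.split()
    if not words or not all(re.fullmatch(r"[a-zA-Z0-9]+", w) for w in words):
        raise ValueError(f"invalid value: {raw!r}")
    return {"rule": "value", "words": words}

def reparse_values(tree):
    """Second pass: replace the raw value node with the sub-parser's subtree."""
    result = dict(tree)
    result["value"] = parse_value(tree["value"]["raw"])
    return result

tree = reparse_values(parse_outer("mykey -> This is the value"))
print(tree)
```

The outer pass never tokenizes the value text, so the ID/Word conflict cannot arise there. The cost is the extra pass over the tree that the question calls clumsy.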

If you need a simple real-world application, consider strings in Java. Some of them might contain a regex, which needs to be parsed with a completely different parser. This is similar to the injected languages you can use inside IDEA.

Question: Is there an idiomatic way in ANTLR4 to parse a specific rule with a different grammar? Ideally, I could specify this at the grammar level, so that the resulting AST combines the outer language with a sub-tree of the injected language.

Recommended answer

You can try to move the decision about what a word is into the parser:

grammar Root;

parse
  : ID '->' value
  ;

value
  : word+
  ;

word : Word | ID;

//the same lexer rules as above

The value is then parsed as follows (text -> token -> parser rule):

This  -> Word -> word
is    -> ID   -> word
the   -> ID   -> word
value -> ID   -> word

So at the level of the parser nodes, you will only have words.
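To see why this works, here is a minimal hand-written sketch of the three parser rules. It is an illustration under assumptions, not generated ANTLR code: the token list is written out by hand to match what the lexer rules would produce, and ARROW is a made-up name for the implicit '->' token.

```python
# Token stream for the input: mykey -> This is the value
# (written out by hand; ARROW is a hypothetical name for the '->' token)
TOKENS = [
    ("ID", "mykey"), ("ARROW", "->"), ("Word", "This"),
    ("ID", "is"), ("ID", "the"), ("ID", "value"),
]

def parse(tokens):
    """parse : ID '->' value ;"""
    if len(tokens) < 2 or tokens[0][0] != "ID" or tokens[1][0] != "ARROW":
        raise SyntaxError("expected ID '->'")
    return ("parse", tokens[0][1], value(tokens[2:]))

def value(tokens):
    """value : word+ ;"""
    if not tokens:
        raise SyntaxError("expected at least one word")
    return ("value", [word(t) for t in tokens])

def word(token):
    """word : Word | ID ;  (either token type is accepted)"""
    kind, text = token
    if kind not in ("Word", "ID"):
        raise SyntaxError(f"unexpected token {kind}")
    return ("word", text)

tree = parse(TOKENS)
print(tree)
```

The lexer still labels is, the and value as ID, but the word rule hides that distinction from everything above it in the tree.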
