ANTLR4: Invoke a different sub-parser for a specific rule


Question

Consider this very simplified example, where input of the following form should be matched:

mykey -> This is the value

My real case is much more complex, but this is enough to show what I am trying to achieve. mykey is an ID, while the right side of -> holds a sequence of Words. If I use

grammar Root;

parse
    : ID '->' value
    ;

value
    : Word+
    ;

ID
    : ('a'..'z')+
    ;


Word
    : ('a'..'z' | 'A'..'Z' | '0'..'9')+
    ;

WS
    : ' ' -> skip
    ;

the example won't parse, because the lexer emits an ID token for is, which Word+ does not match. Both ID and Word match is with the same length, and the lexer then prefers the rule that is defined first. In my real example, the value language is vastly different, and I'd like to parse it with a different grammar.
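To make the tokenization problem concrete, here is a minimal Python simulation of the lexer's decision rule (longest match wins; on a tie, the rule listed first wins). It is only a sketch, not ANTLR itself: the `RULES` table and the `tokenize` helper are illustrative names, and `ARROW` stands in for the implicit '->' token of the grammar.

```python
import re

# Lexer rules in grammar order; ARROW stands in for the implicit '->' token.
RULES = [
    ("ARROW", re.compile(r"->")),
    ("ID", re.compile(r"[a-z]+")),
    ("Word", re.compile(r"[a-zA-Z0-9]+")),
    ("WS", re.compile(r" ")),
]

def tokenize(text):
    """Return (type, text) pairs using longest-match, first-rule-wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # A strictly longer match wins; on a tie the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # mirrors "WS : ' ' -> skip"
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("mykey -> This is the value"))
```

Only This comes out as a Word, because of its capital letter; is, the and value are all lowercase, so ID ties with Word on length and wins by being listed first.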

I have considered different solutions:

  1. Switching the lexer mode. AFAIK, the lexer can only switch modes from within a lexer rule. That is a problem both here and in my real case, because there are no unique tokens that start and end the value part. What I would need is something like "tokenize value with different rules", which of course makes no sense: the lexer and parser act independently, and by the time the parser starts, everything is already tokenized.

  2. Using a different grammar for value. If I understand it correctly, importing a grammar won't work, because an import always merges the two grammars, which leads to the same wrong tokenization.

  3. Creating a first, crude parser that accepts the whole language but doesn't build the correct tree for value. I could then use a visitor to reparse the value nodes with a different sub-parser, possibly inserting a new, correct subtree for value. This feels a bit clumsy.
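The last option, a crude first pass followed by a reparse of the value, can be sketched in plain Python. This only illustrates the control flow and is not ANTLR code: `parse_outer`, `parse_value` and `parse` are hypothetical helpers standing in for the outer parser, the sub-parser and the visitor that stitches the two results together.

```python
import re

def parse_outer(line):
    """Crude first pass: recognise the key and keep the value as raw text."""
    m = re.fullmatch(r"\s*([a-z]+)\s*->\s*(.*)", line)
    if m is None:
        raise ValueError("input does not have the form: ID -> value")
    return {"key": m.group(1), "raw_value": m.group(2)}

def parse_value(raw):
    """Second pass: analyse the raw value with its own, independent rules."""
    words = raw.split()
    for w in words:
        if not re.fullmatch(r"[A-Za-z0-9]+", w):
            raise ValueError(f"not a word: {w!r}")
    if not words:
        raise ValueError("value must contain at least one word")
    return words

def parse(line):
    """Combine both passes into one result, as a visitor would."""
    outer = parse_outer(line)
    return {"key": outer["key"], "value": parse_value(outer["raw_value"])}

print(parse("mykey -> This is the value"))
```

Because the second pass sees only the raw text of the value, the outer rules for ID never get a chance to interfere with it. The price is two separate passes and a manual merge of their results.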

If you need a simple real-world application, consider strings in Java: some of them may hold a regex, which needs to be parsed with a completely different parser. This is similar to the injected languages you can use in IDEA.

Question: Is there an idiomatic way in ANTLR4 to parse a specific rule with a different grammar? Ideally I could specify this at the grammar level, so that the resulting AST combines the outer language with a subtree of the injected language.

Recommended answer

You can try to move the decision about what counts as a word into the parser:

grammar Root;

parse
  : ID '->' value
  ;

value
  : word+
  ;

word : Word | ID;

//the same lexer rules as above

This will parse the value as follows (text -> token type -> parser rule):

This  -> Word -> word
is    -> ID   -> word
the   -> ID   -> word
value -> ID   -> word

So at the level of the parser nodes, you will have only words.
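As a quick sanity check of this idea, the following Python sketch compares the two variants of the value rule against the token types the lexer produces for the example input. It is a hand-written stand-in for the generated parser: `TOKENS`, `ARROW` and `matches_parse` are illustrative names, not part of ANTLR.

```python
# Token types the lexer produces for "mykey -> This is the value"
# (ARROW stands in for the implicit '->' token).
TOKENS = ["ID", "ARROW", "Word", "ID", "ID", "ID"]

def matches_parse(tokens, word_types):
    """Check  parse : ID '->' value  where value is one or more tokens
    whose type is contained in word_types."""
    if len(tokens) < 3 or tokens[0] != "ID" or tokens[1] != "ARROW":
        return False
    return all(t in word_types for t in tokens[2:])

# Original grammar: value : Word+
print(matches_parse(TOKENS, {"Word"}))        # False
# Answer's grammar: value : word+ ; word : Word | ID
print(matches_parse(TOKENS, {"Word", "ID"}))  # True
```

This approach presumably stays manageable only while the token types that can appear inside value are few enough to list in the word rule; a value language with a very different token set would still need one of the other options.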
