ANTLR4: Invoke a different sub-parser for a specific rule
Problem description
Consider this very simplified example, where input of the following form should be matched:
mykey -> This is the value
My real case is much more complex, but this will do to show what I am trying to achieve. mykey is an ID, while on the right side of -> we have a set of Words. If I use
grammar Root;
parse
: ID '->' value
;
value
: Word+
;
ID
: ('a'..'z')+
;
Word
: ('a'..'z' | 'A'..'Z' | '0'..'9')+
;
WS
: ' ' -> skip
;
the example won't be parsed, because the lexer produces an ID token for the word is, which is not matched by Word+. In my real example, the value language is vastly different and I'd like to parse it with a different grammar.
I have considered different solutions:
- Switching the lexer mode. AFAIK, the lexer can only switch to a different mode from within a lexer rule. That is a problem both for this case and for my real case, because there are no unique tokens that start and end the value part. What I would need is something like "tokenize value with different rules", which of course makes no sense, because the lexer and parser act independently: as soon as the parser starts, everything has already been tokenized.
- Using a different grammar for value. If I understand correctly, importing a grammar won't work, since it always combines the two grammars, which leads to the same wrong tokenization.
- Creating a first, crude parser that accepts the whole language but doesn't build the correct tree for value. I could then use a visitor to reparse the value nodes with a different sub-parser, possibly inserting a new, correct subtree for value. This feels a bit clumsy.
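A variant of the third option can be sketched in plain Java (not ANTLR): a crude outer pass recognizes only ID '->', hands the raw character span of the value to a second parser with its own token rules, and attaches that parser's result to the outer node. Entry, parseOuter and parseValue are hypothetical names, and the inner "grammar" is deliberately trivial. In an ANTLR setup the raw span would presumably come from the character positions of the value rule's first and last tokens rather than from indexOf.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseSketch {
    // Hypothetical combined result: the outer key plus the sub-tree
    // (here just a list of words) produced by the inner parser.
    static final class Entry {
        final String key;
        final List<String> valueWords;

        Entry(String key, List<String> valueWords) {
            this.key = key;
            this.valueWords = valueWords;
        }
    }

    // Phase 1 (outer parser): recognize ID '->' and cut out the raw value
    // text without running it through the outer lexer rules.
    static Entry parseOuter(String line) {
        int arrow = line.indexOf("->");
        if (arrow < 0) {
            throw new IllegalArgumentException("missing '->'");
        }
        String key = line.substring(0, arrow).trim();
        if (!key.matches("[a-z]+")) {
            throw new IllegalArgumentException("not an ID: " + key);
        }
        String valueText = line.substring(arrow + 2);
        return new Entry(key, parseValue(valueText));
    }

    // Phase 2 (inner parser): its own token rules, Word+ separated by spaces.
    // There is no ID rule here, so "is" cannot be mis-tokenized.
    static List<String> parseValue(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.trim().split(" +")) {
            if (!w.matches("[a-zA-Z0-9]+")) {
                throw new IllegalArgumentException("not a Word: " + w);
            }
            words.add(w);
        }
        return words;
    }

    public static void main(String[] args) {
        Entry e = parseOuter("mykey -> This is the value");
        System.out.println(e.key + " => " + e.valueWords);
    }
}
```

Because the value text never reaches the outer lexer, is is never turned into an ID. The cost is that the outer and inner parsers must agree on where the value ends; in this sketch it simply runs to the end of the line.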
If you need a simple real-world application, consider strings in Java: some of them might be regexes that need to be parsed with a completely different parser. It is similar to the injected languages you can use inside IDEA.
Question: Is there an idiomatic way in ANTLR4 to parse a specific rule with a different grammar? Ideally, I could specify this at the grammar level, so that the resulting AST combines the outer language with a sub-tree of the injected language.
Recommended answer
You can try to move the decision about what a word is into the parser:
grammar Root;
parse
: ID '->' value
;
value
: word+
;
word : Word | ID;
//the same lexer rules as above
This parses the value part as follows (lexeme -> token type -> parser rule):
This -> Word -> word
is -> ID -> word
the -> ID -> word
value -> ID -> word
So at the level of the parser nodes you will have only words.
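The effect of the word rule can be checked with a tiny hand-written recognizer over token types (plain Java, not generated ANTLR code). The token stream is the one from the mapping above, with mykey lexed as ID and ARROW as an assumed name for the '->' token; the boolean flag switches between the question's value : Word+ and the answer's value : word+ with word : Word | ID.

```java
import java.util.List;

public class WordRuleSketch {
    // Token types for this sketch; ARROW stands for the implicit '->' token.
    enum T { ID, ARROW, WORD }

    // parse : ID '->' value ;
    // valueAcceptsId == false models  value : Word+ ;
    // valueAcceptsId == true  models  value : word+ ;  word : Word | ID ;
    static boolean parse(List<T> tokens, boolean valueAcceptsId) {
        // Need ID, '->' and at least one value token.
        if (tokens.size() < 3 || tokens.get(0) != T.ID || tokens.get(1) != T.ARROW) {
            return false;
        }
        for (T t : tokens.subList(2, tokens.size())) {
            boolean isWord = t == T.WORD || (valueAcceptsId && t == T.ID);
            if (!isWord) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "mykey -> This is the value" as the lexer tokenizes it.
        List<T> input = List.of(T.ID, T.ARROW, T.WORD, T.ID, T.ID, T.ID);
        System.out.println("value : Word+  -> " + parse(input, false)); // false
        System.out.println("value : word+  -> " + parse(input, true));  // true
    }
}
```

The token stream is identical in both cases; only the parser rule decides that an ID may also serve as a word. Note that this moves the ID/Word decision into the parser, but the value part is still tokenized by the outer lexer rules.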