ANTLR: call a rule from a different grammar

Views: 28 · Published: 2021/11/11 3:40:18 · Tags: antlr, grammar, modularity, rule

Question

Is it possible to invoke a rule from a different grammar?
The purpose is to have two languages in the same file, with the second language introduced by (begin ...), where ... is written in the second language. The grammar should invoke another grammar to parse that second language.

For example:


grammar A;

start_rule
    :    '(' 'begin' B.program ')' //or something like that
    ;


grammar B;

program
    :   something* EOF
    ;

something
    : ...
    ;

Recommended answer

File: Sub.g

parser grammar Sub;

number
  :  Digit+
  ;

File: Root.g

grammar Root;

import Sub;

parse
  :  number EOF {System.out.println("Parsed: " + $number.text);}
  ;

File: Main.java

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
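    // L is the lexer class generated from L.g (see the commands below)
    L lexer = new L(new ANTLRStringStream("42"));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    RootParser parser = new RootParser(tokens);
    // invoke Root's start rule, which uses the imported rule 'number'
    parser.parse();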
  }
}

Running the demo:

bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp antlr-3.3.jar org.antlr.Tool L.g
bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp antlr-3.3.jar org.antlr.Tool Root.g 
bart@hades:~/Programming/ANTLR/Demos/Composite$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp .:antlr-3.3.jar Main

will print:

Parsed: 42

to the console.
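
To make the effect of import more concrete, here is a hand-written plain-Java analogy, not ANTLR-generated code. The class names ImportSketch, Sub and Root are made up for illustration, and since the lexer grammar is not shown above, the sketch assumes that Digit means '0'..'9'. The point is only that the parse rule in Root can use number as though it were declared locally, while the rule itself lives in Sub:

```java
/** Plain-Java analogy for "Root imports Sub"; all names here are hypothetical. */
class ImportSketch {

    /** Stands in for Sub.g, which owns the rule: number : Digit+ ; */
    static class Sub {
        /** Consumes one or more digits starting at pos[0] and returns the matched text. */
        static String number(String input, int[] pos) {
            int start = pos[0];
            // Assumption: Digit is '0'..'9'
            while (pos[0] < input.length()
                    && input.charAt(pos[0]) >= '0' && input.charAt(pos[0]) <= '9') {
                pos[0]++;
            }
            if (pos[0] == start) {
                throw new IllegalArgumentException("expected a digit at index " + start);
            }
            return input.substring(start, pos[0]);
        }
    }

    /** Stands in for Root.g, which owns the rule: parse : number EOF ; */
    static class Root {
        static String parse(String input) {
            int[] pos = {0};
            String number = Sub.number(input, pos); // the rule defined in the other "grammar"
            if (pos[0] != input.length()) {         // EOF check
                throw new IllegalArgumentException("expected end of input at index " + pos[0]);
            }
            return "Parsed: " + number;
        }
    }

    public static void main(String[] args) {
        System.out.println(Root.parse("42")); // prints "Parsed: 42"
    }
}
```

With a real ANTLR grammar you would not write any of this by hand; the tool takes care of wiring the imported rules into the generated parser.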

A nice example of a language inside a language is regex. You have the "normal" regex language with its meta characters, but there's another one in it: the language that describes a character set (or character class).

Instead of accounting for the meta characters of a character set (range -, negation ^, etc.) inside your regex grammar, you could simply treat a character set as a single token consisting of a [ and then everything up to and including the closing ] (possibly with \] in it!). When you then stumble upon a CharSet token in one of your parser rules, you invoke the CharSet parser.
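
Before looking at the ANTLR grammars, the idea can be sketched in plain Java without any ANTLR classes. This is only an illustration under simplifying assumptions: IslandSketch, tokenize and describeCharSet are hypothetical names, and the inner "parser" handles just escapes, simple ranges and a leading ^. The outer scanner keeps a whole [...] as one token and hands its text to the inner parser:

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of a language inside a language; all names are hypothetical. */
class IslandSketch {

    /** Outer scanner: splits a regex into tokens, keeping a whole [...] as one token. */
    static List<String> tokenize(String regex) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < regex.length()) {
            char c = regex.charAt(i);
            if (c == '[') {
                int j = i + 1;
                // scan to the closing ']', stepping over escapes such as \]
                while (j < regex.length() && regex.charAt(j) != ']') {
                    j += (regex.charAt(j) == '\\') ? 2 : 1;
                }
                if (j >= regex.length()) {
                    throw new IllegalArgumentException("unterminated character set");
                }
                tokens.add(regex.substring(i, j + 1));
                i = j + 1;
            } else if (c == '\\' && i + 1 < regex.length()) {
                tokens.add(regex.substring(i, i + 2)); // escape sequence outside a set
                i += 2;
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    /** Inner parser: describes the contents of a single [...] token. */
    static String describeCharSet(String token) {
        String body = token.substring(1, token.length() - 1); // drop '[' and ']'
        boolean negated = body.startsWith("^");
        if (negated) {
            body = body.substring(1);
        }
        List<String> items = new ArrayList<>();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                items.add(body.substring(i, i + 2));           // escape, e.g. \d
                i += 2;
            } else if (i + 2 < body.length() && body.charAt(i + 1) == '-'
                    && body.charAt(i + 2) != '\\') {
                items.add("RANGE(" + c + "," + body.charAt(i + 2) + ")"); // e.g. a-f
                i += 3;
            } else {
                items.add(String.valueOf(c));
                i++;
            }
        }
        return (negated ? "NEGATED" : "NORMAL") + items;
    }

    public static void main(String[] args) {
        for (String token : tokenize("((xyz)*[^\\da-f])foo")) {
            // a token starting with '[' is a character set: delegate to the inner parser
            System.out.println(token.startsWith("[") ? describeCharSet(token) : token);
        }
    }
}
```

The ANTLR version below follows the same shape: the outer grammar only recognizes where the character set starts and ends, and the second grammar alone knows what is inside it.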

File: Regex.g

grammar Regex;

options { 
  output=AST;
}

tokens {
  REGEX;
  ATOM;
  CHARSET;
  INT;
  GROUP;
  CONTENTS;
}

@members {
  public static CommonTree ast(String source) throws RecognitionException {
    RegexLexer lexer = new RegexLexer(new ANTLRStringStream(source));
    RegexParser parser = new RegexParser(new CommonTokenStream(lexer));
    return (CommonTree)parser.parse().getTree();
  }
}

parse
  :  atom+ EOF -> ^(REGEX atom+)
  ;

atom
  :  group quantifier?     -> ^(ATOM group quantifier?)
  |  EscapeSeq quantifier? -> ^(ATOM EscapeSeq quantifier?)
  |  Other quantifier?     -> ^(ATOM Other quantifier?)
  |  CharSet quantifier?   -> ^(CHARSET {CharSetParser.ast($CharSet.text)} quantifier?)
  ;

group
  :  '(' atom+ ')' -> ^(GROUP atom+)
  ;

quantifier
  :  '+'
  |  '*'
  ;

CharSet
  :  '[' (('\\' .) | ~('\\' | ']'))+ ']'
  ;

EscapeSeq
  :  '\\' .
  ;

Other
  :  ~('\\' | '(' | ')' | '[' | ']' | '+' | '*')
  ;
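
To get a feel for what the CharSet lexer rule above accepts, it can be approximated with java.util.regex. This is a rough equivalent and not how ANTLR matches tokens: the class and method names are made up for the example, and Matcher.find searches anywhere in the input, whereas a lexer matches at the current position.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Approximates the CharSet lexer rule with a Java regex; names are hypothetical. */
class CharSetTokenSketch {

    // '[' then one or more of (backslash + any char) or (any char except '\' and ']'), then ']'
    // DOTALL lets '.' match line terminators too, like ANTLR's '.' wildcard
    static final Pattern CHAR_SET =
            Pattern.compile("\\[(?:\\\\.|[^\\\\\\]])+\\]", Pattern.DOTALL);

    /** Returns the first character-set token found in the input, or null if there is none. */
    static String firstCharSet(String input) {
        Matcher m = CHAR_SET.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstCharSet("((xyz)*[^\\da-f])foo")); // prints [^\da-f]
        System.out.println(firstCharSet("x[a\\]b]y"));            // prints [a\]b]
        System.out.println(firstCharSet("no set here"));          // prints null
    }
}
```

The second call shows why the rule needs the escape alternative: without it, the token would end at the escaped \] instead of at the real closing bracket.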

File: CharSet.g

grammar CharSet;

options { 
  output=AST;
}

tokens {
  NORMAL_CHAR_SET;
  NEGATED_CHAR_SET;
  RANGE;
}

@members {
  public static CommonTree ast(String source) throws RecognitionException {
    CharSetLexer lexer = new CharSetLexer(new ANTLRStringStream(source));
    CharSetParser parser = new CharSetParser(new CommonTokenStream(lexer));
    return (CommonTree)parser.parse().getTree();
  }
}

parse
  :  OSqBr ( normal  -> ^(NORMAL_CHAR_SET normal)
           | negated -> ^(NEGATED_CHAR_SET negated)
           ) 
     CSqBr
  ;

normal
  :  (EscapeSeq | Hyphen | Other) atom* Hyphen?
  ;

negated
  :  Caret normal -> normal
  ;

atom
  :  EscapeSeq
  |  Caret
  |  Other
  |  range
  ;

range
  :  from=Other Hyphen to=Other -> ^(RANGE $from $to)
  ;

OSqBr
  :  '['
  ;

CSqBr
  :  ']'
  ;

EscapeSeq
  :  '\\' .
  ;

Caret
  :  '^'
  ;

Hyphen
  :  '-'
  ;

Other
  :  ~('-' | '\\' | '[' | ']')
  ;

File: Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    CommonTree tree = RegexParser.ast("((xyz)*[^\\da-f])foo");
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}

And if you run the main class, you will see the DOT output for the regex ((xyz)*[^\\da-f])foo, which describes the AST of that regex.

The magic is inside the Regex.g grammar, in the atom rule, where I inserted a tree node in a rewrite rule by invoking the static ast method from the CharSetParser class:

CharSet ... -> ^(... {CharSetParser.ast($CharSet.text)} ...)

Note that inside such rewrite rules, there must not be a semicolon! So, this would be wrong: {CharSetParser.ast($CharSet.text);}.

And here's how to create tree walkers for both grammars:

File: RegexWalker.g

tree grammar RegexWalker;

options {
  tokenVocab=Regex;
  ASTLabelType=CommonTree;
}

walk
  :  ^(REGEX atom+) {System.out.println("REGEX: " + $start.toStringTree());}
  ;

atom
  :  ^(ATOM group quantifier?)
  |  ^(ATOM EscapeSeq quantifier?)
  |  ^(ATOM Other quantifier?)
  |  ^(CHARSET t=. quantifier?) {CharSetWalker.walk($t);}
  ;

group
  :  ^(GROUP atom+)
  ;

quantifier
  :  '+'
  |  '*'
  ;

File: CharSetWalker.g

tree grammar CharSetWalker;

options {
  tokenVocab=CharSet;
  ASTLabelType=CommonTree;
}

@members {
  public static void walk(CommonTree tree) {
    try {
      CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
      CharSetWalker walker = new CharSetWalker(nodes);
      walker.walk();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}

walk
  :  ^(NORMAL_CHAR_SET normal)  {System.out.println("NORMAL_CHAR_SET: " + $start.toStringTree());}
  |  ^(NEGATED_CHAR_SET normal) {System.out.println("NEGATED_CHAR_SET: " + $start.toStringTree());}
  ;

normal
  :  (EscapeSeq | Hyphen | Other) atom* Hyphen?
  ;

atom
  :  EscapeSeq
  |  Caret
  |  Other
  |  range
  ;

range
  :  ^(RANGE Other Other)
  ;

File: Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    CommonTree tree = RegexParser.ast("((xyz)*[^\\da-f])foo");
    CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
    RegexWalker walker = new RegexWalker(nodes);
    walker.walk();
  }
}

To run the demo, do:

java -cp antlr-3.3.jar org.antlr.Tool CharSet.g 
java -cp antlr-3.3.jar org.antlr.Tool Regex.g
java -cp antlr-3.3.jar org.antlr.Tool CharSetWalker.g
java -cp antlr-3.3.jar org.antlr.Tool RegexWalker.g 
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

which will print:

NEGATED_CHAR_SET: (NEGATED_CHAR_SET \d (RANGE a f))
REGEX: (REGEX (ATOM (GROUP (ATOM (GROUP (ATOM x) (ATOM y) (ATOM z)) *) (CHARSET (NEGATED_CHAR_SET \d (RANGE a f))))) (ATOM f) (ATOM o) (ATOM o))
