ANTLR:从其他语法调用规则 [英] ANTLR: call a rule from a different grammar

查看:90
本文介绍了ANTLR:从其他语法调用规则的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

是否可以从其他语法调用规则?
目的是在同一文件中包含两种语言,第二种语言以(开始...)开头,其中...是第二种语言.该语法应调用另一种语法来解析第二种语言.

is it possible to invoke a rule from a different grammar?
the purpose is to have two languages in the same file, the second language starting by an (begin ...) where ... is in the second language. the grammar should invoke another grammar to parse that second language.

例如:


grammar A;

start_rule
    :    '(' 'begin' B.program ')' //or something like that
    ;


grammar B;

program
    :   something* EOF
    ;

something
    : ...
    ;

推荐答案

您的问题可以用(至少)两种方式解释:

Your question could be interpreted in (at least) two ways:

  1. 将规则从大文法中分离为单独的文法;
  2. 在主要"语言(岛屿语法)中解析一种单独的语言.

我认为这是第一个,在这种情况下,您可以导入语法.

I assume it's the first, in which case you can import grammars.

lexer grammar L;

Digit
  :  '0'..'9'
  ;

文件:Sub.g

parser grammar Sub;

number
  :  Digit+
  ;

文件:Root.g

grammar Root;

import Sub;

parse
  :  number EOF {System.out.println("Parsed: " + $number.text);}
  ;

文件:Main.java

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    L lexer = new L(new ANTLRStringStream("42"));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    RootParser parser = new RootParser(tokens);
    parser.parse();
  }
}

运行演示:

bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp antlr-3.3.jar org.antlr.Tool L.g
bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp antlr-3.3.jar org.antlr.Tool Root.g 
bart@hades:~/Programming/ANTLR/Demos/Composite$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp .:antlr-3.3.jar Main

将打印:

Parsed: 42

到控制台.

更多信息,请参见: http://www.antlr.org/wiki/display/ANTLR3/Composite + Grammars

正则表达式是语言中语言的一个很好的例子.您拥有带有元字符的常规"正则表达式语言,但其中还有另一种:描述字符集(或字符类)的语言.

A nice example of a language inside a language is regex. You have the "normal" regex language with its meta characters, but there's another one in it: the language that describes a character set (or character class).

您无需在正则表达式语法中考虑字符集(范围-,否定^等)的元字符,而只需将字符集视为由,然后在正则表达式语法中包含]的所有内容(可能包括\]!).然后,当您在解析器规则之一中偶然发现CharSet令牌时,您将调用CharSet-parser.

Instead of accounting for the meta characters of a character set (range -, negation ^, etc.) inside your regex-grammar, you could simply consider a character set as a single token consisting of a [ and then everything up to and including ] (with possibly \] in it!) inside your regex-grammar. When you then stumble upon a CharSet token in one of your parser rules, you invoke the CharSet-parser.

grammar Regex;

options { 
  output=AST;
}

tokens {
  REGEX;
  ATOM;
  CHARSET;
  INT;
  GROUP;
  CONTENTS;
}

@members {
  public static CommonTree ast(String source) throws RecognitionException {
    RegexLexer lexer = new RegexLexer(new ANTLRStringStream(source));
    RegexParser parser = new RegexParser(new CommonTokenStream(lexer));
    return (CommonTree)parser.parse().getTree();
  }
}

parse
  :  atom+ EOF -> ^(REGEX atom+)
  ;

atom
  :  group quantifier?     -> ^(ATOM group quantifier?)
  |  EscapeSeq quantifier? -> ^(ATOM EscapeSeq quantifier?)
  |  Other quantifier?     -> ^(ATOM Other quantifier?)
  |  CharSet quantifier?   -> ^(CHARSET {CharSetParser.ast($CharSet.text)} quantifier?)
  ;

group
  :  '(' atom+ ')' -> ^(GROUP atom+)
  ;

quantifier
  :  '+'
  |  '*'
  ;

CharSet
  :  '[' (('\\' .) | ~('\\' | ']'))+ ']'
  ;

EscapeSeq
  :  '\\' .
  ;

Other
  :  ~('\\' | '(' | ')' | '[' | ']' | '+' | '*')
  ;

文件:CharSet.g

grammar CharSet;

options { 
  output=AST;
}

tokens {
  NORMAL_CHAR_SET;
  NEGATED_CHAR_SET;
  RANGE;
}

@members {
  public static CommonTree ast(String source) throws RecognitionException {
    CharSetLexer lexer = new CharSetLexer(new ANTLRStringStream(source));
    CharSetParser parser = new CharSetParser(new CommonTokenStream(lexer));
    return (CommonTree)parser.parse().getTree();
  }
}

parse
  :  OSqBr ( normal  -> ^(NORMAL_CHAR_SET normal)
           | negated -> ^(NEGATED_CHAR_SET negated)
           ) 
     CSqBr
  ;

normal
  :  (EscapeSeq | Hyphen | Other) atom* Hyphen?
  ;

negated
  :  Caret normal -> normal
  ;

atom
  :  EscapeSeq
  |  Caret
  |  Other
  |  range
  ;

range
  :  from=Other Hyphen to=Other -> ^(RANGE $from $to)
  ;

OSqBr
      :  '['
  ;

CSqBr
  :  ']'
  ;

EscapeSeq
  :  '\\' .
  ;

Caret
  :  '^'
  ;

Hyphen
  :  '-'
  ;

Other
  :  ~('-' | '\\' | '[' | ']')
  ;

文件:Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    CommonTree tree = RegexParser.ast("((xyz)*[^\\da-f])foo");
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}

如果您运行主类,则将看到正则表达式的 DOT输出 ((xyz)*[^\\da-f])foo这是以下树:

And if you run the main class, you will see the DOT output for the regex ((xyz)*[^\\da-f])foo which is the following tree:

神奇之处在于atom规则的Regex.g语法内,在该规则中,我通过调用CharSetParser类的静态ast方法在重写规则中插入了树节点:

The magic is inside the Regex.g grammar in the atom rule where I inserted a tree node in a rewrite rule by invoking the static ast method from the CharSetParser class:

CharSet ... -> ^(... {CharSetParser.ast($CharSet.text)} ...)

请注意,在此类重写规则中,必须不要是半冒号!因此,这将是错误的:{CharSetParser.ast($CharSet.text);}.

Note that inside such rewrite rules, there must not be a semi colon! So, this would be wrong: {CharSetParser.ast($CharSet.text);}.

以下是为两种语法创建树遍历器的方法:

And here's how to create tree walkers for both grammars:

tree grammar RegexWalker;

options {
  tokenVocab=Regex;
  ASTLabelType=CommonTree;
}

walk
  :  ^(REGEX atom+) {System.out.println("REGEX: " + $start.toStringTree());}
  ;

atom
  :  ^(ATOM group quantifier?)
  |  ^(ATOM EscapeSeq quantifier?)
  |  ^(ATOM Other quantifier?)
  |  ^(CHARSET t=. quantifier?) {CharSetWalker.walk($t);}
  ;

group
  :  ^(GROUP atom+)
  ;

quantifier
  :  '+'
  |  '*'
  ;

文件:CharSetWalker.g

tree grammar CharSetWalker;

options {
  tokenVocab=CharSet;
  ASTLabelType=CommonTree;
}

@members {
  public static void walk(CommonTree tree) {
    try {
      CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
      CharSetWalker walker = new CharSetWalker(nodes);
      walker.walk();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}

walk
  :  ^(NORMAL_CHAR_SET normal)  {System.out.println("NORMAL_CHAR_SET: " + $start.toStringTree());}
  |  ^(NEGATED_CHAR_SET normal) {System.out.println("NEGATED_CHAR_SET: " + $start.toStringTree());}
  ;

normal
  :  (EscapeSeq | Hyphen | Other) atom* Hyphen?
  ;

atom
  :  EscapeSeq
  |  Caret
  |  Other
  |  range
  ;

range
  :  ^(RANGE Other Other)
  ;

Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    CommonTree tree = RegexParser.ast("((xyz)*[^\\da-f])foo");
    CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
    RegexWalker walker = new RegexWalker(nodes);
    walker.walk();
  }
}

要运行演示,请执行以下操作:

To run the demo, do:

java -cp antlr-3.3.jar org.antlr.Tool CharSet.g 
java -cp antlr-3.3.jar org.antlr.Tool Regex.g
java -cp antlr-3.3.jar org.antlr.Tool CharSetWalker.g
java -cp antlr-3.3.jar org.antlr.Tool RegexWalker.g 
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

将打印:

NEGATED_CHAR_SET: (NEGATED_CHAR_SET \d (RANGE a f))
REGEX: (REGEX (ATOM (GROUP (ATOM (GROUP (ATOM x) (ATOM y) (ATOM z)) *) (CHARSET (NEGATED_CHAR_SET \d (RANGE a f))))) (ATOM f) (ATOM o) (ATOM o))

这篇关于ANTLR:从其他语法调用规则的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆