ANTLR:如何使用树语法替换子树中的特定节点? [英] ANTLR: How to replace specific nodes in a subtree using a Tree grammar?

查看：79 发布时间：2020/9/2 23:23:16 antlr

本文介绍了ANTLR:如何使用树语法替换子树中的特定节点?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个创建AST的ANTLR语法，然后编写了两个树语法，这些树语法创建树解析器以对AST进行两次传递以进行语义分析. (之后，我又进行了一次传递，并使用StringTemplate生成了输出代码)

I have an ANTLR grammar which creates an AST, and then I have written two tree grammars that create tree parsers for doing two passes on the AST for semantic analysis purposes. (After that I do another pass and generate output code with StringTemplate)

到目前为止，一切工作都很好，但是我试图扩展该语言以支持函数中的参数多态性(而到目前为止，它仅支持简单"函数).

Everything works fine so far, but I am trying to extend the language to support parametric polymorphism in functions (while so far it only supported "simple" functions).

例如，我想要这样的东西:

So for example I want to have something like this:

T getMax<T>  (T a, T b) {
     if (a > b) return a;
   return b;
}

并根据调用函数的实际类型生成简单的非参数多态代码.例如，如果有人呼叫getMax<int> (5)，那么我只会为int getMax(int a, int b)

and generate simple, non parametrically polymorphic code, according to the actual types with which the function is called. For example if someone calls getMax<int> (5) then I will only generate the code for int getMax(int a, int b)

到目前为止，在第一遍中，我检查了对多态函数的所有调用，并保存了使用该函数调用的特定类型.

So far, in the 1st pass, I check all the calls to the polymorphic function, and save the specific types with which the function was called with.

因此，在这一点上，我知道所有参数类型以及需要替换的所有实际类型.

So, at this point I know all the parametric types, and all the actual types that they need to be replaced with.

在第二遍中，我想实际修改我的语法树，并使用1个或多个具有特定类型的函数声明替换此参数多态函数声明.

In the 2nd pass, I want to actually modify my syntax tree and replace this parametrically polymorphic function declaration, with 1 or more function declarations that have specific types.

所以，问题是

在AST中复制和创建同级"节点(及其所有子级)并将其插入到原始函数声明旁边的最佳方法是什么? 我是否只说

What is the best way to copy and create "sibling" nodes (with also all their children) in the AST and insert them next to the original function declaration? Do I just say something like

      {
       myTreeNode.parent.addChild(myTreeNode.dupNode());  
      }

在上述示例中，用新创建的子树中的int替换所有出现的T类型的最佳方法是什么?我认为简单的重写规则是不够的，因为我还需要替换函数主体中的所有类型. 我是否需要编写另一棵仅在此函数声明子树上起作用的树语法，并进行所有替换?手动进行比较容易吗?

What is the best way to replace all occurrences of type T with, say, int in the newlycreated subtree in the above example? I think a simple rewrite rule is not enough, as I also need to replace all the types in the body of the function. Do I need to write another tree grammar that will work just on this function declaration subtree and do all the replacing? Is it easier to do it manually?

抱歉，这太令人困惑了.

Sorry if this is too confusing.

修改:

比方说，您的输入是这样的:

Let's say your input is this:

T add<T> (T a, T b) { return a+b }
add<int>(1, 2)
add<string>('f', 'oo')

第二遍过后，您的AST会如何?

how would your AST look like after the 2nd pass?

我当时正在考虑删除原始的函数声明，并在其位置引入2个专门的声明.

I was thinking of deleting the original function declaration, and introducing 2 specialised declarations at its place.

因此，最终的AST将是具有4个子级，2个函数代数和2个函数调用的根节点(我们可以称其为程序").

So the resulting AST would be a root node (we can call it "program") with 4 children, 2 function delcarations and 2 function calls.

印刷Lisp样式:

(METHOD_DECL int add (ARG_DECL int a) (ARG_DECL int b) (BLOCK (return (EXPR (+ (EXPR a) (EXPR b)))))) 
(METHOD_DECL string add (ARG_DECL string a) (ARG_DECL string b) (BLOCK (return (EXPR (+ (EXPR a) (EXPR b)))))) 
(EXPR (CALL add (ELIST (EXPR 3) (EXPR 4)))) 
(EXPR (CALL add (ELIST (EXPR "f") (EXPR "oo"))))

被删除并替换的子树是这样的:

The subtree that got deleted and replaced was this:

(METHOD_DECL T add (TYPEPARAMS T) (ARG_DECL T a) (ARG_DECL T b) (BLOCK (return (EXPR (+ (EXPR a) (EXPR b))))))

此外，我要删除原始的参数多态函数声明子树的原因是，在执行类型检查时，我不知道如何忽略它.树解析器自动"匹配所有二进制操作，返回语句，参数等，并进行类型检查.因此，如果我把它留在那里，它会报告错误，因为例如T不是正确的类型，因此您不能将其与+运算符一起使用.

Also, the reason I want to delete the original parametrically polymorphic function declaration subtree, is that I don't know how to ignore it when I do the type checking. The tree parser "automatically" matches all the binary operations, return statements, arguments etc and does the type checking. So, if I leave it there it reports errors because for example T is not a proper type, so you can't use it with the + operator.

推荐答案

我个人是让解析器将方法与主代码块分开的.输入以下信息:

Personally, I'd let the parser separate the methods from main code block. Given the input:

T add<T> (T a, T b) { 
  T c = a + b 
  return c 
}
int sub(int x, int y) { return x-y }
add<int>(1, 2)
add<string>('f', 'oo')

然后解析器将生成表示以下内容的树:

then the parser would produce the tree that represents:

add<int>(1, 2)
add<string>('f', 'oo')

和代表方法的单独树:

int    add (int a, int b)       { int c = a + b      return c   }
string add (string a, string b) { string c = a + b   return c   }
int    sub (int x, int y)       {                    return x-y }

在解析阶段，您只需跟踪所有参数参数(此后为 pp )两个实例变量中的-calls和-methods:

During the parsing phase, you simply keep track of all parametric parameter (pp henceforth) -calls and -methods in two instance variables:

@parser::members {

  private Map<String, Set<Token>> ppMethodCalls = new HashMap<String, Set<Token>>();
  private Map<String, CommonTree> ppMethods = new HashMap<String, CommonTree>();

  // a separate AST for all methods
  public CommonTree methods = new CommonTree(new CommonToken(METHODS, "METHODS"));

  // ...

}

解析示例代码后，ppMethodCalls将保持:

After parsing the example code, ppMethodCalls would hold:

{"add" => {Token="int", Token="string"}}

和ppMethods将保持:

{"add" => Tree=^(METHOD T add ... )}

解析器完全解析了输入源后，将调用以下方法:

When the parser has fully parsed the input source, the following method is called:

  public void replacePP() {

    // iterate over all pp method calls
    for(Map.Entry<String, Set<Token>> entry : ppMethodCalls.entrySet()) {

      // get the name of the method being called
      String name = entry.getKey();

      // iterate over all tokens ('int' and 'string', in my example)
      for(Token tok : entry.getValue()) {

        // get the original pp-method instance
        CommonTree ppm = ppMethods.get(name);

        // create a deep copy from the original pp-method (will post 'copyTree(...)' later)
        CommonTree cp = Main.copyTree(ppm);

        // remove the first token from the tree (this is 'T' in my example)
        CommonTree pp = (CommonTree)cp.deleteChild(0);

        // replace all 'T' tokens with the actual parameter (either 'int' or 'string' in my example)
        replace(pp, tok, cp);

        // add the rewritten tree to the separate 'methods' tree
        methods.addChild(cp);
      }
    }
  }

  private void replace(CommonTree param, Token token, CommonTree tree) {
    if(tree == null) return;
    if(tree.getType() == param.getType() && tree.getText().equals(param.getText())) {
      tree.getToken().setType(token.getType());
      tree.getToken().setText(token.getText());
    }
    if(tree.getChildCount() > 0) {
      for(Object child : tree.getChildren()) {
        replace(param, token, (CommonTree)child);
      }
    }
  }

这将为add<int>(...)和add<string>(...)创建两个新树，并将它们添加到实例变量:methods: CommonTree.

which will create two new trees for both add<int>(...) and add<string>(...) and add them to the instance variable: methods: CommonTree.

一些演示:

grammar PP;

options {
  output=AST;
  ASTLabelType=CommonTree;
}

tokens {
  ROOT;
  METHODS;
  BLOCK;
  ASSIGN;
  ARG;
  ARGS;
  ELIST;
  METHOD;
  CALL;
}

@parser::header {
  import java.util.Map;
  import java.util.HashMap;
  import java.util.Set;
  import java.util.HashSet;
}

@parser::members {

  private Map<String, Set<Token>> ppMethodCalls = new HashMap<String, Set<Token>>();
  private Map<String, CommonTree> ppMethods = new HashMap<String, CommonTree>();
  public CommonTree methods = new CommonTree(new CommonToken(METHODS, "METHODS"));

  private void ppCall(String name, Token pp) {
    Set<Token> existing = ppMethodCalls.remove(name);
    if(existing == null) existing = new HashSet<Token>();
    existing.add(pp);
    ppMethodCalls.put(name, existing);
  }

  public void replacePP() {
    for(Map.Entry<String, Set<Token>> entry : ppMethodCalls.entrySet()) {
      String name = entry.getKey();
      for(Token tok : entry.getValue()) {
        CommonTree ppm = ppMethods.get(name);
        CommonTree cp = Main.copyTree(ppm);
        CommonTree pp = (CommonTree)cp.deleteChild(0);
        replace(pp, tok, cp);
        methods.addChild(cp);
      }
    }
  }

  private void replace(CommonTree param, Token token, CommonTree tree) {
    if(tree == null) return;
    if(tree.getType() == param.getType() && tree.getText().equals(param.getText())) {
      tree.getToken().setType(token.getType());
      tree.getToken().setText(token.getText());
    }
    if(tree.getChildCount() > 0) {
      for(Object child : tree.getChildren()) {
        replace(param, token, (CommonTree)child);
      }
    }
  }
}

parse
  :  block EOF -> block
  ;

block
  :  statement* -> ^(BLOCK statement*)
  ;

statement
  :  ppMethod     {ppMethods.put($ppMethod.name, $ppMethod.tree);} -> /* omit from main AST */
  |  normalMethod {methods.addChild($normalMethod.tree);}          -> /* omit from main AST */
  |  methodCall
  |  assignment
  |  Return expr -> ^(Return expr)
  ;

assignment
  :  type Id '=' expr -> ^(ASSIGN type Id expr)
  |  Id Id '=' expr   -> ^(ASSIGN Id Id expr)
  ;

normalMethod
  :  type Id '(' argList ')' '{' block '}' -> ^(METHOD type Id argList block)
  ;

ppMethod returns [String name]
  :  tp=Id id=Id {$name=$id.text;} '<' pp=Id '>' '(' ppArgList ')' '{' block '}' -> ^(METHOD $pp $tp $id ppArgList block)
  ;

methodCall
  :  Id ('<' type '>' {ppCall($Id.text, $type.tree.getToken());})? '(' exprList ')' -> ^(CALL Id exprList)
  ;

argList
  :  (arg (',' arg)*)? -> ^(ARGS arg*)
  ;

arg
  :  type Id -> ^(ARG type Id)
  ;

ppArgList
  :  (ppArg (',' ppArg)*)? -> ^(ARGS ppArg*)
  ;

ppArg
  :  type Id -> ^(ARG type Id)
  |  Id Id   -> ^(ARG Id Id)
  ;

exprList
  :  (expr (',' expr)*)? -> ^(ELIST expr*)
  ;

expr
  :  atom (('+' | '-')^ atom)*
  ;

atom
  :  Int
  |  Str
  |  Id
  |  methodCall
  |  '(' expr ')' -> expr
  ;

type
  :  K_Int
  |  K_Str
  ;

Return : 'return';
K_Int  : 'int';
K_Str  : 'string';
Id     : ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | Digit)*;
Int    : Digit+;
Str    : '\'' ~'\''* '\'';

Comment : '#' ~('\r' | '\n')* {$channel=HIDDEN;};
Space   : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

fragment Digit : '0'..'9';

文件:Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {

  public static void main(String[] args) throws Exception {
    String src =
        "T add<T> (T a, T b) {                 \n" +
        "    T c = a + b                       \n" +
        "    return c                          \n" +
        "}                                     \n" +
        "int sub(int x, int y) { return x-y }  \n" +
        "add<int>(1, 2)                        \n" +
        "add<string>('f', 'oo')                \n";
    PPLexer lexer = new PPLexer(new ANTLRStringStream(src));
    PPParser parser = new PPParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.parse().getTree();
    parser.replacePP();
    System.out.println(new DOTTreeGenerator().toDOT(tree));
    System.out.println(new DOTTreeGenerator().toDOT(parser.methods));
  }

  // put this in a Util-class or else in the parser   
  public static CommonTree copyTree(CommonTree original) {
    CommonTree copy = new CommonTree(original.getToken());
    copyTreeRecursive(copy, original);
    return copy;
  }

  private static void copyTreeRecursive(CommonTree copy, CommonTree original) {
    if(original.getChildren() != null) {
      for(Object o : original.getChildren()) {

        CommonTree originalChild = (CommonTree)o;

        // get the token from the original child node
        CommonToken oTok = (CommonToken)originalChild.getToken();

        // create a new token with the same type & text as 'oTok' 
        CommonToken cTok = new CommonToken(oTok.getType(), oTok.getText());

        // copy all attributes from 'oTok' to 'cTok'  
        cTok.setLine(oTok.getLine());
        cTok.setCharPositionInLine(oTok.getCharPositionInLine());
        cTok.setChannel(oTok.getChannel());
        cTok.setStartIndex(oTok.getStartIndex());
        cTok.setStopIndex(oTok.getStopIndex());
        cTok.setTokenIndex(oTok.getTokenIndex());

        // create a new tree node with the 'cTok' as token
        CommonTree copyChild = new CommonTree(cTok);

        // set the parent node of the child node
        copyChild.setParent(copy);

        // add the child to the parent node
        copy.addChild(copyChild);

        // make a recursive call to copy deeper
        copyTreeRecursive(copyChild, originalChild);
      }
    }
  }
}

如果您现在运行主类:

java -cp antlr-3.3.jar org.antlr.Tool PP.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

或Windows

java -cp antlr-3.3.jar org.antlr.Tool PP.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main

您将看到两棵以DOT代码打印的树:

you will see two trees being printed in DOT-code:

请注意，这只是我整理的一个快速技巧:事情可以整理得更多，并且ppMethod规则中的returns [String name]不太好.它也不能满足以下方法:

Note that this is just a quick hack I put together: things could be tidied up a bit more, and the returns [String name] in the ppMethod rule isn't pretty. It also does not cater for methods like this:

A foo<A,B> (A a, B b) {
  return A ~ B 
}

foo<int,string>(42, 'foo')

，也许还有更多东西.我将它留给您，希望您可以通过上面的小演示进一步了解.

and perhaps many more things. I'll leave that for you, hoping you might get further with the little demo above.

这篇关于ANTLR:如何使用树语法替换子树中的特定节点?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

ANTLR:如何使用树语法替换子树中的特定节点? [英] ANTLR: How to replace specific nodes in a subtree using a Tree grammar?

问题描述

推荐答案

文件:Main.java

或Windows

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

ANTLR:如何使用树语法替换子树中的特定节点? [英] ANTLR: How to replace specific nodes in a subtree using a Tree grammar?

问题描述

推荐答案

文件:Main.java

或Windows

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭