ANTLR:从 CommonTree 到有用的对象图 [英] ANTLR: From CommonTree to useful object graph

查看:36
本文介绍了ANTLR:从 CommonTree 到有用的对象图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我今天开始使用 ANTLR 并且我已经创建了一个基本的解析器.

解析后我得到了一棵树.对我来说,这似乎只是一堆 String 放在 Tree 节点的树结构中.这对我来说不是很有用.我想要一个对象图.

澄清(这是一个例子,而不是我真正的应用程序):对于 "5-1+6" 我似乎最终得到:

new String("PLUS")新字符串(减号")新字符串(5")新字符串(1")新字符串(6")

我认为更有用的内容:

new Plus(新减号(新的整数文字(5),新的整数文字(1)),新的整数文字(6))

从第一个表示到另一个表示最方便的方式是什么?在这篇文章中,作者做类似的事情:

public 表达式 createExpr(CommonTree ast) {//...开关(ast.getType()){案例 SimpleExpressionParser.INT:返回新的 IntegerLiteral(ast.getText())案例 SimpleExpressionParser.PLUS:return new Plus(createExpr((CommonTree)ast.getChild(0)),//递归createExpr((CommonTree)ast.getChild(1)));//递归案例 SimpleExpressionParser.MINUS:return new Minus(createExpr((CommonTree)ast.getChild(0)),//递归createExpr((CommonTree)ast.getChild(1)));//递归}//...}

这是首选方式吗?!我不能指示 ANTLR 以某种方式生成这个样板代码(它会很大)吗?

<小时>

可能相关的问题:

解决方案

这是一种可能的方法.简而言之,这些是您要执行的步骤:

  1. 创建一个组合语法来生成词法分析器和解析器;
  2. 在 (1) 中的语法中混合 AST 重写规则,将标记的平面列表转换为适当的树;
  3. 写一个可以从(2)中遍历树的树语法;
  4. 在 tree walker 中混合自定义代码;
  5. 测试一下.

1

让我们创建一个支持+-*/(...) 和数字,可能如下所示:

grammar Exp;//文件:Exp.g评估: exp EOF;经验值: addExp;添加Exp: mulExp ((Add | Sub) mulExp)*;倍数: unaryExp ((Mul | Div) unaryExp)*;一元Exp: 子原子|原子;原子:  数字|'('exp')';添加:'+';子:'-';穆尔:'*';div : '/';数字:'0'..'9'+;空格 : ' ' {skip();};

2

包括重写规则,它看起来像:

grammar Exp;//文件:Exp.g选项 {输出=AST;}令牌{U_SUB;}评估: exp EOF ->经验值;经验值: addExp;添加Exp: mulExp ((Add | Sub)^ mulExp)*;倍数: unaryExp ((Mul | Div)^ unaryExp)*;一元Exp:子原子 ->^(U_SUB 原子)|原子;原子:  数字|'('exp')' ->经验值;添加:'+';子:'-';穆尔:'*';div : '/';数字:'0'..'9'+;空格 : ' ' {skip();};

现在像 10 - 2 * (3 + 8) 这样的表达式将被转换为:

3

要创建一个树语法来为 (2) 中生成的 AST 生成迭代器,您需要执行以下操作:

树语法ExpWalker;//文件:ExpWalker.g选项 {tokenVocab=Exp;//使用来自 Exp.g 的令牌ASTLabelType=普通树;}评估: 经验;经验值: ^(添加exp exp)|^(子exp exp)|^(Mul exp exp)|^(div exp exp)|^(U_SUB exp)|数字;

4

并且要在此树迭代器中混合自定义类,请执行以下操作:

树语法ExpWalker;//文件:ExpWalker.g选项 {tokenVocab=Exp;//使用来自 Exp.g 的令牌ASTLabelType=普通树;}eval 返回 [ExpNode e]: exp {e = $exp.e;};exp 返回 [ExpNode e]: ^(Add a=exp b=exp) {e = new AddExp($a.e, $b.e);}|^(Sub a=exp b=exp) {e = new SubExp($a.e, $b.e);}|^(Mul a=exp b=exp) {e = new MulExp($a.e, $b.e);}|^(Div a=exp b=exp) {e = new DivExp($a.e, $b.e);}|^(U_SUB a=exp) {e = new UnaryExp($a.e);}|Number {e = new NumberExp($Number.text);};

5

这里有一些代码来测试所有的类(把它们都放在一个文件中:Main.java):

import org.antlr.runtime.*;导入 org.antlr.runtime.tree.*;导入 org.antlr.stringtemplate.*;公共课主要{public static void main(String[] args) 抛出异常 {字符串源 = "10 - 2 * (3 + 8)";ExpLexer 词法分析器 = new ExpLexer(new ANTLRStringStream(source));CommonTokenStream 令牌 = new CommonTokenStream(lexer);ExpParser parser = new ExpParser(tokens);ExpParser.eval_return returnValue = parser.eval();CommonTree 树 = (CommonTree)returnValue.getTree();CommonTreeNodeStream 节点 = new CommonTreeNodeStream(tree);ExpWalker walker = new ExpWalker(nodes);ExpNode root = walker.eval();System.out.println(source + " = " + root.evaluate());}}接口扩展节点{双评估();}类 NumberExp 实现了 ExpNode {最终双数;NumberExp(String s) {num = Double.parseDouble(s);}@覆盖公共双重评估(){返回编号;}}类 AddExp 实现 ExpNode {最终 ExpNode 左、右;AddExp(ExpNode a, ExpNode b) {左 = 一个;右 = b;}@覆盖公共双重评估(){返回 left.evaluate() + right.evaluate();}}类 SubExp 实现 ExpNode {最终 ExpNode 左、右;SubExp(ExpNode a, ExpNode b) {左 = 一个;右 = b;}@覆盖公共双重评估(){返回 left.evaluate() - right.evaluate();}}类 MulExp 实现 ExpNode {最终 ExpNode 左、右;MulExp(ExpNode a, ExpNode b) {左 = 一个;右 = b;}@覆盖公共双重评估(){返回 left.evaluate() * right.evaluate();}}类 DivExp 实现了 ExpNode {最终 ExpNode 左、右;DivExp(ExpNode a, ExpNode b) {左 = 一个;右 = b;}@覆盖公共双重评估(){返回 left.evaluate()/right.evaluate();}}类 UnaryExp 实现 ExpNode {最终的ExpNode exp;UnaryExp(ExpNode e) {exp = e;}@覆盖公共双重评估(){返回 -exp.evaluate();}}

然后做:

<前># 生成词法分析器java -cp antlr-3.2.jar org.antlr.Tool Exp.g# 生成树行走器java -cp antlr-3.2.jar org.antlr.Tool ExpWalker.g# 编译一切javac -cp antlr-3.2.jar *.java# 运行主类java -cp .:antlr-3.2.jar Main # *nixjava -cp .;antlr-3.2.jar Main # Windows

打印:

<前>10 - 2 * (3 + 8) = -12.0

您可以跳过 tree-walker 并在组合语法中混合所有代码和 returns [...],但是 IMO,树语法使事情更加有序,因为词法分析器规则,并且像 () 之类的标记被从其中删除.

HTH

I started using ANTLR today and I've created a basic parser.

After parsing I end up with a tree. To me it seems like this is just a bunch of Strings put together in a tree structure of Tree-nodes. That's not very useful to me. I'd like to have a graph of objects.

To clarify (this is an example, and not my real application): For "5-1+6" I seem to end up with:

new String("PLUS")
    new String("MINUS")
        new String("5")
        new String("1")
    new String("6")

What I would find more useful:

new Plus(
    new Minus(
        new IntegerLiteral(5),
        new IntegerLiteral(1)),
    new IntegerLiteral(6))

What is the most convenient way of going from the first representation to the other? In this article the author does something similar to this:

public Expression createExpr(CommonTree ast) {

    // ...

    switch (ast.getType()) {
    case SimpleExpressionParser.INT:
        return new IntegerLiteral(ast.getText())
    case SimpleExpressionParser.PLUS:
        return new Plus(createExpr((CommonTree)ast.getChild(0)),    // recurse
                        createExpr((CommonTree)ast.getChild(1)));   // recurse
    case SimpleExpressionParser.MINUS:
        return new Minus(createExpr((CommonTree)ast.getChild(0)),   // recurse
                         createExpr((CommonTree)ast.getChild(1)));  // recurse
    }

    // ...
}

Is this the preferred way?! Can't I instruct ANTLR to generate this boiler-plate code somehow (it will be huge)?


Possibly related questions:

解决方案

Here's a a possible way. In short, these are the step you'd perform:

  1. create a combined grammar that will generate the lexer and parser;
  2. mix AST rewrite rules in the grammar from (1) to transform the flat list of tokens into a proper tree;
  3. write a tree grammar that can walk the tree from (2);
  4. mix custom code inside your tree walker;
  5. test it.

1

Let's create a small expression parser supporting +, -, *, /, (...) and numbers, which could look like:

grammar Exp; // file: Exp.g

eval
  :  exp EOF
  ;

exp 
  :  addExp 
  ;

addExp
  :  mulExp ((Add | Sub) mulExp)*
  ;

mulExp
  :  unaryExp ((Mul | Div) unaryExp)*
  ;

unaryExp
  :  Sub atom
  |  atom
  ;

atom
  :  Number
  |  '(' exp ')'
  ;

Add    : '+';
Sub    : '-';
Mul    : '*';
Div    : '/';
Number : '0'..'9'+;
Space  : ' ' {skip();};

2

Including rewrite rules, it will look like:

grammar Exp; // file: Exp.g

options {
  output=AST;
}

tokens {
  U_SUB;
}

eval 
  :  exp EOF -> exp
  ;

exp 
  :  addExp 
  ;

addExp
  :  mulExp ((Add | Sub)^ mulExp)*
  ;

mulExp
  :  unaryExp ((Mul | Div)^ unaryExp)*
  ;

unaryExp
  :  Sub atom -> ^(U_SUB atom)
  |  atom
  ;

atom
  :  Number
  |  '(' exp ')' -> exp
  ;

Add    : '+';
Sub    : '-';
Mul    : '*';
Div    : '/';
Number : '0'..'9'+;
Space  : ' ' {skip();};

Now an expression like 10 - 2 * (3 + 8) will be transformed to:

3

To create a tree grammar that generates an iterator for the AST generated in (2), you'd do something like this:

tree grammar ExpWalker; // file: ExpWalker.g

options {
  tokenVocab=Exp; // use the tokens from Exp.g
  ASTLabelType=CommonTree;
}

eval
  :  exp
  ;

exp
  :  ^(Add exp exp)
  |  ^(Sub exp exp)
  |  ^(Mul exp exp)
  |  ^(Div exp exp)
  |  ^(U_SUB exp)
  |  Number
  ;

4

And to mix your custom classes in this tree iterator, do something like this:

tree grammar ExpWalker; // file: ExpWalker.g

options {
  tokenVocab=Exp; // use the tokens from Exp.g
  ASTLabelType=CommonTree;
}

eval returns [ExpNode e]
  :  exp {e = $exp.e;}
  ;

exp returns [ExpNode e]
  :  ^(Add a=exp b=exp) {e = new AddExp($a.e, $b.e);}
  |  ^(Sub a=exp b=exp) {e = new SubExp($a.e, $b.e);}
  |  ^(Mul a=exp b=exp) {e = new MulExp($a.e, $b.e);}
  |  ^(Div a=exp b=exp) {e = new DivExp($a.e, $b.e);}
  |  ^(U_SUB a=exp)     {e = new UnaryExp($a.e);}
  |  Number             {e = new NumberExp($Number.text);}
  ;

5

Here's some code to test all classes (stick it all just in one file: Main.java):

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = "10 - 2 * (3 + 8)";
    ExpLexer lexer = new ExpLexer(new ANTLRStringStream(source));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    ExpParser parser = new ExpParser(tokens);
    ExpParser.eval_return returnValue = parser.eval();
    CommonTree tree = (CommonTree)returnValue.getTree();
    CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
    ExpWalker walker = new ExpWalker(nodes);
    ExpNode root = walker.eval();
    System.out.println(source + " = " + root.evaluate());
  }
}

interface ExpNode {
  double evaluate();
}

class NumberExp implements ExpNode {

  final double num;

  NumberExp(String s) {
    num = Double.parseDouble(s);
  }

  @Override
  public double evaluate() {
    return num;
  }
}

class AddExp implements ExpNode {

  final ExpNode left, right;

  AddExp(ExpNode a, ExpNode b) {
    left = a;
    right = b;
  }

  @Override
  public double evaluate() {
    return left.evaluate() + right.evaluate();
  }
}

class SubExp implements ExpNode {

  final ExpNode left, right;

  SubExp(ExpNode a, ExpNode b) {
    left = a;
    right = b;
  }

  @Override
  public double evaluate() {
    return left.evaluate() - right.evaluate();
  }
}

class MulExp implements ExpNode {

  final ExpNode left, right;

  MulExp(ExpNode a, ExpNode b) {
    left = a;
    right = b;
  }

  @Override
  public double evaluate() {
    return left.evaluate() * right.evaluate();
  }
}

class DivExp implements ExpNode {

  final ExpNode left, right;

  DivExp(ExpNode a, ExpNode b) {
    left = a;
    right = b;
  }

  @Override
  public double evaluate() {
    return left.evaluate() / right.evaluate();
  }
}

class UnaryExp implements ExpNode {

  final ExpNode exp;

  UnaryExp(ExpNode e) {
    exp = e;
  }

  @Override
  public double evaluate() {
    return -exp.evaluate();
  }
}

and then do:

# generate a lexer & parser
java -cp antlr-3.2.jar org.antlr.Tool Exp.g

# generate the tree walker
java -cp antlr-3.2.jar org.antlr.Tool ExpWalker.g

# compile everything
javac -cp antlr-3.2.jar *.java

# run the main class
java -cp .:antlr-3.2.jar Main         # *nix 
java -cp .;antlr-3.2.jar Main         # Windows

which prints:

10 - 2 * (3 + 8) = -12.0

You could skip the tree-walker and mix all the code and returns [...] inside your combined grammar, but IMO, a tree grammar keeps things more orderly because the lexer rules, and tokens like ( and ) etc. are removed from it.

HTH

这篇关于ANTLR:从 CommonTree 到有用的对象图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆