抽象语法C ++中的树表示 [英] Abstract Syntax Tree representation in C++

查看:101
本文介绍了抽象语法C ++中的树表示的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经有了tokenizer接口,它创建了一个令牌列表。我有解析器的工作机制。它真的是独一无二的,像一个魅力。我唯一的错误是AST的基本结构。如何在抽象层上表示树,节点和语句。我不需要任何实现只是一个快速的想法如何应该在类层次结构?
我正在开发一种面向对象的语言。是的,我已经意识到,我将需要两种类型的语句。一些值返回expression类型语句和一个非返回,指令流控制类型语句。

如果你的语言是命令式的/ c-like,常见的情况从顶部开始 - 层次结构分为2种超类型:




  • 表达式

  • 语句



程序是语句列表,它是一个语句本身。



对于扩展语句基类的语句类型有一个类。



典型的情况如下:




  • 语句块(语句列表)

  • ite(if then else)

  • 初始化语句列表,检查表达式,增量语句和块

  • while(类似,但只检查表达式

  • 变量声明

  • 赋值(包括+ = - = ++ - ,可以在一个类中包含一个运算符字段,一个lval和一个rval)

  • 函数调用一个)



对于表达式:




  • Bop(二进制操作,任何具有2个操作数和1个运算符,即+ - * /%| & &&& || ==<

  • Uop(一元操作,任何具有1个操作数和1个运算符的函数,即〜!)

  • 函数调用)

  • 条件表达式(exp?true val:false val)



具有这两个抽象(表达式和语句)是在所有的类中,你将有抽象类型,并将能够访问具有访问者模式的AST。



例如,一些类看起来像这样(伪代码):

  class Ite extends Statement {
Expression条件;
语句ifBranch;
语句elseBranch;
}


类Bop扩展表达式{
BOperator operator; // +, - 。 * or whatever
Expression left; //左操作数
表达式右; // Right operand
}


class StatementBlock extends语句{
List< Statement>语句;
}


类赋值语句{
AOperator assignOp; // = + = - = etc.
LVal lvalue; //左值不能是任意表达式,你通常会有一个特定的类型
表达式右值; //正确的值
}

类型(对于AST,只是静态类型就足够了,如果你的项目实现一些后端,你也需要一些动态类型)。



静态类型通常可以使用一些枚举来指定,如果您不打算支持需要大小信息的固定大小的数组。如果你想要固定大小的数组,你可以为类型实现一个类,并且数组类型保存额外的大小信息。

  enum Type {
CHAR,
SHORT,
INT,
LONG,
FLOAT,
DOUBLE,
ARRAY
}

类Float扩展StaticType {
final类型type = Type.FLOAT;
}

类继承StaticArray {
final Type type = Type.ARRAY;

int size;
}

然后,您将为AST中的每个类型实例化一个StaticType实例,例如当用户声明一个变量。



如果您计划在未来进行静态类型检查,您将能够使用相同的层次结构。对于以AST形式运行/解释代码,您将需要一个内存,它将保存包含运行时内存信息的堆栈/堆栈。此时,您将需要存储值及其类型信息。


I already have the tokenizer interface which creates a list of tokens. I have the working mechanism for the parser. It's really unique and works like a charm. The only thing that i miss is the basic structure of the AST. How the tree, the nodes and the statements should be represented on abstraction level. I don't need any implementation only a quick idea how should it look like in class hierarchy? I'm working on an object-oriented language. Yeah, i already realized that i will need two types of statements. Some value returning "expression" type statement and a non-returning, instruction flow controlling type statement. Thank you very much.

解决方案

If your language is imperative/c-like, the common scenario begins with top-hierarchy being split in 2 supertypes:

  • Expression
  • Statement

The program is a list of statement, which is a statement itself.

You will probably want to have one class for type of statement that extends the statement base class.

A typical scenario looks like:

  • statement block ( a list of statements )
  • ite (if then else)
  • for (a for loop with its initialization statements list, check expression, increment statements, and block
  • while (similar, but only check expression
  • variable declaration
  • assignment (including += -= ++ --, you can wrap all in one class with an operator field, a lval and an rval)
  • Function call (void one)

For the expressions:

  • Bop (binary operation, anything that has 2 operands and 1 operator i.e. + - * / % | & && || == <
  • Uop (unary operation, anything that has 1 operand and 1 operator i.e. ~ !)
  • Function call (not-void ones)
  • Conditional expression ( exp ? true val : false val )

The good thing of having this 2 abstractions (expressions and statements) is that inside all your classes, you will have abstract types, and will be able to visit the AST with the visitor pattern, for example.

For example, some classes would look like this (pseudocode):

class Ite extends Statement {
   Expression condition;
   Statement ifBranch;
   Statement elseBranch;
}


class Bop extends Expression {
   BOperator operator;  // +, -. * or whatever
   Expression left;     // Left operand
   Expression right;    // Right operand
}


class StatementBlock extends Statement {
   List<Statement> statements;
}


class Assignment extends Statement {
   AOperator assignOp;  // = += -= etc.
   LVal lvalue;         // The lvalue cannot be an arbitrary expression, you will usually have a specific type for it
   Expression rvalue;   // Right value
}

Also, you will need some way to represent types (for the AST, just static types are enough, if you project to implement some back-end as well, you will need some dynamic types too).

Static types can usually be specified with some enumerations, if you don't plan to support fixed-size arrays which need a size information. If you want fixed-size arrays with, size, you can implement one class for type and have the array type hold additional size information.

enum Type {
   CHAR,
   SHORT,
   INT,
   LONG,
   FLOAT,
   DOUBLE,
   ARRAY
}

class Float extends StaticType {
    final Type type = Type.FLOAT;
}

class Array extends StaticArray {
    final Type type = Type.ARRAY;

    int size;
}

You will then instantiate one StaticType instance for every type in the AST, for example when the user declares a variable. You will be able to use the same hierarchy if you plan to do static type-checking in the future, also.

As for running/interpreting the code in the AST form, you will need a Memory which will hold a stack/heap containing information about runtime memory. At that point you will need to store values together with their type information.

这篇关于抽象语法C ++中的树表示的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆