Representing an Abstract Syntax Tree in C


Question

I'm implementing a compiler for a simple toy language in C. I have a working scanner and parser, and a reasonable background on the conceptual function and construction of an AST. My question is about the specific way to represent an AST in C. I've come across three styles pretty frequently in different texts and online resources:

One struct per type of node.

This has a base node "class" (a struct) that is the first field in all the child structs. The base node contains an enum that stores the type of node (constant, binary operator, assignment, etc.). Members of the structs are accessed using a set of macros, with one set per struct. It looks something like this:

struct ast_node_base {
    enum {CONSTANT, ADD, SUB, ASSIGNMENT} class;
};

/* The base is embedded by value as the first member, so a pointer to any
   node struct can be converted to a pointer to its base and back. */
struct ast_node_constant {
    struct ast_node_base base;
    int value;
};

struct ast_node_add {
    struct ast_node_base base;
    struct ast_node_base *left;
    struct ast_node_base *right;
};

struct ast_node_assign {
    struct ast_node_base base;
    struct ast_node_base *left;
    struct ast_node_base *right;
};

/* Accessor macros: argument parenthesized, no trailing semicolon,
   so they can be used inside larger expressions. */
#define CLASS(node) (((struct ast_node_base *)(node))->class)

#define ADD_LEFT(node) (((struct ast_node_add *)(node))->left)
#define ADD_RIGHT(node) (((struct ast_node_add *)(node))->right)

#define ASSIGN_LEFT(node) (((struct ast_node_assign *)(node))->left)
#define ASSIGN_RIGHT(node) (((struct ast_node_assign *)(node))->right)
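To illustrate how this first layout is typically used, here is a minimal sketch. It assumes the base struct is embedded by value as the first member (as the description above says), which is what makes the pointer casts valid. The names `enum ast_class`, `make_constant`, `make_add`, and `eval` are hypothetical helpers introduced only for this example, and only two node types are shown.

```c
#include <assert.h>
#include <stdlib.h>

enum ast_class { CONSTANT, ADD };

struct ast_node_base {
    enum ast_class class;
};

struct ast_node_constant {
    struct ast_node_base base;   /* must be the first member */
    int value;
};

struct ast_node_add {
    struct ast_node_base base;   /* must be the first member */
    struct ast_node_base *left;
    struct ast_node_base *right;
};

/* Hypothetical constructors. Allocation failure simply trips an assert. */
struct ast_node_base *make_constant(int value)
{
    struct ast_node_constant *n = malloc(sizeof *n);
    assert(n != NULL);
    n->base.class = CONSTANT;
    n->value = value;
    return &n->base;
}

struct ast_node_base *make_add(struct ast_node_base *left,
                               struct ast_node_base *right)
{
    struct ast_node_add *n = malloc(sizeof *n);
    assert(n != NULL);
    n->base.class = ADD;
    n->left = left;
    n->right = right;
    return &n->base;
}

/* Dispatch on the class tag, then cast down to the concrete struct. */
int eval(const struct ast_node_base *node)
{
    switch (node->class) {
    case CONSTANT:
        return ((const struct ast_node_constant *)node)->value;
    case ADD: {
        const struct ast_node_add *add = (const struct ast_node_add *)node;
        return eval(add->left) + eval(add->right);
    }
    }
    assert(!"unknown node class");
    return 0;
}
```

Every access to a concrete field goes through a cast chosen by the tag, so the correctness of each cast rests on the tag being set properly when the node is created.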

One struct per layout of node.

This appears to be mostly the same as the above layout, except that instead of having ast_node_add and ast_node_assign it would have a single ast_node_binary to represent both, because the layout of the two structs is the same and they differ only in the class field of the base node. The advantage seems to be a more uniform set of macros (LEFT(node) for every node with a left and a right, instead of one pair of macros per node type), but the disadvantage seems to be that C's type checking won't be as useful (there would be no way to detect an ast_node_assign where there should only be an ast_node_add, for example).
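A rough sketch of this second layout, under the same assumption that the base is embedded by value as the first member. Because the compiler can no longer tell an add from an assignment, a runtime check on the tag has to stand in for the type check; `is_binary` is a hypothetical helper added here to show that, and `enum ast_class` is a name chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

enum ast_class { CONSTANT, ADD, SUB, ASSIGNMENT };

struct ast_node_base {
    enum ast_class class;
};

struct ast_node_constant {
    struct ast_node_base base;
    int value;
};

/* One struct shared by every node that has a left and a right child. */
struct ast_node_binary {
    struct ast_node_base base;
    struct ast_node_base *left;
    struct ast_node_base *right;
};

/* A single pair of accessors covers ADD, SUB and ASSIGNMENT. */
#define LEFT(node)  (((struct ast_node_binary *)(node))->left)
#define RIGHT(node) (((struct ast_node_binary *)(node))->right)

/* Hypothetical helper: nonzero if LEFT/RIGHT may be applied to this node. */
int is_binary(const struct ast_node_base *node)
{
    assert(node != NULL);
    return node->class == ADD
        || node->class == SUB
        || node->class == ASSIGNMENT;
}
```

Code that needs specifically an addition has to compare the tag against ADD itself; nothing in the types prevents passing an assignment node instead.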

One struct total, with a union to hold the data for the different types of nodes.

A better explanation of this than I can give can be found here. Using the types from the previous example, it would look like:

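struct ast_node {
  enum { CONSTANT, ADD, SUB, ASSIGNMENT } class;
  union {                        /* anonymous union (C11) */
    int value;                   /* used when class == CONSTANT */
    struct {
      struct ast_node *left;
      struct ast_node *right;
    } op;                        /* used for ADD, SUB, ASSIGNMENT */
  };
};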

I'm inclined to like the third option the most because it makes recursive traversal much easier (lots of pointer casting is avoided in favor of the union), but it also doesn't take advantage of C's type checking. The first option seems the most dangerous in that it relies on pointers to structs being cast to access the members of any node (even different members of the same node require different casts to access, e.g. base vs. left), but these casts are type checked, so that might be moot. The second option seems to me like the worst of both worlds, although maybe I'm missing something.
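As a sketch of why traversal is simpler with the union layout, the evaluator below walks the tree with no casts at all. It assumes a C11 compiler (for the anonymous union) and names the enum `enum ast_class` so it can be used as a parameter type. `make_constant`, `make_op`, and `eval` are hypothetical helpers, and ASSIGNMENT is reduced to evaluating its right-hand side because there is no symbol table in this example.

```c
#include <assert.h>
#include <stdlib.h>

enum ast_class { CONSTANT, ADD, SUB, ASSIGNMENT };

struct ast_node {
    enum ast_class class;
    union {                          /* anonymous union (C11) */
        int value;                   /* CONSTANT */
        struct {
            struct ast_node *left;
            struct ast_node *right;
        } op;                        /* ADD, SUB, ASSIGNMENT */
    };
};

/* Hypothetical constructors. Allocation failure simply trips an assert. */
struct ast_node *make_constant(int value)
{
    struct ast_node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->class = CONSTANT;
    n->value = value;
    return n;
}

struct ast_node *make_op(enum ast_class class,
                         struct ast_node *left, struct ast_node *right)
{
    struct ast_node *n = malloc(sizeof *n);
    assert(n != NULL);
    assert(class != CONSTANT);
    n->class = class;
    n->op.left = left;
    n->op.right = right;
    return n;
}

/* Recursive walk: the tag selects which union member is read. */
int eval(const struct ast_node *n)
{
    assert(n != NULL);
    switch (n->class) {
    case CONSTANT:
        return n->value;
    case ADD:
        return eval(n->op.left) + eval(n->op.right);
    case SUB:
        return eval(n->op.left) - eval(n->op.right);
    case ASSIGNMENT:
        /* Simplification for this sketch: yield the right-hand side. */
        return eval(n->op.right);
    }
    return 0;
}
```

The trade-off mentioned above still applies: nothing stops code from reading `op` on a CONSTANT node, so the tag has to be checked before every access.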

Which of these three schemes is the best, and why? Is there a better fourth option I haven't come across yet? I'm assuming none of them is a "one size fits all" solution, so in case it matters: the language I'm implementing is a statically typed imperative language, almost a small subset of C.

I also have a specific question about the third (union) layout: if I use only the value field, will there be empty space following the value to accommodate the possibility of op being written to?

Recommended answer

You can make any of these work.

I prefer the union layout, because then all nodes have "the same" layout.
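On the specific question about the union layout: a union is at least as large as its largest member, so every node has the same size and a node that only stores value still occupies the room that op would need. The sketch below measures this rather than quoting numbers, since the actual sizes depend on the platform. It assumes a C11 compiler for the anonymous union; `unused_bytes_in_constant` and `print_node_sizes` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum ast_class { CONSTANT, ADD, SUB, ASSIGNMENT };

struct ast_node {
    enum ast_class class;
    union {                          /* anonymous union (C11) */
        int value;
        struct {
            struct ast_node *left;
            struct ast_node *right;
        } op;
    };
};

/* Bytes from the start of the union to the end of the node that a
   CONSTANT node leaves unused (its int value occupies the rest). */
size_t unused_bytes_in_constant(void)
{
    size_t payload = sizeof(struct ast_node) - offsetof(struct ast_node, value);
    assert(payload >= sizeof(int));
    return payload - sizeof(int);
}

/* Prints the sizes involved; the numbers vary between platforms. */
void print_node_sizes(void)
{
    printf("sizeof(struct ast_node)  = %zu\n", sizeof(struct ast_node));
    printf("sizeof(int)              = %zu\n", sizeof(int));
    printf("unused in a CONSTANT     = %zu\n", unused_bytes_in_constant());
}
```

For a toy compiler this overhead is rarely a concern; it only starts to matter if one variant is much larger than the others.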

[You may find it useful to have a "child sublist" option, e.g., an arbitrarily big, dynamic array of children, instead of having left- or right-leaning lists.]
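A minimal sketch of what such a child sublist might look like, assuming a growable array of child pointers stored in each node. The BLOCK node kind and the functions `node_new`, `node_add_child`, and `node_free` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdlib.h>

enum ast_class { CONSTANT, BLOCK };   /* BLOCK: e.g. a statement list */

struct ast_node {
    enum ast_class class;
    int value;                    /* used by CONSTANT */
    size_t child_count;           /* children currently stored */
    size_t child_capacity;        /* slots allocated in children */
    struct ast_node **children;   /* dynamic array, NULL when empty */
};

/* Returns a new node with no children, or NULL on allocation failure. */
struct ast_node *node_new(enum ast_class class, int value)
{
    struct ast_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->class = class;
    n->value = value;
    n->child_count = 0;
    n->child_capacity = 0;
    n->children = NULL;
    return n;
}

/* Appends child to parent, doubling the array when it is full.
   Returns 0 on success, -1 on allocation failure. */
int node_add_child(struct ast_node *parent, struct ast_node *child)
{
    assert(parent != NULL && child != NULL);
    if (parent->child_count == parent->child_capacity) {
        size_t new_cap = parent->child_capacity ? parent->child_capacity * 2 : 4;
        struct ast_node **grown =
            realloc(parent->children, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        parent->children = grown;
        parent->child_capacity = new_cap;
    }
    parent->children[parent->child_count++] = child;
    return 0;
}

/* Frees a node and, recursively, all of its children. */
void node_free(struct ast_node *n)
{
    if (n == NULL)
        return;
    for (size_t i = 0; i < n->child_count; i++)
        node_free(n->children[i]);
    free(n->children);
    free(n);
}
```

With this shape, a statement list or an argument list is one node with N children rather than a chain of N nested binary nodes.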

You are going to find that this issue isn't the one that makes building your compiler hard. Rather, it is having symbol tables, performing various kinds of analyses, choosing a machine-level IR, building a code generator, and doing code optimizations. Then you're going to encounter real users and you'll discover what you really did wrong :-}

I'd pick one and run with it, so that you have a chance to get near the other issues.
