解析后的符号表人口;编译器建设 [英] Symbol Table Population after parsing; Compiler building

查看:135
本文介绍了解析后的符号表人口;编译器建设的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

创建解析树后,我现在来填充符号表。

我要存储信息,如

类型,范围,对于标识符偏移等。

现在我怎么知道是什么类型,标识符的范围,因为我所知道的是该特定ID词位值和行号(词法分析后)。

我如何得到对整个事情。谢谢。


解决方案

  

现在我怎么知道是什么类型,标识符的范围,因为所有的我
  知道的是语义值和行号为特定ID(后
  词法分析)。


由于EJP提到的,你需要通过一步解析树。你的树应该已经创建,这样你可以做一个中序遍历,参观在其源$ C ​​$ C语句和前pressions被评估的顺序相同的每个节点。你的树节点也应该用具体的语言结构相对应,例如, WhileStmtNode MethodDeclNode

假如我建立符号表,通过树递归步进,而我刚刚进入了一个方法体节点。我可能有类似以下内容:

 公共无效doAction(MethodBodyNode methodBody){
    currScope = 2;
    。methodBody.getExpr()applyAction(本);
    currScope = 2;
}

我保持了一个全局变量来管理的范围。每次我进入一个块,其中范围的变化,我会增加 currScope 。同样,我会保持 currClass currMethod 变量来存储与符号名称,类型,偏移等进行后期阶段。

更新:


  

现在说,我遍历树,每次我遇到了我的ID
  将具有与类型一起输入值到符号表中,
  范围和别人,譬如说我的范围检查,如果我遇到'{'或
  函数的名字,但我怎么知道是什么类型的ID是这样的。


每个树节点应包含所有用于整个构建体中的必要的信息。如果您使用解析器生成,如杯或野牛,您可以指定如何构建树文法操作。例如。

  variableDeclaration :: =标识符:i标识符:I2 SEMICOLON {:RESULT =新VarDeclNode(I,I2,NULL); :};
标识符:: = ID:I {:RESULT =新则idNode(i.getLineNum(),i.getCharNum(),i.getStringValue()); :};

这些作品将匹配富F; 和变量声明节点添加到树。该节点封装包含行号,字符数,以及所述词位的串值的两个标识符的节点。第一标识符节点的类型和第二个是变量名。 ID 是在匹配的标识符由词法分析器返回的终结符。我假设你熟悉这在一定程度上。

 公共类VarDeclNode扩展StmtNode {    私人则idNode ID;
    私人则idNode类型;
    私人ExprNode EXPR;    公共VarDeclNode(则idNode ID,则idNode类型,ExprNode表达式){
        超();
        this.id = ID;
        this.type =类型;
        this.expr = EXPR;
    }}

当你有一个语法树,像这样的节点,你有你需要的所有信息。

2日更新:

无论您使用的是自定义分析器或生成的一个不要紧,有一个明显的地步,你在匹配一个生产节点添加到树。而且这不要紧,你所使用的语言。 C结构会做就好了。


  

,如果它是一个非终端具有信息作为非终结符名,并且如果它
  是终端,即一个令牌,然后令牌即语义价值的信息,
  令牌名和行号存储


您必须在树中,例如专门的节点ClassNode,TypeNode,MethodDeclNode,IfStmtNode,ExprNode。你不能只存储一个类型的节点,并把非终端和终端在里面。非终端重新psented作为树结$ P $,没有其他信息,它存储组成它的部分,一般都是非终端本身旁边。你不会存储任何令牌信息。只有但也有少数情况下,您会实际存储一语义的字符串值:一个标识符和串/布尔/整型常量。

看一看这个例子。在第一次减持时取值降低到(S + F),你追加一个 ParenExprNode 树根。您还追加 AddExprNode ParenExprNode 的孩子。由语法规则2应用减少时逻辑必须硬codeD插入解析器。

树:

  ExprNode(根)
       |
  ParenExprNode
       |
   AddExprNode
   / \\
ExprNode ExprNode

在code:

 结构ExprNode {void *的exprNode; };
结构ParenExprNode {void *的exprNode; };
结构AddExprNode {void *的OP1,OP2 *; };
结构则idNode {字符* VAL; INT线; INT charNum; };
结构IntLiteralNode {INT VAL; INT线; INT charNum; };无效reduce_rule_2(ExprNode *表达式){    //删除堆栈符号    //创建节点
    结构ParenExprNode * parenExpr =的malloc(sizeof的(结构ParenExprNode));
    结构AddExprNode * addExpr =的malloc(sizeof的(结构AddExprNode));
    addExpr-> OP1 =的malloc(sizeof的(结构ExprNode));
    addExpr-> OP2 =的malloc(sizeof的(结构ExprNode));    //将它们链接
    parenExpr-> exprNode =(无效*)addExpr;
    expr-> exprNode =(无效*)parenExpr;
}

在接下来的步骤中,左括号从输入移除。随后,取值是在堆栈的顶部,它是由规则1.降低到˚F由于˚F是非终端的标识符,它是由则idNode

树:

  ExprNode
       |
  ParenExprNode
       |
   AddExprNode
   / \\
ExprNode ExprNode
   |
 则idNode

在code:

  reduce_rule_2(addExpr-> OP1);无效reduce_rule_1(ExprNode *表达式){
    //减少堆栈符号
    结构则idNode * ID =的malloc(sizeof的(结构则idNode));
    ID-GT&; VAL = parser_matched_text();
    ID-GT&; LINENUM = parser_line_num();
    ID-GT&; charNum = parser_char_num();
    expr-> exprNode =(无效*)ID;
}

等等......

After creating the parse tree, i have to populate symbol table now.

I have to store information like

Type, Scope, Offset etc for the identifiers.

Now how do i know the type, scope of the identifiers , since all i know is the lexeme value and line number for that particular ID (after lexical analysis).

How do i got about the whole thing. thanks.

解决方案

Now how do i know the type, scope of the identifiers , since all i know is the lexeme value and line number for that particular ID (after lexical analysis).

As EJP mentioned, you need to step through the parse tree. Your tree should have been created so that you can do an in-order traversal, visiting each node in the same order in which source code statements and expressions are evaluated. Your tree nodes should also correspond with a specific language construct, e.g. WhileStmtNode, MethodDeclNode, etc.

Suppose I am building the symbol table, recursively stepping through the tree, and I've just entered a method body node. I might have something like the following:

public void doAction(MethodBodyNode methodBody) {
    currScope = 2;
    methodBody.getExpr().applyAction(this);
    currScope = 2;
}

I keep a global variable to manage the scope. Every time I enter a block where the scope changes, I'd increment currScope. Similarly, I'd maintain currClass and currMethod variables to store with the symbol name, type, offset, etc. for later phases.

Update:

Now say, i am traversing the tree, everytime i come across an ID i would have to enter the value to symbol table along with the type, scope and others, say for scope i check if i come across '{' or function name, but how do i know what type of ID is this.

Each tree node should contain all the necessary information for the entire construct. If you're using a parser generator, like CUP or Bison, you can specify how to build the tree in the grammar actions. E.g.

variableDeclaration::= identifier:i identifier:i2 SEMICOLON {: RESULT = new VarDeclNode(i, i2, null); :};
identifier::= ID:i {: RESULT = new IdNode(i.getLineNum(), i.getCharNum(), i.getStringValue()); :};

Those productions would match Foo f; and append a variable declaration node to the tree. That node encapsulates two identifier nodes that contain the line number, character number, and string value of the lexeme. The first identifier node is the type and the second is the variable name. ID is a terminal symbol returned by the lexer upon matching an identifier. I'm assuming you are familiar with this to some extent.

public class VarDeclNode extends StmtNode {

    private IdNode id;
    private IdNode type;
    private ExprNode expr;

    public VarDeclNode(IdNode id, IdNode type, ExprNode expr) {
        super();
        this.id = id;
        this.type = type;
        this.expr = expr;
    }

}

When you've got a syntax tree with nodes like this, you've got all the information you need.

2nd Update:

It doesn't matter whether you're using a custom parser or a generated one, there is one distinct point where you add a node into the tree upon matching a production. And it doesn't matter what language you're using. C structs will do just fine.

if it is a non terminal has the info as Nonterminals name, and if it is a terminal i.e. a token, then the info in token i.e. lexeme value, token name and line number are stored

You must have specialized nodes in the tree, e.g. ClassNode, TypeNode, MethodDeclNode, IfStmtNode, ExprNode. You can't just store one type of node and put non-terminals and terminals in it. A non-terminal is represented as a tree node, there's no other information to store about it beside the parts that compose it, which are generally non-terminals themselves. You wouldn't store any token information. There are only but a few instances where you'd actually store a lexeme's string value: for an identifier and for a string/boolean/integer literal.

Have a look at this example. During the first reduction when S is reduced to (S + F), you append a ParenExprNode to the tree root. You also append a AddExprNode as child of the ParenExprNode. That logic must be hard-coded into your parser when applying a reduction by rule 2 of the grammar.

The tree:

    ExprNode (root)
       |
  ParenExprNode
       |
   AddExprNode
   /         \
ExprNode   ExprNode

The code:

struct ExprNode { void* exprNode; };
struct ParenExprNode { void* exprNode; };
struct AddExprNode { void* op1, * op2; };
struct IdNode { char* val; int line; int charNum; };
struct IntLiteralNode { int val; int line; int charNum; };

void reduce_rule_2(ExprNode* expr) {

    //remove stack symbols

    //create nodes
    struct ParenExprNode* parenExpr = malloc(sizeof(struct ParenExprNode));
    struct AddExprNode* addExpr = malloc(sizeof(struct AddExprNode));
    addExpr->op1 = malloc(sizeof(struct ExprNode));
    addExpr->op2 = malloc(sizeof(struct ExprNode));

    //link them
    parenExpr->exprNode = (void*)addExpr;
    expr->exprNode = (void*)parenExpr;
}

In the next step, the left parenthesis is removed from the input. Afterward, S is on top of the stack and it is reduced to F by rule 1. Since F is the non-terminal for an identifier, it's represented by IdNode.

The tree:

    ExprNode
       |
  ParenExprNode
       |
   AddExprNode
   /         \
ExprNode   ExprNode
   |
 IdNode

The code:

reduce_rule_2(addExpr->op1);

void reduce_rule_1(ExprNode* expr) {
    //reduce stack symbols
    struct IdNode* id = malloc(sizeof(struct IdNode));
    id->val = parser_matched_text();
    id->lineNum = parser_line_num();
    id->charNum = parser_char_num();
    expr->exprNode = (void*)id;
}

And so on...

这篇关于解析后的符号表人口;编译器建设的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆