编译器构造:显式解析树 [英] Compiler Construction: Explicit Parse trees

查看:128
本文介绍了编译器构造:显式解析树的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

编译器如何在不构造显式解析树的情况下进行操作?显式解析树构造的优点和缺点是什么?

我知道编译器可以通过使用SDT并在解析过程中运行与之关联的语义来在没有显式解析树的情况下进行构造.但是我想知道显式解析树构造的优点和缺点.

I know that compiler can do construction without explicit parse tree by using SDT and running the semantics associated with it during parsing. But i want to know the benefits and drawbacks of the explicit parse tree construction.

推荐答案

我有点菜鸟,所以请和我保持联系... thx ...

Im a bit of a noob so please be paitient with me...thx...

但是要回答您的问题,仅在最简单的情况下,即没有前向引用并且符号仅在声明时才有效,而在整个范围内无效,才可以进行递归体面的编译(不带解析树) .

But to answer your question, a recursive-decent compilation (without a parse tree) can only be done in the simplest cases where theres no forward references and a symbol is only valid from the point of its declaration and not its entire scope.

很显然,它无法与Java之类的语言一起使用.如果有前向引用,则至少需要两遍,如果像Java中那样有重载的函数,则至少需要三遍(或者如果您知道如何在不到三遍的情况下这样做,请启发我们) .为此,我们构建了一个解析树.

Obviously it wont work with a language like java. If theres forward references, then at least two passes are required, and three passes will be needed if theres overloaded functions on top of that like there is in java (or if you know how to do it in less than three then please enlighten us). To this end we build a parse tree.

最简单的解析树节点可能看起来像这样(免责声明:这不是真实的代码).

The simplest parse tree node might look something like this (disclaimer: this is not real code).

package compiler;
import java.util.ArrayList;
import scanner.Token;
import scanner.TokenSet;

class Production
{
   Token leading;      // the first token in the production
   int productionID;   // a unique integer that identifies the production
   ArrayList<Production> childNodes;   // duh
   Production mother;  // mother node (may be null)

   public Production (Token leading, int productionID)
   {
      this.leading      = leading;
      this.productionID = productionID;
      childNodes        = new ArrayList<Production>();
   }

   public void append (Production child)   // add a new child node
   {
      child nodes.add(child);
      child.mother = this;
   }

   public abstract void build1 (TokenSet follow, TokenSet anchor);   // implements pass 1
   public abstract void build2 ....

}

但是更强大的方法是在每个产品上派生一个新的子类,并将子节点声明为字段变量.然后,我们可以消除productionID并改用instanceof检查.您甚至可以在定义符号的节点的给定子类上实现符号接口,并将该节点直接插入符号表中.并且定义嵌套作用域的产品也可以有自己的符号表(我在这里不会这样做).关键是这样可以将语法分析和语义分析都集成到语法分析树结构中,甚至可以整合到最终翻译中.唯一的缺点是那些可怕的Java接口:lol:

But a much stronger approach is to derive a new subclass on each production and declare the child nodes as field variables. We can then eliminate the productionID and use instanceof checks instead. You can even implement a symbol interface on a given subclass of nodes that defines a symbol and insert the node directly into the symbol table; and a production that defines a nested scope can have its own symbol table too (i wont do that here). The point is that this way both the syntactic and the semantic analysis can be intergrated into the parse tree structure, and even the final translation too. The only down side is those horrible java interfaces :lol:

例如,我们可以将Modula-2头文件声明为:

for example, we could declare a Modula-2 header file as:

// i wont bother with imports since this isnt real code

class DefinitionModule extends Production
{
   Identifier name;

   ArrayList<ImportClause> importClauses;
   ArrayList<ExportClause> exportClauses;
   ArrayList<Production>   itemList;   // CONST-,TYPE-, & VAR- declarators & function headers

   public DefinitionModule()   // no more productionID
   {
      super(lastTokenRead());   // always sits on DEFINITION
      importClauses = new ArrayList<ImportClause>;

   }


   // build()
   //
   // DefinitionModule ::= DEFINITION MODULE Identifier ";" {ImportClause}{ExportClause}{HeaderItem} END Identifier
   //
   // where HeaderItem ::= ConstDeclarator | TypeDeclarator | VarDeclator | ProcedureHeader.
   // Identifier, ImportClause, & ExportClause below are all derived from
   // Production, above



   public void build (TokenSet follow, TokenSet anchor)
   {
      Scanner.getToken();   // skip the DEFINITION
      Scanner.expectToken(Token.ID_MODULE);    // make sure MODULE is there & then skip it
      name = name.build(new TokenSet(Token.ID_SEMICOLON));
      expectToken(Token.ID_SEMICOLON);
      while (lastTokenRead()==Token.ID_IMPORT || lastTokenRead()==Token.ID_FROM)
      {
         ImportClause IC = new ImportClause(lastTokenRead());
         importClauses.add(IC.build(new TokenSet(Token.ID_SEMICOLON));
         Scanner.expectToken(Token.ID_SEMICOLON);
      }

      while (lastTokenRead()==Token.ID_EXPORT)
      {
         ExportClause XC = new ExportClause(lastTokenRead());
         exportClauses.add(XC.build(new TokenSet(Token.ID_SEMICOLON));
         Scanner.expectToken(Token.ID_SEMICOLON);
      }

      // etc, etc, etc
   }
}

如果您这样做的话,编译器将围绕该语言的功能而不是传统的编译器构建自身.

if you do it like that, the compiler will build itself around the features of the language rather than the traditional passes of a compiler.

祝你好运...

这篇关于编译器构造:显式解析树的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆