使用Java CUP解析树生成 [英] Parse tree generation with Java CUP

查看:167
本文介绍了使用Java CUP解析树生成的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在将CUP与JFlex一起使用以验证表达式语法.我具有基本的功能:可以判断一个表达式是否有效.

I am using CUP with JFlex to validate expression syntax. I have the basic functionality working: I can tell if an expression is valid or not.

下一步是实现简单的算术运算,例如加1".例如,如果我的表达式为"1 + a",则结果应为"2 + a".我需要访问解析树来执行此操作,因为仅标识数字项就不会这样做:将1加到(1 + a)* b"的结果应为(1 + a)* b + 1" ,而不是((2 + a)* b)".

Next step is to implement simple arithmetic operations, such as "add 1". For example, if my expression is "1 + a", the result should be "2 + a". I need access to parse tree to do that, because simply identifying a numeric term won't do it: the result of adding 1 to "(1 + a) * b" should be "(1 + a) * b + 1", not "(2 + a) * b".

有人有生成解析树的CUP示例吗?我想我可以从那里拿走它.

Does anyone have a CUP example that generates a parse tree? I think I will be able to take it from there.

作为额外的好处,是否有一种方法可以使用JFlex获取表达式中所有标记的列表?看起来像一个典型的用例,但是我不知道该怎么做.

As an added bonus, is there a way to get a list of all tokens in expression using JFlex? Seems like a typical use case, but I cannot figure out how to do it.

编辑:找到了有关堆栈溢出的有希望的线索: 从解析器创建抽象树问题

Found a promising clue on stack overflow: Create abstract tree problem from parser

讨论CUP和AST:

http://pages.cs. wisc.edu/~fischer/cs536.s08/lectures/Lecture16.4up.pdf

具体来说,本段:

解析器返回的符号与语法的开头相关联 符号,并包含整个源程序的AST

The Symbol returned by the parser is associated with the grammar’s start symbol and contains the AST for the whole source program

这无济于事.如果Symbol类没有任何指向其子级的导航指针,则如何遍历给定的 Symbol 实例树?换句话说,它看起来或不像树节点:

This does not help. How to traverse the tree given Symbol instance, if Symbol class does not have any navigation pointers to its children? In other words, it does not look or behave like a tree node:

package java_cup.runtime;
/**
 * Defines the Symbol class, which is used to represent all terminals
 * and nonterminals while parsing.  The lexer should pass CUP Symbols 
 * and CUP returns a Symbol.
 *
 * @version last updated: 7/3/96
 * @author  Frank Flannery
 */

/* ****************************************************************
  Class Symbol
  what the parser expects to receive from the lexer. 
  the token is identified as follows:
  sym:    the symbol type
  parse_state: the parse state.
  value:  is the lexical value of type Object
  left :  is the left position in the original input file
  right:  is the right position in the original input file
******************************************************************/

public class Symbol {

/*******************************
  Constructor for l,r values
 *******************************/

  public Symbol(int id, int l, int r, Object o) {
    this(id);
    left = l;
    right = r;
    value = o;
  }

/*******************************
  Constructor for no l,r values
********************************/

  public Symbol(int id, Object o) {
    this(id, -1, -1, o);
  }

/*****************************
  Constructor for no value
  ***************************/

  public Symbol(int id, int l, int r) {
    this(id, l, r, null);
  }

/***********************************
  Constructor for no value or l,r
***********************************/

  public Symbol(int sym_num) {
    this(sym_num, -1);
    left = -1;
    right = -1;
    value = null;
  }

/***********************************
  Constructor to give a start state
***********************************/
  Symbol(int sym_num, int state)
    {
      sym = sym_num;
      parse_state = state;
    }

/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/

  /** The symbol number of the terminal or non terminal being represented */
  public int sym;

  /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/

  /** The parse state to be recorded on the parse stack with this symbol.
   *  This field is for the convenience of the parser and shouldn't be 
   *  modified except by the parser. 
   */
  public int parse_state;
  /** This allows us to catch some errors caused by scanners recycling
   *  symbols.  For the use of the parser only. [CSA, 23-Jul-1999] */
  boolean used_by_parser = false;

/*******************************
  The data passed to parser
 *******************************/

  public int left, right;
  public Object value;

  /*****************************
    Printing this token out. (Override for pretty-print).
    ****************************/
  public String toString() { return "#"+sym; }
}

推荐答案

好,我明白了.但不幸的是,我无法按原样在此处发布所有代码.无论如何,我都会尝试概述解决方案,如果不清楚,请提出问题.

Ok, I got it. But unfortunately I cannot publish all my code here as-is. I will try to outline solution anyway, and please ask questions if something is not clear.

JFlex使用其自己的Symbol类.看这里: JFlex.jar/java_cup.runtime/Symbol.class

JFlex uses its own Symbol class. Look here: JFlex.jar/java_cup.runtime/Symbol.class

您将看到添加了两个构造函数:

You will see a couple of constructors added:

public Symbol(int id, Symbol left, Symbol right, Object o){
    this(id,left.left,right.right,o);
}
public Symbol(int id, Symbol left, Symbol right){
    this(id,left.left,right.right);
}

这里的键是Object o,它是Symbol的值.

The key here is Object o, which is the value of Symbol.

定义您自己的类来表示AST树节点,并定义另一个类来表示lexer令牌.当然,您可以使用相同的类,但是我发现使用不同的类来区分两者更加清楚. JFlex和CUP都将生成Java代码,并且很容易混淆令牌和节点.

Define your own class to represent an AST tree node, and another one to represent lexer token. Granted, you can use the same class, but I found it more clear to use different classes to distinguish between the two. Both JFlex and CUP will generate java code, and it is easy to get your tokens and nodes mixed-up.

然后,在parser.flex中的词法规则部分中,您想要对每个标记执行以下操作:

Then, in your parser.flex, in the lexical rules sections, you want to do something like this for each token:

{float_lit}        { return symbol(sym.NUMBER, createToken(yytext(), yycolumn)); }

对所有令牌执行此操作.您的createToken可能是这样的:

Do this for all your tokens. Your createToken could be something like this:

%{
    private LexerToken createToken(String val, int start) {
        LexerToken tk = new LexerToken(val, start);
        addToken(tk);
        return tk;
    }
}%

现在让我们进入parser.cup.将所有终端声明为LexerToken类型,并将所有非终端声明为Node类型.您想阅读CUP手册,但是为了快速复习,终端将是词法分析器可识别的任何内容(例如,数字,变量,运算符),而非终端将是语法的一部分(例如,表达式,因子,术语... ).

Now let's move on to parser.cup. Declare all your terminals to be of type LexerToken, and all your non-terminals to be of type Node. You want to read CUP manual, but for quick refresher, a terminal would be anything recognized by the lexer (e.g. numbers, variables, operators), and non-terminal would be parts of your grammar (e.g. expression, factor, term...).

最后,所有这些都在语法定义中一起出现.考虑以下示例:

Finally, this all comes together in the grammar definition. Consider the following example:

   factor    ::= factor:f TIMES:times term:t
                 {: RESULT = new Node(times.val, f, t, times.start); :}
                 |
                 factor:f DIVIDE:div term:t
                 {: RESULT = new Node(div.val, f, t, div.start); :}
                 |
                 term:t
                 {: RESULT = t; :}
                 ;

语法factor:f表示您将因子的值别名为f,您可以在以下的{: ... :}部分中引用它.请记住,我们的终端的值是LexerToken类型,而我们的非终端的值是Node s.

Syntax factor:f means you alias the factor's value to be f, and you can refer to it in the following section {: ... :}. Remember, our terminals have values of type LexerToken, and our non-terminals have values that are Nodes.

您在表达式中的术语可能具有以下定义:

Your term in expression may have the following definition:

   term  ::= LPAREN expr:e RPAREN
         {: RESULT = new Node(e.val, e.start); :}
         |
         NUMBER:n
         {: RESULT = new Node(n.val, n.start); :}
         ;

成功生成解析器代码后,您将在parser.java中看到建立节点之间父子关系的部分:

When you successfully generate the parser code, you will see in your parser.java the part where the parent-child relationship between nodes is established:

  case 16: // term ::= UFUN LPAREN expr RPAREN 
    {
      Node RESULT =null;
    int ufleft = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-3)).left;
    int ufright = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-3)).right;
    LexerToken uf = (LexerToken)((java_cup.runtime.Symbol) CUP$parser$stack.elementAt(CUP$parser$top-3)).value;
    int eleft = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-1)).left;
    int eright = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-1)).right;
    Node e = (Node)((java_cup.runtime.Symbol) CUP$parser$stack.elementAt(CUP$parser$top-1)).value;
     RESULT = new Node(uf.val, e, null, uf.start); 
      CUP$parser$result = parser.getSymbolFactory().newSymbol("term",0, ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-3)), ((java_cup.runtime.Symbol)CUP$parser$stack.peek()), RESULT);
    }
  return CUP$parser$result;

很抱歉,我无法发布完整的代码示例,但是希望这可以节省一些人的反复试验时间.没有完整的代码也是一件好事,因为它不会使所有这些CS作业都变得毫无用处.

I am sorry that I cannot publish complete code example, but hopefully this will save someone a few hours of trial and error. Not having complete code is also good because it won't render all those CS homework assignments useless.

作为生活的证明,这是我的AST示例的精美图片.

As a proof of life, here's a pretty-print of my sample AST.

输入表达式:

T21 + 1A / log(max(F1004036, min(a1, a2))) * MIN(1B, 434) -LOG(xyz) - -3.5+10 -.1 + .3 * (1)

产生的AST:

|--[+]
   |--[-]
   |  |--[+]
   |  |  |--[-]
   |  |  |  |--[-]
   |  |  |  |  |--[+]
   |  |  |  |  |  |--[T21]
   |  |  |  |  |  |--[*]
   |  |  |  |  |     |--[/]
   |  |  |  |  |     |  |--[1A]
   |  |  |  |  |     |  |--[LOG]
   |  |  |  |  |     |     |--[MAX]
   |  |  |  |  |     |        |--[F1004036]
   |  |  |  |  |     |        |--[MIN]
   |  |  |  |  |     |           |--[A1]
   |  |  |  |  |     |           |--[A2]
   |  |  |  |  |     |--[MIN]
   |  |  |  |  |        |--[1B]
   |  |  |  |  |        |--[434]
   |  |  |  |  |--[LOG]
   |  |  |  |     |--[XYZ]
   |  |  |  |--[-]
   |  |  |     |--[3.5]
   |  |  |--[10]
   |  |--[.1]
   |--[*]
      |--[.3]
      |--[1]

这篇关于使用Java CUP解析树生成的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆