如何修改解析语法以允许赋值和非赋值语句? [英] How to modify parsing grammar to allow assignment and non-assignment statements?

查看:91
本文介绍了如何修改解析语法以允许赋值和非赋值语句?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以问题是关于下面的语法。我正在尝试一种小型解释语言,这很有趣(我们在课堂上了解了一些编译器设计,所以我想将其带入一个新的水平,然后自己尝试一些方法)。我在尝试使非终端符号 Expr 陷入困境。

So the question is about the grammar below. I'm working on a mini-interpreted language for fun (we learned about some compiler design in class, so I want to take it to the next level and try something on my own). I'm stuck trying to make the non-terminal symbol Expr.

Statement ::= Expr SC
Expr ::=           /* I need help here */
Assign ::= Name EQUAL Expr
AddSub ::= MulDiv {(+|-) AddSub}
MulDiv ::= Primary {(*|/) MulDiv}
Primary ::= INT | FLOAT | STR | LP Expr RP | Name
Name ::= ID {. Name}

Expr 声明必须考虑以下两种情况:

Expr has to be made such that Statement must allow for the two cases:


  1. x = 789; (常规分配,后跟分号)

  2. x + 2; (无分配,只是计算,将其丢弃;然后是分号)

  1. x = 789; (regular assignment, followed by semicolon)
  2. x+2; (no assignment, just calculation, discarded; followed by a semicolon)

第二种情况的目的是为未来。我在考虑一元递增和递减运算符,以及函数调用。

The purpose of the second case is to setup the foundation for more changes in the future. I was thinking about unary increment and decrement operators, and also function calls; both of which don't require assignment to be meaningful.

我看过其他语法(即C#),但它太复杂且冗长,难以理解。自然,我不是在寻找解决方案,而只是在寻求有关如何修改语法的指导。

I've looked at other grammars (C# namely), but it was too complicated and lengthy to understand. Naturally I'm not looking for solutions, but only for guidance on how I could modify my grammar.

感谢所有帮助。

编辑:我应该说,我最初的想法是 Expr :: =分配| AddSub ,但这不起作用,因为这会产生歧义,因为两者都可以以非终结符号 Name 开头。我已经对令牌生成器进行了设计,使其可以允许一个令牌向前看(窥视),但对于非终端设备,我还没有做过这样的事情,因为它将试图解决可以避免的问题(含糊不清)。

I should say that my initial thought was Expr ::= Assign | AddSub, but that wouldn't work since it would create ambiguity since both could start with the non-terminal symbol Name. I have made my tokenizer such that it allows one token look ahead (peek), but I have not made such a thing for the non terminals, since it would be trying to fix a problem that could be avoided (ambiguity). In the grammar, the terminals are the ones that are all-caps.

推荐答案

最简单的解决方案是由终端实际使用的解决方案。 C的设计者,并因此使用C的各种派生类:将赋值简单地视为另一个运算符,而不必将其限制在语句的顶级。因此,在C语言中,以下是没有问题的:

The simplest solution is the one actually taken by the designers of C, and thus by the various C derivatives: treat assignment simply as yet another operator, without restricting it to being at the top-level of a statement. Hence, in C, the following is unproblematic:

while ((ch = getchar()) != EOF) { ... }

并非所有人都会考虑这种好风格,但肯定是很常见的(特别是在 for 语句的子句中,其语法或多或少要求将赋值作为表达式)。

Not everyone will consider that good style, but it is certainly common (particularly in the clauses of the for statement, whose syntax more or less requires that assignment be an expression).

有两个小并发症,相对容易实现:

There are two small complications, which are relatively easy to accomplish:


  1. 与大多数运算符不同,赋值与右侧相关联,因此将 a = b = 0 解析为 a =(b = 0)而不是(a = b)= 0 (这是非常意外的)。

  1. Logically, and unlike most operators, assignment associates to the right so that a = b = 0 is parsed as a = (b = 0) and not (a = b) = 0 (which would be highly unexpected). It also binds very weakly, at least to the right.

对于它应该与左侧绑定的紧密程度也存在分歧。在C语言中,大多数情况下都遵循严格的优先级模型,因此 a = 2 + b = 3 被拒绝,因为它被解析为 a = ((2 + b)= 3) a = 2 + b = 3 看起来很糟糕,但也请考虑 a< ? (x = a):( y = a)。在C ++中,三元运算符的结果可以作为引用,您可以这样写:(a< b?x:y)= a 其中括号是甚至认为赋值的优先级要比三元运算符低。

Opinions vary as to how tightly it should bind to the left. In C, for the most part a strict precedence model is followed so that a = 2 + b = 3 is rejected since it is parsed as a = ((2 + b) = 3). a = 2 + b = 3 might seem like terrible style, but consider also a < b ? (x = a) : (y = a). In C++, where the result of the ternary operator can be a reference, you could write that as (a < b ? x : y) = a in which the parentheses are required even thought assignment has lower precedence than the ternary operator.

这些选项都不是很难在语法中实现的。

None of these options are difficult to implement in a grammar, though.

在许多语言中,赋值的左侧语法受限制。在具有参考值的C ++中,该限制可以被认为是语义上的,我相信它通常是通过语义检查来实现的,但是在许多C派生类中, lvalue 可以通过语法定义。 。这样的定义是明确的,但是它们通常不适合使用自上而下的语法进行解析,甚至对于自下而上的语法也可能造成复杂性。进行解析后检查始终是一个简单的解决方案。

In many languages, the left-hand side of an assignment has a restricted syntax. In C++, which has reference values, the restriction could be considered semantic, and I believe it is usually implemented with a semantic check, but in many C derivatives lvalue can be defined syntactically. Such definitions are unambiguous, but they are often not amenable to parsing with a top-down grammar, and they can create complications even for a bottom-up grammar. Doing the check post-parse is always a simple solution.

如果您真的想将赋值语句与表达式语句区分开,那么如果您使用递归下降等自上而下的解析技术,则确实会遇到预测失败的问题( not 歧义)。由于语法不是模棱两可的,因此一个简单的解决方案是使用LASON(1)解析器生成器(例如bison / yacc),解析此类语法没有问题,因为它不需要及早决定要使用哪种语句。解析。总体而言,使用LALR(1)甚至GLR解析器生成器都可以通过允许您以易于理解的形式指定语法,并与语法分析相对应,从而简化了解析器的实现。 (例如,LALR(1)解析器可以自然地处理左联想运算符,而LL(1)语法只能生成右联想语法,因此需要对语法树进行某种重构。)

If you really want to distinguish assignment statements from expression statements, then you indeed run into the problem of prediction failure (not ambiguity) if you use a top-down parsing technique such as recursive descent. Since the grammar is not ambiguous, a simple solution is to use an LALR(1) parser generator such as bison/yacc, which has no problems parsing such a grammar since it does not require an early decision as to which kind of statement is being parsed. On the whole, the use of LALR(1) or even GLR parser generators simplifies implementation of a parser by allowing you to specify a grammar in a form which is easily readable and corresponds to the syntactic analysis. (For example, an LALR(1) parser can handle left-associative operators naturally, while a LL(1) grammar can only produce right-associative parses and therefore requires some kind of reconstruction of the syntax tree.)

递归下降解析器是计算机程序,而不是语法,因此其表达能力不受LL(1)语法的形式约束的限制。这既有优势也有劣势:优势在于您可以找到不受LL(1)语法限制的解决方案。缺点是提取关于该语言精确语法的清晰陈述要复杂得多(有时甚至不可能)。例如,尽管有上述限制,这种能力使递归血统语法可以或多或少的自然方式处理左联想。

A recursive descent parser is a computer program, not a grammar, and its expressiveness is thus not limited by the formal constraints of LL(1) grammars. That is both a strength and a weakness: the strength is that you can find solutions which are not limited by the limitations of LL(1) grammars; the weakness is that it is much more complicated (even, sometimes, impossible) to extract a clear statement about the precise syntax of the language. This power, for example, allows recursive descent grammars to handle left associativity in a more-or-less natural way despite the restriction mentioned above.

这条路,那么解决方案就足够简单了。您将具有某种功能:

If you want to go down this road, then the solution is simple enough. You will have some sort of function:

/* This function parses and returns a single expression */
Node expr() {
  Node left = value();
  while (true) {
    switch (lookahead) {
      /* handle each possible operator token. I left out
       * the detail of handling operator precedence since it's
       * not relevant here
       */
      case OP_PLUS: {
        accept(lookahead);
        left = MakeNode(OP_PLUS, left, value());
        break;
      }
      /* If no operator found, return the current expression */
      default:
        return left;
    }
  }
}

很容易修改为能够解析表达式和语句。首先,重构函数,使其在给定第一个运算符的情况下解析表达式的其余部分。 (唯一的变化是一个新的原型,并删除了正文中的第一行。)

That easily be modified to be able to parse both expressions and statements. First, refactor the function so that it parses the "rest" of an expression, given the first operator. (The only change is a new prototype and the deletion of the first line in the body.)

/* This function parses and returns a single expression
 * after the first value has been parsed. The value must be
 * passed as an argument.
 */
Node expr_rest(Node left) {
  while (true) {
    switch (lookahead) {
      /* handle each possible operator token. I left out
       * the detail of handling operator precedence since it's
       * not relevant here
       */
      case OP_PLUS: {
        accept(lookahead);
        left = MakeNode(OP_PLUS, left, value());
        break;
      }
      /* If no operator found, return the current expression */
      default:
        return left;
    }
  }
}

有了它,就可以了同时实现 expr stmt 很简单:

With that in place, it is straightforward to implement both expr and stmt:

Node expr() {
  return expr_rest(value());
}

Node stmt() {
  /* Check lookahead for statements which start with
   * a keyword. Omitted for simplicity.
   */

  /* either first value in an expr or target of assignment */
  Node left = value();

  switch (lookahead) {
    case OP_ASSIGN: 
      accept(lookahead);
      return MakeAssignment(left, expr())
    }
    /* Handle += and other mutating assignments if desired */
    default: {
      /* Not an assignment, just an expression */
      return MakeExpressionStatement(expr_rest(left));
    }
  }
}

这篇关于如何修改解析语法以允许赋值和非赋值语句?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆