递归下降：后缀转换重复操作符的中缀 [英] Recursive descent: infix to postfix conversion repeating operators

查看：107 发布时间：2020/6/3 21:11:17 algorithm parsing bnf infix-notation recursive-descent

本文介绍了递归下降：后缀转换重复操作符的中缀的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我们最近在uni的编程课程中学习了有关使用堆栈将中缀转换为后缀的知识。而且我有一段时间写解析器的意思，所以我决定使用递归下降法。我正在关注以下内容：通过递归下降Theodore Norvell 解析表达式。
这是他使用的语法：

We recently learned about converting infix to postfix using stacks during our programming course at uni. And I have meaning to write my parser for a while, so I decided to it using recursive descent. I am following along with this: Parsing Expressions by Recursive Descent Theodore Norvell . Here is the grammar he uses:

E --> P {B P}
P --> v | "(" E ")" | U P
B --> "+" | "-" | "*" | "/" | "^"
U --> "-" | "+"

我试图在C语言中实现它，并且可以正常工作。但是，如果我给它以下输入，并且运算符像这样一个接一个地跟随：

I have attempted at implementing this in C and it works. However If I give it the following input with operators trailing after one another like this:

---+-1-(-(2-1)+3)*-2

它输出以下内容：

---+-1.00 -2.00 1.00  - 3.00  +  - -2.00 *

以下内容似乎是错误的：

it appears to be wrong for the following:

--2.00 * 应该为 + -2 *-（基于我的堆栈实现）

- -2.00 * should be + -2 * - (based on my stack implementation)

我得到的另一个特殊结果是 2+（2 ^ 4 *（7 + 2 ^ 6）） $ b $我得到的b：

Another peculiar result I am getting is with2+(2^4*(7+2^6)) to which I got:

2.00 2.00 4.00 ^ 7.00 2.00  + 6.00 ^* +

我期望得到的时间：

 2.00 2.00 4.00 ^ 7.00 2.00 6.00 ^ + * +

我不确定，但是我是否需要优先考虑攀爬解析器-还建议。
然而，主要的问题是我如何简化尾随的一对运算符 --- +？
任何帮助将不胜感激。非常感谢。还是所有这方面的初学者。

I am not sure but do I perhaps need a precedence climbing parser- also suggested in the linked article . However the main question is how would i simplify a trailing pair of operations```---+``? Any help will be really appreciated. Thanks a lot in advance. still a beginner in all this.

这是代码：

#include <stdio.h>
#include <stdlib.h>

void expr();
void term();
void match(int t);
void error();
void parseNumber();
//E --> P {B P}
//P --> v | "(" E ")" | U P
//B --> "+" | "-" | "*" | "/" | "^"
//U --> "-" | "+"
//
// Erecognizer is
//    E()
//    expect( end )
// 
// E is
//     P
//     while next is a binary operator
//        consume
//        P

char lookahead;

int main() {
  lookahead = getchar();
  expr();
return 0;
}
// E is
//     P
//     while next is a binary operator
//        consume
//        P
void expr() {
  term();
 /* optimized by inlining rest() and removing recursive calls */
  while (1)  {
    if (lookahead == '+') {
      match('+');
      term();
      printf(" + ");
    } else if (lookahead == '-') {
      match('-');
      term();
      printf(" - ");
    }else if (lookahead == '*') {
      match('*');
      term();
      putchar('*');
    } else if (lookahead == '/') {
      match('/');
      term();
      putchar('/');
    } else if (lookahead == '^') {
      match('^');
      term();
      putchar('^');
    }  
    else break;
  }
}

// P is
//     if next is a v
//          consume
//     else if next = "("
//          consume
//          E
//          expect( ")" )
//     else if next is a unary operator
//          consume
//          P
//     else
//          error

void term() {
  if (isdigit(lookahead)) {
      parseNumber();
     // printf("lookahead at %c",lookahead);
  } else if(lookahead =='('){
      match('(');
      expr();
      match(')');
  }
  else if (lookahead =='-' ||lookahead =='+') {
      char sign = lookahead;
      match(lookahead);
      (sign=='+'?putchar('+'):putchar('-'));
      term();
  }else {
      error();
      }
}

void match(int t) {
  if (lookahead == t)
    lookahead = getchar();
  else error();

}
void parseNumber() {
  double number = 0;
  // TODO consume spaces
  if (lookahead == '\0'|| lookahead=='\n') return;
  while (lookahead >= '0' && lookahead <= '9') {
    number = number * 10 + lookahead - '0';
    match(lookahead);
  }
  if (lookahead == '.') {
    match(lookahead);
    double weight = 1;
    while (lookahead >= '0' && lookahead <= '9') {
      weight /= 10;
      number = number + (lookahead - '0') * weight;
      match(lookahead);
    }
  }
  printf("%.2f ", number);
  //printf("\ncurrent look ahead at after exiting parseNumber %c\n",lookahead);
}
void error() {
  printf("Syntax error at lookahead %c\n",lookahead);
  exit(1);
}

推荐答案

您引用得很清楚的文章指出提供的递归下降算法不是解析器：（加了强调）

The article you cite pretty clearly states that the recursive descent algorithm presented is not a parser: (emphasis added)

让我们看看基于此语法的递归下降识别器。我将此算法称为识别器而不是解析器，因为它所做的只是识别输入是否为语法语言。 它不会产生抽象的语法树或代表输入内容的其他任何形式的输出。

那是绝对正确的；语法仅适用于识别器。没有提及的是，如果您尝试修改算法以产生某种形式的输出（除了简单的是或否，表示表达式是否使用目标语言之外），您将在结构上错误的答案。

That's absolutely correct; the grammar is only suitable for use in a recognizer. What is not mentioned is that if you attempt to modify the algorithm to produce some form of output (other than a simple "yes" or "no", indicating whether or not the expression is in the target language), you will get a structurally incorrect answer.

那是因为并不是真的：

将G转换为等效的非左递归语法G1…

We can transform G to an equivalent non-left-recursive grammar G1…

或者至少，您需要请注意等效的含义。新语法是等效的，因为它可以识别相同的语言。但是它不能以相同的方式解析表达式，此外，左递归消除算法从语法中消除了生成正确解析所必需的信息。（在这种情况下，为简化起见，已经从语法中消除了必要的信息-每个运算符的优先级和关联性。为了简化起见，但是即使语法是准确的，左递归消除也将被删除。

Or at least, you need to be very careful about what is meant by "equivalent". The new grammar is equivalent in that it recognises the same language. But it does not parse expressions in the same way, and furthermore the left-recursion elimination algorithm eliminates information from the grammar which was necessary to produce a correct parse. (In this case, the necessary information -- the precedence and associativity of each operator -- had already been eliminated from the grammar, presumably for simplification. But even if the grammar had been accurate to start with, left-recursion elimination would have deleted the distinction between left-associative and right-associative operators.)

在本演示文稿的稍后部分中，Norvell在经典解决方案标题下进行了介绍。一个递归下降解析器，可以正确解析表达式。 [注1]可能就是您要编写的代码。

Somewhat later in this presentation, under the heading The classic solution, Norvell describes a recursive descent parser which does correctly parse expressions. [Note 1] That's probably the one you wanted to code.

顺便说一句，您的输出不是反向波兰符号（并且也不用括号括起来），因为您输出一元运算符 before 。 RPN始终将运算符放在其操作数之后，这使得它在没有括号的情况下是明确的，并且要求每个操作数都明确指定其所需的操作数。这通常意味着以不同的方式写一元和二元取反，以便可以区分它们，尽管另一种选择是只输出一个额外的0操作数，然后让RPN评估程序将它们视为二进制运算符。

By the way, your output is not Reverse Polish Notation (and nor is it unambiguous without parentheses) because you output unary operators before their operands. RPN always puts operators after their operands -- which is what makes it unambiguous without parentheses -- and requires that every operand unambiguously specify the number of operands it requires. That usually means writing unary and binary negation differently, so that it is possible to tell them apart, although another option would be to just output an extra 0 operand and let the RPN evaluator treat them as binary operators.

但是RPN并不是解析器真正有用的输出。解析器的常见输出是 Abstract Syntax Tree ，它是一种图形结构，描述了解析文本的语法结构。另一个常见的输出是所谓的三地址代码，它是具有无限（或至少非常大）寄存器数量的虚拟机的虚拟机代码。（并非所有VM操作码都具有三个地址，但包括许多二进制算术运算符，它们都有两个地址，这些操作符分别命名了两个源寄存器和一个目标寄存器。）当然，对于计算器，您可以随即进行求值

But RPN is not really a very useful output from a parser. A common output from a parser is an Abstract Syntax Tree, which is a graph structure which describes the syntactic structure of the parsed text. Another common output is so-called "Three Address Code", which is virtual machine code for an imaginary machine with an infinite (or at least very large) number of registers. (Not all the VM opcodes have three addresses, but many of them do, including all the binary arithmetic operators, which name two source registers and a destination register.) And, of course, for a calculator you can just evaluate as you go instead of producing any structured representation.

也许最好说，如果Norvell选择了一个不太特异的优先顺序，那么语法G2将正确地解析表达式。我们通常将一元求反运算符放在乘法和乘幂运算之间，而不放在加法和乘法运算之间。只要您仅实现乘法和精确除法，Norvell的优先级选择就无关紧要，但是如果您实现地板除法或取模（即 // 的Python语义）和％），您会发现一元否定的低优先级会导致意外评估。错误的可能是因为否定分布在乘法和精确除法上。但是（-3）// 2 与-（3 // 2）不同，并且期望 -3 // 2 的结果是第一个，而Norvell的优先顺序产生第二个。

Perhaps it would be better to say that the grammar G2 would correctly parse expressions if Norvell had chosen a less idiosyncratic precedence order. We normally put the unary negation operator between multiplication and exponentiation, not between addition and multiplication. As long as you only implement multiplication and exact division, Norvell's precedence choice doesn't matter, but if you implement floor division or modulo (that is, the Python semantics for // and %), you'll find that the low precedence of unary negation results in unexpected evaluations. The mistake is possible because negation distributes over multiplication and exact division. But (-3) // 2 is not the same as -(3 // 2), and the expected result of -3 // 2 is the first one, while Norvell's precedence ordering produces the second one.

I应该加上C中的整数除法是截断除法，而不是地板除法，并且C的％运算符是余数，而不是取模，因此C的问题不明显。另一方面，C缺少幂运算符，因此您可以采用更简单的解决方案，即给一元求反运算比任何二元运算符更高的优先级，这实际上是C所做的。

I should add that integer division in C is truncating division, not floor division, and C's % operator is remainder, not modulo, so the problem is not evident with C. On the other hand, C lacks an exponentiation operator so you could go with the simpler solution of giving unary negation higher precedence than any binary operator, which is what C in fact does.

这篇关于递归下降：后缀转换重复操作符的中缀的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

递归下降：后缀转换重复操作符的中缀 [英] Recursive descent: infix to postfix conversion repeating operators

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

递归下降：后缀转换重复操作符的中缀 [英] Recursive descent: infix to postfix conversion repeating operators

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭