尚不清楚yacc/野牛生产规范如何导致堆栈溢出 [英] Unclear how a yacc/bison production spec can cause a stack overflow

查看:119
本文介绍了尚不清楚yacc/野牛生产规范如何导致堆栈溢出的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这不是家庭作业,而是一本书.我得到以下语法:

%{
#include <stdio.h>
#include <ctype.h>

int yylex();
int yyerror();
%}

%%

command : exp '\n' { printf("%d\n", $1); exit(0); }
        | error '\n'
          { 
            yyerrok;
            printf("reenter expression: "); 
          }
          command
        ;

exp : exp '+' term { $$ = $1 + $3; }
    | exp '-' term { $$ = $1 - $3; }
    | term { $$ = $1; }
    ;

term : term '*' factor { $$ = $1 * $3; }
     | factor { $$ = $1; }
     ;

factor : NUMBER { $$ = $1; }
       | '(' exp ')' { $$ = $2; }
       ;

%%

int main() {
  return yyparse();
}

int yylex() {
  int c;

  /* eliminate blanks*/
  while((c = getchar()) == ' ');

  if (isdigit(c)) {
    ungetc(c, stdin);
    scanf("%d\n", &yylval);
    return (NUMBER);
  }

  /* makes the parse stop */
  if (c == '\n') return 0;

  return (c);
}

int yyerror(char * s) {
  fprintf(stderr, "%s\n", s);
  return 0;
} /* allows for printing of an error message */

这是任务:

为计算器程序建议的简单错误恢复技术的缺点在于 在许多错误之后可能会导致堆栈溢出.重写以删除 这个问题.

我真的无法弄清楚堆栈是如何发生的.既然开始的生产是唯一一个带有错误令牌的生产,那么yacc/bison会不会在重新启动之前弹出堆栈中的所有元素?

解决方案

如有疑问,最简单的方法就是使用野牛.

为了避免错误,我对程序进行了一些修改.首先,由于新程序依赖于查看'\n'令牌,因此我删除了行if (c == '\n') return 0;,这将禁止发送'\n'.其次,我将scanf("%d\n", &yylval);固定为scanf("%d", &yylval);.没有必要吞下数字后的空格,尤其是如果数字后的空格是换行符,则尤其是. (但是,scanf模式无法区分不同种类的空格,因此模式"%d\n"具有与"%d "完全相同的语义.这两种都不正确.)

然后,在构建计算器时,我在main的顶部添加了yydebug = 1;行,并为野牛提供​​了-t("trace")选项.这会导致解析器在处理输入时详细显示其进度.

它有助于获取状态表转储,以了解发生了什么情况.您可以使用-v野牛选项来做到这一点.不过,我会将其留给读者.

然后我运行程序并故意键入语法错误:

./error
Starting parse
Entering state 0
Reading a token: 2++3

跟踪工具已经输出了两行,但是在我输入了一些信息之后,跟踪就涌出了.

首先,解析器吸收NUMBER 2和运算符+ :(注意:下面的nterm是bison所说的非终端"方式,而token是终端";堆栈仅显示州号.)

Next token is token NUMBER ()
Shifting token NUMBER ()
Entering state 2
Reducing stack by rule 9 (line 25):
   $1 = token NUMBER ()
-> $$ = nterm factor ()
Stack now 0
Entering state 7
Reducing stack by rule 8 (line 22):
   $1 = nterm factor ()
-> $$ = nterm term ()
Stack now 0
Entering state 6
Reading a token: Next token is token '+' ()
Reducing stack by rule 6 (line 18):
   $1 = nterm term ()
-> $$ = nterm exp ()
Stack now 0
Entering state 5
Next token is token '+' ()
Shifting token '+' ()
Entering state 12

到目前为止,太好了.看到+之后,我们将到达状态12.这是它的定义:

State 12

    4 exp: exp '+' . term
    7 term: . term '*' factor
    8     | . factor
    9 factor: . NUMBER
   10       | . '(' exp ')'

    NUMBER  shift, and go to state 2
    '('     shift, and go to state 3

    term    go to state 17
    factor  go to state 7

(默认情况下,bison不会使非核心项目杂乱无章地出现在状态表中.我添加了-r itemset来获得完整的项目集,但手动完成闭合就足够容易了.)

由于在这种状态下,我们正在寻找+的右侧操作数,所以只有可以开始表达式的内容才有效:NUMBER和(.但这不是我们所拥有的:

Reading a token: Next token is token '+' ()
syntax error

好的,我们处于状态12,如果您查看上面的状态描述,您会发现error也不在超前集合中.所以:

Error: popping token '+' ()
Stack now 0 5

这使我们回到了状态5,这是操作员所期望的状态:

State 5

    1 command: exp . '\n'
    4 exp: exp . '+' term
    5    | exp . '-' term

    '\n'  shift, and go to state 11
    '+'   shift, and go to state 12
    '-'   shift, and go to state 13

因此该状态也不会在error上进行转换.向前.

Error: popping nterm exp ()
Stack now 0

好,回到开头.状态0 确实具有error转换:

   error   shift, and go to state 1

因此,现在我们可以转换error令牌并进入状态1,如过渡表所示:

Shifting token error ()
Entering state 1

现在,我们需要通过跳过输入令牌来同步输入,直到获得换行符为止. (请注意,在执行此操作时,野牛实际上会弹出并推入错误标记.请尽量不要让它分散您的注意力.)

Next token is token '+' ()
Error: discarding token '+' ()
Error: popping token error ()
Stack now 0
Shifting token error ()
Entering state 1
Reading a token: Next token is token NUMBER ()
Error: discarding token NUMBER ()
Error: popping token error ()
Stack now 0
Shifting token error ()
Entering state 1
Reading a token: Next token is token '\n' ()
Shifting token '\n' ()
Entering state 8

对,我们找到了换行符.状态5为command: error '\n' . $@1 command. $@1是野牛插入的代替中间规则动作(MRA)的标记(空生产)的名称.状态8将减少此标记,从而导致MRA运行,这要求我提供更多输入.请注意,此时错误恢复已完成.现在,我们处于完全正常的状态,并且堆栈反映了以下事实:我们依次有一个开始(状态0),一个error令牌(状态1)和一个换行令牌(状态8):

Reducing stack by rule 2 (line 13):
-> $$ = nterm $@1 ()
Stack now 0 1 8
Entering state 15
Reading a token: Try again: 

降低MRA之后,从状态8开始采取相应的操作,然后进入状态15(为避免混乱,我将非核心项目留在了外面):

State 15

    3 command: error '\n' $@1 . command

    error   shift, and go to state 1
    NUMBER  shift, and go to state 2
    '('     shift, and go to state 3

因此,现在我们已经准备好按预期解析一个全新的命令.但是我们还没有减少错误产生.它仍在堆栈中,因为只有减小点后的command才能减小它.而且我们还没有开始.

但是必须注意,状态15 确实error上具有转换,如从状态goto表中可以看到的那样.之所以有这种过渡,是因为闭包包含command的两个产生:

    1 command: . exp '\n'
    3        | . error '\n' $@1 command

以及exptermfactor的生成,它们也是闭包的一部分.

那么,如果我们现在输入另一个错误怎么办?堆栈将弹出到此位置(0 1 8 15),新的error令牌将被压入堆栈(0 1 8 15 1),令牌将被丢弃,直到可以换行(0 1 8 15 1 8)为止.新的MRA(如bson所称的$@1)将减少到堆栈中(0 1 8 15 1 8 15),这时我们准备开始解析另一次尝试.

希望您能看到前进的方向.

请注意,这实际上与任何其他右递归生产的效果没有区别.语法是否尝试接受许多表达式:

prog: exp '\n'
    | exp '\n' { printf("%d\n", $1); } prog

您会看到相同的堆栈堆积,这就是为什么不建议右递归的原因. (这也是因为您最终插入MRA是为了避免看到相反的结果,因为在所有输入的末尾将堆栈减少到prog.)

    command  go to state 20
    exp      go to state 5
    term     go to state 6
    factor   go to state 7

This is not homework, but it is from a book. I'm given the following grammar:

%{
#include <stdio.h>
#include <ctype.h>

int yylex();
int yyerror();
%}

%%

command : exp '\n' { printf("%d\n", $1); exit(0); }
        | error '\n'
          { 
            yyerrok;
            printf("reenter expression: "); 
          }
          command
        ;

exp : exp '+' term { $$ = $1 + $3; }
    | exp '-' term { $$ = $1 - $3; }
    | term { $$ = $1; }
    ;

term : term '*' factor { $$ = $1 * $3; }
     | factor { $$ = $1; }
     ;

factor : NUMBER { $$ = $1; }
       | '(' exp ')' { $$ = $2; }
       ;

%%

int main() {
  return yyparse();
}

int yylex() {
  int c;

  /* eliminate blanks*/
  while((c = getchar()) == ' ');

  if (isdigit(c)) {
    ungetc(c, stdin);
    scanf("%d\n", &yylval);
    return (NUMBER);
  }

  /* makes the parse stop */
  if (c == '\n') return 0;

  return (c);
}

int yyerror(char * s) {
  fprintf(stderr, "%s\n", s);
  return 0;
} /* allows for printing of an error message */

Here is the task:

The simple error recovery technique suggested for the calculator program is flawed in that it could cause stack overflow after many errors. Rewrite it to remove this problem.

I can't really figure out how a stack overflow can occur. Given the starting production is the only one that has an error token in it, wouldn't yacc/bison pop all the elements on the stack and before restarting?

解决方案

When in doubt, the simplest thing is to use bison.

I modified the program slightly in order to avoid the bugs. First, since the new program relies on seeing '\n' tokens, I removed the line if (c == '\n') return 0; which would suppress sending '\n'. Second, I fixed scanf("%d\n", &yylval); to scanf("%d", &yylval);. There's no reason to swallow the whitespace following the number, particularly if the whitespace following the number is a newline. (However, scanf patterns don't distinguish between different kinds of whitespace, so the pattern "%d\n" has exactly the same semantics as "%d ". Neither of those would be correct.)

Then I added the line yydebug = 1; at the top of main and supplied the -t ("trace") option to bison when I built the calculator. That causes the parser to show its progress in detail as it processes the input.

It helps to get a state table dump in order to see what's going on. You can do that with the -v bison option. I'll leave that for readers, though.

Then I ran the program and deliberately typed an syntax error:

./error
Starting parse
Entering state 0
Reading a token: 2++3

The trace facility has already output two lines, but after I give it some input, the trace comes pouring out.

First, the parser absorbs the NUMBER 2 and the operator +: (Note: nterm below is bison's way of saying "non-terminal", while token is a "terminal"; the stack shows only state numbers.)

Next token is token NUMBER ()
Shifting token NUMBER ()
Entering state 2
Reducing stack by rule 9 (line 25):
   $1 = token NUMBER ()
-> $$ = nterm factor ()
Stack now 0
Entering state 7
Reducing stack by rule 8 (line 22):
   $1 = nterm factor ()
-> $$ = nterm term ()
Stack now 0
Entering state 6
Reading a token: Next token is token '+' ()
Reducing stack by rule 6 (line 18):
   $1 = nterm term ()
-> $$ = nterm exp ()
Stack now 0
Entering state 5
Next token is token '+' ()
Shifting token '+' ()
Entering state 12

So far, so good. State 12 is where we get to after we've seen +; here is its definition:

State 12

    4 exp: exp '+' . term
    7 term: . term '*' factor
    8     | . factor
    9 factor: . NUMBER
   10       | . '(' exp ')'

    NUMBER  shift, and go to state 2
    '('     shift, and go to state 3

    term    go to state 17
    factor  go to state 7

(By default, bison doesn't clutter up the state table with non-core items. I added -r itemset to get the full itemset, but it would have been easy enough to do the closure by hand.)

Since in this state we're looking for the right-hand operand of +, only things which can start an expression are valid: NUMBER and (. But that's not what we've got:

Reading a token: Next token is token '+' ()
syntax error

OK, we're in State 12, and if you look at the above state description, you'll see that error is not in the lookahead set either. So:

Error: popping token '+' ()
Stack now 0 5

That puts us back in State 5, which is where an operator was expected:

State 5

    1 command: exp . '\n'
    4 exp: exp . '+' term
    5    | exp . '-' term

    '\n'  shift, and go to state 11
    '+'   shift, and go to state 12
    '-'   shift, and go to state 13

So that state doesn't have a transition on error either. Onwards.

Error: popping nterm exp ()
Stack now 0

OK, back to the beginning. State 0 does have an error transition:

   error   shift, and go to state 1

So now we can shift the error token and enter state 1, as indicated by the transition table:

Shifting token error ()
Entering state 1

Now we need to synchronize the input by skipping input tokens until we get to a newline token. (Note that bison actually pops and pushes the error token while it's doing this. Try not to let that distract you.)

Next token is token '+' ()
Error: discarding token '+' ()
Error: popping token error ()
Stack now 0
Shifting token error ()
Entering state 1
Reading a token: Next token is token NUMBER ()
Error: discarding token NUMBER ()
Error: popping token error ()
Stack now 0
Shifting token error ()
Entering state 1
Reading a token: Next token is token '\n' ()
Shifting token '\n' ()
Entering state 8

Right, we found the newline. State 5 is command: error '\n' . $@1 command. $@1 is the name of the marker (empty production) which bison inserted in place of the mid-rule action (MRA). State 8 will reduce this marker, causing the MRA to run, which asks me for more input. Note that at this point error recovery is complete. We are now in a perfectly normal state, and the stack reflects the fact that we have, in order, the start (state 0), an error token (state 1) and a newline token (state 8):

Reducing stack by rule 2 (line 13):
-> $$ = nterm $@1 ()
Stack now 0 1 8
Entering state 15
Reading a token: Try again: 

After the MRA is reduced, the corresponding action from State 8 is taken and we proceed to State 15 (to avoid clutter, I left out the non-core items):

State 15

    3 command: error '\n' $@1 . command

    error   shift, and go to state 1
    NUMBER  shift, and go to state 2
    '('     shift, and go to state 3

So now we're ready to parse a brand new command, as expected. But we have not yet reduced the error production; it's still on the stack because it can't be reduced until the command following the dot has been reduced. And we haven't even started on it yet.

But it's important to note that State 15 does have a transition on error, as you can see from the state's goto table. It has that transition because the closure includes the two productions for command:

    1 command: . exp '\n'
    3        | . error '\n' $@1 command

as well as the productions for exp, term and factor, which are also part of the closure.

So what happens if we now enter another error? The stack will be popped back to this point (0 1 8 15), a new error token will be pushed onto the stack (0 1 8 15 1), tokens will be discarded until a newline can be shifted (0 1 8 15 1 8) and a new MRA ($@1, as bison calls it) will be reduced onto the stack (0 1 8 15 1 8 15) at which point we're ready to start parsing yet another attempt.

Hopefully you can see where this is going.

Note that it really is not different from the effect of any other right-recursive production. Had the grammar attempted to accept a number of expressions:

prog: exp '\n'
    | exp '\n' { printf("%d\n", $1); } prog

you would see the same stack build-up, which is why right-recursion is discouraged. (And also because you end up inserting MRAs to avoid seeing the results in reverse order as the stack is reduced down to prog at the end of all input.)

    command  go to state 20
    exp      go to state 5
    term     go to state 6
    factor   go to state 7

这篇关于尚不清楚yacc/野牛生产规范如何导致堆栈溢出的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆