野牛减少/减少与转换和表达括号的冲突 [英] Bison Reduce/Reduce Conflict with Casting and Expression Parentheses

查看:194
本文介绍了野牛减少/减少与转换和表达括号的冲突的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在用野牛建立一个语法,并且将我最后一次减少/减少错误的范围缩小到以下测试用例:

I'm building a grammar in bison, and I've narrowed down my last reduce/reduce error to the following test-case:

%{
#include <stdio.h>
#include <string.h>

extern yydebug;

void yyerror(const char *str)
{
  fprintf(stderr, "Error: %s\n", str);
}

main()
{
  yydebug = 1;
  yyparse();
}
%}

%right '='
%precedence CAST
%left '('

%token AUTO BOOL BYTE DOUBLE FLOAT INT LONG SHORT SIGNED STRING UNSIGNED VOID

%token IDENTIFIER

%start file

%debug

%%

file
  : %empty
  | statement file
  ;

statement
  : expression ';'
  ;

expression
  : expression '=' expression
  | '(' type ')' expression %prec CAST
  | '(' expression ')'
  | IDENTIFIER
  ;

type
  : VOID
  | AUTO
  | BOOL
  | BYTE
  | SHORT
  | INT
  | LONG
  | FLOAT
  | DOUBLE
  | SIGNED
  | UNSIGNED
  | STRING
  | IDENTIFIER
  ;

大概的问题是,当在表达式中看到(IDENTIFIER)时,它无法分辨类型和表达式之间的区别.

Presumably the issue is that it can't tell the difference between type and expression when it sees (IDENTIFIER) in an expression.

输出:

fail.y: warning: 1 reduce/reduce conflict [-Wconflicts-rr]
fail.y:64.5-14: warning: rule useless in parser due to conflicts [-Wother]
   | IDENTIFIER
     ^^^^^^^^^^

该如何解决此冲突?

推荐答案

如果语法仅限于OP中显示的形式,则消除冲突相对容易,因为语法是明确的.唯一的问题是它是LR(2)而不是LR(1).

If the grammar were limited to the productions shown in the OP, it would be relatively easy to remove the conflict, since the grammar is unambiguous. The only problem is that it is LR(2) and not LR(1).

OP中的分析是完全正确的.解析器看到时,例如:

The analysis in the OP is completely correct. When the parser sees, for example:

( identifier1 · )

(其中·标记当前点,因此先行标记为)),无法知道它是否是

(where · marks the current point, so the lookahead token is )), it is not possible to know whether that is a prefix of

( identifier1 · ) ;
( identifier1 · ) = ...
( identifier1 · ) identifier2
( identifier1 · ) ( ...

在前两种情况下,必须将identifier1减小为expression,以便随后可以将( expression )减小为expression,而在后两种情况下, identifier1必须减小为type,以便随后可以将( type ) expression减小为expression.如果解析器可以在将来看到一个令牌,则可以做出决定.

In the first two cases, identifier1 must be reduced to expression, so that ( expression ) can subsequently be reduced to expression, whereas in the last two cases, identifier1 must be reduced to type so that ( type ) expression can subsequently be reduced to expression. If the parser could see one token further into the future, the decision could be made.

由于对于任何LR(k)语法而言,都有一个LR(1)语法可以识别相同的语言,因此显然存在一种解决方案.通常的方法是将减少推迟到单令牌前瞻足以区分之前.一种方法是:

Since for any LR(k) grammar, there is an LR(1) grammar which recognizes the same language, there is clearly a solution; the general approach is to defer the reduction until the one-token lookahead is sufficient to distinguish. One way to do this is:

cast_or_expr   : '(' IDENTIFIER ')'
               ;
cast           : cast_or_expr
               | '(' type ')'
               ;
expr_except_id : cast_or_expr
               | cast expression %prec CAST
               | '(' expr_except_id ')'
               | expression '=' expression
               ;
expression     : IDENTIFIER
               | expr_except_id
               ;

(除了从type的生产中删除IDENTIFIER之外,其余语法都是相同的.)

(The rest of the grammar is the same except for the removal of IDENTIFIER from the productions for type.)

对于没有符号可以同时作为前缀和中缀运算符(例如-)并且不能省略运算符(有效地,如在函数调用中)的语法来说,这很好用.特别是,它不适用于C,因为它将留下歧义:

That works fine for grammars where no symbol can be both a prefix and an infix operator (like -) and where no operator can be elided (effectively, as in function calls). In particular, it won't work for C, because it will leave the ambiguities:

( t ) * a   // product or cast?
( t ) ( 3 ) // function-call or cast?

这些是语法中的真正歧义,只能通过知道t是类型名还是变量/函数名来解决.

Those are real ambiguities in the grammar which can only be resolved by knowing whether t is a typename or a variable/function name.

C解析器的常用"解决方案是通过在扫描器和解析器之间共享符号表来解决歧义;由于typedef类型别名声明必须在适用范围内首次将符号用作类型名称之前出现,因此可以在扫描令牌之前知道是否已使用typedef声明了令牌.更准确地说,如果未看到typedef,则可以假定该符号不是类型,尽管它可能是完全未声明的.

The "usual" solution for C parsers is to resolve the ambiguity by sharing the symbol table between the scanner and the parser; since typedef type alias declarations must appear before the first use of the symbol as a typename in the applicable scope, it can be known in advance of the scan of the token whether or not the token has been declared with a typedef. More accurately, if the typedef has not been seen, it can be assumed that the symbol is not a type, although it may be completely undeclared.

通过使用GLR语法和语义谓词,可以将逻辑限制为解析器.有人觉得更优雅.

By using a GLR grammar and a semantic predicate, it is possible to restrict the logic to the parser. Some people find that more elegant.

这篇关于野牛减少/减少与转换和表达括号的冲突的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆