野牛/yacc-优先级设置的限制 [英] bison/yacc - limits of precedence settings

查看:94
本文介绍了野牛/yacc-优先级设置的限制的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我一直在尝试用野牛解析类似Haskell的语言语法.我将省略语法和一元减号的标准问题(例如,-5\x->x-5中的(-5)是什么,或者如果a-ba-(b)apply a (-b),而其本身仍可以是apply a \x->x-b,哈哈.)然后直接进入让我惊讶的事情.

So I've been trying to parse a haskell-like language grammar with bison. I'll omit the standard problems with grammars and unary minus (like, what is (-5) from -5 and \x->x-5 or if a-b is a-(b) or apply a (-b) which itself can still be apply a \x->x-b, haha.) and go straight to the thing that suprised me.

要将整个事情简化到重要的程度,请考虑以下情况:

To simplify the whole thing to the point where it matters, consider following situation:

expression: '(' expression ')'
    | expression expression            /* lambda application */
    | '\\' IDENTIFIER "->" expression  /* lambda abstraction */
    | expression '+' expression        /* some operators to play with */
    | expression '*' expression
    | IDENTIFIER | CONSTANT            /* | ..... */
    ;

我用%left和%right解决了所有带有'+'和'*'的移位/减少冲突 优先宏,但我不知何故找不到任何好的解决方案如何设置 expression expression lambda应用程序的优先级.我试过了 %precedence和%left和%prec标记,例如此处所示 % http://www.gnu.org/software/bison/manual/html_node/Non-Operators.html#Non-Operators , 但看起来野牛完全忽略​​了对此的任何优先级设置 规则.至少我尝试过的所有组合都失败了.与此有关的文档 主题非常稀疏,整件事看起来只适合处理 经典" expr. OPER expr.案例.

I solved all shift/reduce conflicts with '+' and '*' with %left and %right precedence macros, but I somehow failed to find any good solution how to set precedence for the expression expression lambda application. I tried %precedence and %left and %prec marker as shown for example here %http://www.gnu.org/software/bison/manual/html_node/Non-Operators.html#Non-Operators, but it looks like bison is completely ignoring any precedence setting on this rule. At least all combinations I tried failed. Documentation on exactly this topic is pretty sparse, whole thing looks like suited only for handling the "classic" expr. OPER expr. case.

问题:我是在做错什么,还是在Bison中这是不可能的?如果不, 仅仅是不受支持还是有一些理论上的理由为什么?

Question: Am I doing something wrong, or is this impossible in Bison? If not, is it just unsupported or is there some theoretical justification why not?

备注:当然,有一种简单的解决方法可以强制向左折叠并 优先级看起来像

Remark: Of course there's an easy workaround to force left-folding and precedence that would look schematically like

expression: expression_without_lambda_application
    | expression expression_without_lambda_application
    ;

expression_without_lambda_application: /* ..operators.. */
    | '(' expression ')'
    ;

...但是那不尽人意,对吧? :]

...but that's not as neat as it could be, right? :]

谢谢!

推荐答案

如果您了解LR解析的原理,最容易理解野牛优先级的原理,因为它基于LR算法的简单修改. (在这里,我只是结合了SLR,LALR和LR语法,因为基本算法是相同的.)

It's easiest to understand how bison precedence works if you understand how LR parsing works, since it's based on a simple modification of the LR algorithm. (Here, I'm just combining SLR, LALR and LR grammars, because the basic algorithm is the same.)

LR(1)机器具有两种可能的作用类别:

An LR(1) machine has two possible classes of action:

  1. Reduce 生成的右侧,该右侧恰好在超前标记之前结束(因此位于堆栈的顶部).
  2. Shift 前瞻令牌.
  1. Reduce the right-hand side of the production which ends just before the lookahead token (and consequently is at the top of the stack).
  2. Shift the lookahead token.

在LR(1)语法中,始终可以根据计算机状态和前瞻令牌来做出决定.但是某些常见的构造(尤其是中缀表达式)显然需要语法,语法看起来比其所需要的复杂得多,并且所需要的单位缩减要多于所需的数量.

In an LR(1) grammar, the decision can always be made on the basis of the machine state and the lookahead token. But certain common constructs -- notably infix expressions -- apparently require grammars which appear more complicated than they need to be, and which require more unit reductions than should be necessary.

在LR解析是一个新的时代,大多数从业者习惯了某种运算符优先级语法(有关定义,请参见下文),并且循环比现在昂贵得多,因此额外的单元减少似乎很烦人,将LR算法修改为使用标准优先级技术很有吸引力.

In an era in which LR parsing was new, and most practitioners were used to some sort of operator precedence grammar (see below for definition), and in which cycles were a lot more expensive than they are now so that the extra unit reductions seemed annoying, the modification of the LR algorithm to use standard precedence techniques was attractive.

修改-基于解析运算符优先级语法的经典算法-涉及为每个右侧(即每个产品)和每个终端分配一个优先级值(整数).然后,在构造LR机器时,如果给定的状态和超前行为可以触发平移或归约动作,则可以通过将可能的归约的优先次序与超前标记的优先次序进行比较来解决冲突.如果降低的优先级更高,则获胜;如果降低的优先级更高,则获胜.否则机器会移位.

The modification -- which is based on a classic algorithm for parsing operator precedence grammars -- involves assigning a precedence value (an integer) to every right-hand side (i.e. every production) and to every terminal. Then, when constructing the LR machine, if a given state and lookahead can trigger either a shift or a reduce action, the conflict is resolved by comparing the precedence of the possible reduction with the precedence of the lookahead token. If the reduction has a higher precedence, it wins; otherwise the machine shifts.

请注意,还原优先级永远不会相互比较,令牌优先级也不会.它们实际上可以来自不同的领域.此外,对于简单的表达语法,直观上是与运算符位于堆栈顶部"进行比较;这实际上是通过使用生产中最右边的终端来指定生产的优先级来实现的.为了处理左与右的关联性,我们实际上没有对生产使用与终端相同的优先级值.左关联产生的优先级稍高于终端的优先级,而右结合产生的优先级稍低的终端.这可以通过使终端优先级为3的倍数,而减少优先级的值大于或小于终端1来实现. (实际上,实际上比较是> 而不是,因此可以在终端使用偶数,但这是实现细节.)

Note that reduction precedences are never compared with each other, and neither are token precedences. They can actually come from different domains. Furthermore, for a simple expression grammar, intuitively the comparison is with the operator "at the top of the stack"; this is actually accomplished by using the right-most terminal in a production to assign the precedence of the production. To handle left vs. right associativity, we don't actually use the same precedence value for a production as for a terminal. Left-associative productions are given a precedence slightly higher than the terminal's precedence, and right-associative productions are given a precedence slightly lower. This could be done by making the terminal precedences multiples of 3 and the reduction precedences a value one greater or less than the terminal. (Actually in practice the comparison is > rather than so it's possible to use even numbers for terminals, but that's an implementation detail.)

事实证明,语言并不总是那么简单.因此,有时-一元运算符就是一个典型的例子-显式提供与默认值不同的归约优先级很有用. (另一种情况是,优先级与第一个终端的相关性高于最后一个终端,如果有多个,则优先级更高.)

As it turns out, languages are not always quite so simple. So sometimes -- the case of unary operators is a classic example -- it's useful to explicitly provide a reduction precedence which is different from the default. (Another case is where the precedence is more related to the first terminal than the last, in the case where there are more than one.)

社论注释: 真的,这全都是骇客.这是一个很好的技巧,它可能会很有用.但是像所有黑客一样,它可能被推得太远.需要恕我直言,恕我直言,那些需要对算法有充分理解并需要对语法进行详细分析的复杂而有技巧的技巧是不明智的.他们感到困惑.使用无上下文语法形式主义和解析器生成器的全部目的是简化语法的表示形式,并使其更容易进行验证. /编者注.

Editorial note: Really, this is all a hack. It's a good hack, and it can be useful. But like all hacks, it can be pushed too far. Intricate tricks with precedence which require a full understanding of the algorithm and a detailed analysis of the grammar are not, IMHO, elegant. They are confusing. The whole point of using a context-free-grammar formalism and a parser generator is to simplify the presentation of the grammar and make it easier to verify. /Editorial note.

运算符优先级语法是可以仅使用优先级关系(使用诸如经典"shunting-yard"算法之类的算法)进行自底向上解析的运算符语法.运算符语法是其中右侧没有两个连续的非终结符的语法.以及生产:

An operator precedence grammar is an operator grammar which can be bottom-up parsed using only precedence relations (using an algorithm such as the classic "shunting-yard" algorithm). An operator grammar is a grammar in which no right-hand side has two consecutive non-terminals. And the production:

expression: expression expression

不能用运算符语法表示.

cannot be expressed in an operator grammar.

在该生产中,班次减少冲突发生在中间,就在有操作员的情况下,操作员会在该位置之前.在那种情况下,人们可能想将产生第一个表达式的任何归约的优先级与分隔表达式的不可见运算符进行比较.

In that production, the shift reduce conflict comes in the middle, just before where the operator would be if there were an operator. In that case, one would want to compare the precedence of whichever reduction gave rise to the first expression with the invisible operator which separates the expressions.

在某些情况下(这需要仔细的语法分析,因此非常脆弱),可以区分可以启动expression的终端和可以是运算符的终端.在这种情况下,可以将expression的第一组中的端子的优先级用作优先级比较中的比较器.由于这些终端绝不会在运营商产品中用作比较器,因此不会产生任何额外的歧义.

In some circumstances (and this requires careful grammar analysis, and is consequently very fragile), it's possible to distinguish between terminals which could start an expression and terminals which could be operators. In that case, it would be possible to use the precedence of the terminals in the FIRST set of expression as the comparators in the precedence comparison. Since those terminals will never be used as the comparators in an operator production, no additional ambiguity is created.

当然,一旦终端有可能成为前缀或前缀运算符(例如一元减号),该操作就会立即失败.因此,大多数语言可能只具有理论意义.

Of course, that fails as soon as it is possible for a terminal to be either an infix or a prefix operator, such as unary minus. So it's probably only of theoretical interest in most languages.

总而言之,我个人认为,显式定义非应用程序表达式的解决方案清晰,优雅并且与LR解析理论一致,而使用优先级关系的任何尝试都将变得不那么容易理解和验证.

In summary, I personally think that the solution of explicitly defining non-application expressions is clear, elegant and consistent with the theory of LR parsing, while any attempt to use precedence relations will turn out to be far less easy to understand and verify.

但是,如果您坚持要执行的话,这是一种语法,在这种情况下, (没有一元运算符)将有效,该语法基于为令牌分配优先级值,而令牌可能以expression开头:

But, if you insist, here is a grammar which will work in this particular case (without unary operators), based on assigning precedence values to the tokens which might start an expression:

%token IDENTIFIER CONSTANT  APPLY
%left '(' ')' '\\' IDENTIFIER CONSTANT APPLY
%left '+'
%left '*'

%%
expression: '(' expression ')' 
          | expression expression %prec APPLY
          | '\\' IDENTIFIER "->" expression
          | expression '+' expression
          | expression '*' expression
          | IDENTIFIER | CONSTANT
          ;

这篇关于野牛/yacc-优先级设置的限制的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆