看似等效的门希尔(Menhir)规则改变了语法中的移位/减少冲突 [英] Seemingly equivalent Menhir rules change the shift/reduce conflicts found in grammar

查看:112
本文介绍了看似等效的门希尔(Menhir)规则改变了语法中的移位/减少冲突的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Menhir创建一个解析器,并且有一种行为总是使我绊倒,但我不理解.我创建了以下最小示例来演示它;这显示了Go语言的方法声明中的接收方参数声明( http://golang.org/ref/spec#Method_declarations ):

I'm using Menhir to create a parser, and there's a behaviour that always trips me, and I don't understand it. I have created the following minimal example to demonstrate it; this shows the declaration of the receiver argument in a method declaration in the Go language (http://golang.org/ref/spec#Method_declarations):

%{

%}

%token <string> T_identifier
%token T_star

%start <unit> demo


%%


(* This rule has a shift/reduce conflict
demo:
| option(T_identifier) option(T_star) T_identifier { () } 
*)

(* This rule is okay. *)
demo:
| T_identifier T_star T_identifier { () }
| T_identifier T_identifier        { () }
| T_star T_identifier              { () }
| T_identifier                     { () }

据我所知,这两个规则在语义上是等效的:我们正在寻找一个可选的标识符(接收者的名称),一个可选的星号(是否为指针)和一个强制类型名称(接收者的类型) ).但是,第一个规则(已注释掉的规则)产生了移位/减少冲突,而第二个规则运行良好.

As far as I can see, both rules are semantically equivalent: we are looking for an optional identifier (the name of the receiver), an optional star (pointer or not) and a mandatory type name (the type of the receiver). However, the first rule (the one commented out) gives a shift/reduce conflict while the second rule works fine.

我已经能够在解析器中通过将option替换为多条规则来进行处理,但这一直困扰着我,我不知道它为什么会发生.

I've been able to progress in my parser by replacing option with multiple rules whenever this has happened, but it's been nagging me that I don't understand why it's happening.

(如果您不知道menhir,它是LR(1)解析器生成器,因此可能会应用其他类似工具的工作知识.)

(If you don't know menhir, it is an LR(1) parser generator, so knowledge of how other similar tools work would probably apply.)

推荐答案

我认为Menhir通过一些标准转换将EBNF转换为BNF.这很普遍.不幸的是,这些转换会破坏LR(1)的可分析性.

I suppose that Menhir reduces EBNF to BNF through some standard transformations. That's pretty common. Unfortunately these transformations can undermine LR(1) parsability.

使用另一种类似于EBNF的语法来考虑您的规则:

Consider your rule, in another EBNF-like syntax:

demo → IDENTIFIER? STAR? IDENTIFIER

将其转换为BNF的一种方法将是您在第二组规则中所做的:定义四个不同的规则,每种可能性一个.这种转换永远不会改变LR(1)的可解析性,并且对于带有可选"运算符的规则来说始终是可能的,但是它有两个缺点:

One way to translate that into BNF would be as you do in your second set of rules: define four different rules, one for each possibility. That transformation will never alter LR(1) parsability, and it is always possible for rules with an "optional" operator, but it has two disadvantages:

  1. 如果规则中有几个可选元素,则最终结果是产品的很多.

它不适用于重复运算符.

It doesn't work for repetition operators.

似乎更通用的另一种方法是为每个扩展的BNF运算符创建一个新的非终结符.因此,我们可以这样做:

Another way which seems more general is to create a new non-terminal for each extended BNF operator. So we could do this:

optional_identifier → IDENTIFIER | ε
optional_star       → STAR | ε
demo                → optional_identifier optional_star IDENTIFIER

类似的转换对x*也适用:

repeated_x → ε | repeated_x x

这肯定会产生等效的语言,但是现在语法可能不是LR(1).

That certainly produces an equivalent language, but now the grammar might not be LR(1).

尤其是,demo不再是LR(1).它在一开始就失败了.假设第一个输入令牌是IDENTIFIER.那可能是

In particular, demo is no longer LR(1). It fails right at the beginning. Say the first input token is IDENTIFIER. That could be the start of

IDENTIFIER IDENTIFIER

IDENTIFIER

(或其他一些可能性,但这足以显示问题.)

(or some other possibilities, but that's enough to show the problem.)

在第二种情况(只是一种类型)中,我们需要先减小optional_identifieroptional_star,然后才能移动IDENTIFIER.在第一种情况(变量和类型)中,我们需要立即移动IDENTIFIER.而我们唯一可以区别的信息是前瞻令牌IDENTIFIER,显然这还不够.

In the second case (just a type), we need to reduce an optional_identifier and an optional_star before we can shift the IDENTIFIER. In the first case (a variable and a type), we need to shift the IDENTIFIER immediately. And the only information we have available to tell the difference is the lookahead token, IDENTIFIER, which clearly isn't enough.

如果我们使用四向扩展生产,则没有问题:

If we use the four-way expanded production, then there is no problem:

demo → IDENTIFIER
     | STAR IDENTIFIER
     | IDENTIFIER IDENTIFIER
     | IDENTIFIER STAR IDENTIFIER

在这里,当我们看到IDENTIFIER时,我们不知道它是第一生产,第三生产还是第四生产的一部分.但是没关系,因为在所有情况下,我们只需要移动IDENTIFIER并等待更多信息即可.

Here, when we see an IDENTIFIER, we don't know whether it's part of the first production, the third production or the fourth production. But it doesn't matter because in all cases, we just shift the IDENTIFIER and wait for more information.

yacc/bison和其他允许中间规则动作(MRA)的解析器生成器也会发生类似的现象. MRA变成了一个新的非终端,其唯一产生的是ε.生产;新的非终端的目的是在MRA减少时运行它.这真的很酷,只是有时在我们不知道减少它是否合适的时候引入了新的非终结符.因此,即使语言没有更改,MRA仍可以将完美的LR(1)语法转换为非LR(1)语法.

A similar phenomenon occurs with yacc/bison and other parser generators which allow mid-rule actions (MRAs). An MRA is turned into a new non-terminal whose only production is an ε production; the purpose of the new non-terminal is to run the MRA when it is reduced. That's really cool, except that sometimes the new non-terminal is introduced at a point where we can't know whether it is appropriate to reduce it or not. So the MRA can turn a perfectly good LR(1) grammar into a non-LR(1) grammar, even though the language has not changed.

尽管与Menhir无关,但上面的EBNF转换(如果仔细完成)不会引起歧义,这可能是有趣的.因此,即使生成的语法不再是LR(1),它仍然是明确的,可以使用GLR解析器进行解析.但是,据我所知,由于Menhir不会生成GLR解析器,因此这一事实可能不是很有用.

Although not relevant in the case of Menhir, it's possibly interesting that the EBNF transformation above, if done carefully, does not introduce ambiguity which was not otherwise present. So even if the resulting grammar is no longer LR(1), it is still unambiguous and could be parsed with a GLR parser. However, since Menhir does not, as far as I know, generate GLR parsers, that fact might not be very useful.

这篇关于看似等效的门希尔(Menhir)规则改变了语法中的移位/减少冲突的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆