yacc shift-reduce模糊lambda语法 [英] yacc shift-reduce for ambiguous lambda syntax

查看:475
本文介绍了yacc shift-reduce模糊lambda语法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在为Yacc(与Go一起打包的那个)编写一个玩具语言的语法,由于下面的伪问题,我有一个预期的转换 - 减少冲突。

 开始:
stmt_list

表达式:
INT | IDENT | lambda | '('expr')'{$$ = $ 2}

lambda:
'('params')''{'stmt_list'}'

params :
expr | params','expr

stmt:
/ * empty * / | expr

stmt_list:
stmt | stmt_list';'stmt

一个lambda函数看起来像这样:

  map((v){v * 2},collection)

我的解析器发出:


冲突:1次移位/减少

blockquote>

给定输入:

 (a)
code>

通过'正确解析 expr ('expr')'规则。然而,如果输入:

 (a){a} 

(这将是身份函数的lambda,返回其输入)。我得到:
$ b


语法错误:意外的'{'


这是因为当读取(a)时,解析器选择将其作为'('expr')',而不是认为它是'('params')'。鉴于这种冲突是一种转变 - 减少而不是减少 - 减少,我认为这是可以解决的。我只是不知道如何构造支持这种语法的语法。



编辑|这很丑陋,但我正在考虑定义一个令牌,以便词法分析器可以识别')'{'序列并将其作为单个令牌发送以解决此问题。



<编辑2 |实际上,更好的做法是,我会让lambda在语法中需要像 - >(a,b){a * b} 这样的语法,但是让词法分析器发出 - > ,而不是它在实际的源代码中。

解决方案

你的分析确实是正确的。虽然语法不是不明确的,但是解析器不可能决定输入简化为(< expr> 并且向前看在移动之前,是否应将 expr 缩小为 params 或是否应该将转换为 lambda 的一部分。如果下一个标记是可见的,那么可以做出决定,所以语法LR(2)不在go / yacc的权限范围内。 b
$ b

如果你是使用野牛,你可以轻松地通过请求一个GLR解析器来解决这个问题,但我不认为go / yacc提供了这个功能。



有一个LR(1 )语言的语法(对于 k 的任何值,总是存在对应于任何LR(k)语法的LR(1)语法),但是写起来相当烦人手。 LR(k)到LR(1)转换的基本思想是通过将k-1个上下文标记累加到每个生产中,从而将还原决策k-1标记向前移动。所以在 k 是2的情况下,每个生产P: N&rar; α 会被替换为 T N U Ť&阿尔法;对于 FIRST(α)中的每个 T 以及每个 > U FOLLOW(N)中。 [见注1]这会导致非终端语言在非平凡语法中的相当大的爆炸。

与其追求这个想法,让我提出两点更简单的解决方案,两者似乎都非常接近。

首先,在您提供的语法中,问题实际上仅仅是需要双令牌lookahead当这两个标记是 {时。这很容易在词法分析器中检测到,并且导致一个仍然很棘手但解决方法更简单的解决方案:返回){作为单个标记。您需要处理干预空白等,但它不需要在词法分析器中保留任何上下文。这有额外的好处,你不需要定义 params 作为 expr s列表;他们可以是 IDENT 的列表(如果相关的话,评论表明它不是)。



我认为这个选择更简洁一点,就是扩展您似乎已经提出的解决方案:接受一点点,并在语义行为中排除错误。在这种情况下,您可能会这样做:

 开始:
stmt_list

expr :
INT
| IDENT
| lambda
| '('expr_list')'
{//如果$ 2有多个expr,报告错误
$$ = $ 2
}

lambda:
'('expr_list')''{'stmt_list'}'
{//如果expr_list中的任何内容不是有效的参数,报告错误
$$ = make_lambda($ 2,$ 4)
}

expr_list:
expr | expr_list','expr

stmt:
/ *空* / | expr

stmt_list:
stmt | stmt_list';'stmt



笔记




  1. 这只是一个提纲;完整的算法包括恢复原始分析树的机制。如果 k 大于2,那么 T U 是字符串 FIRST k-1 FOLLOW k-1 集。


I'm writing a grammar for a toy language in Yacc (the one packaged with Go) and I have an expected shift-reduce conflict due to the following pseudo-issue. I have to distilled the problem grammar down to the following.

start:
  stmt_list

expr:
  INT | IDENT | lambda | '(' expr ')' { $$ = $2 }

lambda:
  '(' params ')' '{' stmt_list '}'

params:
  expr | params ',' expr

stmt:
  /* empty */ | expr

stmt_list:
  stmt | stmt_list ';' stmt

A lambda function looks something like this:

map((v) { v * 2 }, collection)

My parser emits:

conflicts: 1 shift/reduce

Given the input:

(a)

It correctly parses an expr by the '(' expr ')' rule. However given an input of:

(a) { a }

(Which would be a lambda for the identity function, returning its input). I get:

syntax error: unexpected '{'

This is because when (a) is read, the parser is choosing to reduce it as '(' expr ')', rather than consider it to be '(' params ')'. Given this conflict is a shift-reduce and not a reduce-reduce, I'm assuming this is solvable. I just don't know how to structure the grammar to support this syntax.

EDIT | It's ugly, but I'm considering defining a token so that the lexer can recognize the ')' '{' sequence and send it through as a single token to resolve this.

EDIT 2 | Actually, better still, I'll make lambdas require syntax like ->(a, b) { a * b} in the grammar, but have the lexer emit the -> rather than it being in the actual source code.

解决方案

Your analysis is indeed correct; although the grammar is not ambiguous, it is impossible for the parser to decide with the input reduced to ( <expr> and with lookahead ) whether or not the expr should be reduced to params before shifting the ) or whether the ) should be shifted as part of a lambda. If the next token were visible, the decision could be made, so the grammar LR(2), which is outside of the competence of go/yacc.

If you were using bison, you could easily solve this problem by requesting a GLR parser, but I don't believe that go/yacc provides that feature.

There is an LR(1) grammar for the language (there is always an LR(1) grammar corresponding to any LR(k) grammar for any value of k) but it is rather annoying to write by hand. The essential idea of the LR(k) to LR(1) transformation is to shift the reduction decisions k-1 tokens forward by accumulating k-1 tokens of context into each production. So in the case that k is 2, each production P: N → α will be replaced with productions TNUTα U for each T in FIRST(α) and each U in FOLLOW(N). [See Note 1] That leads to a considerable blow-up of non-terminals in any non-trivial grammar.

Rather than pursuing that idea, let me propose two much simpler solutions, both of which you seem to be quite close to.

First, in the grammar you present, the issue really is simply the need for a two-token lookahead when the two tokens are ){. That could easily be detected in the lexer, and leads to a solution which is still hacky but a simpler hack: Return ){ as a single token. You need to deal with intervening whitespace, etc., but it doesn't require retaining any context in the lexer. This has the added bonus that you don't need to define params as a list of exprs; they can just be a list of IDENT (if that's relevant; a comment suggests that it isn't).

The alternative, which I think is a bit cleaner, is to extend the solution you already seem to be proposing: accept a little too much and reject the errors in a semantic action. In this case, you might do something like:

start:
  stmt_list

expr:
    INT
  | IDENT
  | lambda
  | '(' expr_list ')'
        { // If $2 has more than one expr, report error
          $$ = $2
        }

lambda:
  '(' expr_list ')' '{' stmt_list '}'
        { // If anything in expr_list is not a valid param, report error
          $$ = make_lambda($2, $4)
        }

expr_list:
  expr | expr_list ',' expr

stmt:
  /* empty */ | expr

stmt_list:
  stmt | stmt_list ';' stmt

Notes

  1. That's only an outline; the complete algorithm includes the mechanism to recover the original parse tree. If k is greater than 2 then T and U are strings the the FIRSTk-1 and FOLLOWk-1 sets.

这篇关于yacc shift-reduce模糊lambda语法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆