LALR 解析器和前瞻 [英] LALR parsers and look-ahead

查看:63
本文介绍了LALR 解析器和前瞻的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我无缘无故地实现了 LALR 解析表的自动构建.此解析器有两种风格,LALR(0) 和 LALR(1),其中数字表示前瞻量.

我对前瞻的含义感到困惑.

如果我的输入流是 'abc' 并且我有以下产品,我需要 0 次前瞻还是 1 次?

<前>P :== a E

同样的问题,但我无法通过仅查看输入中的a"来提前选择正确的 P 产生式.

<前>P :== a b E|一个

我有额外的困惑,因为我认为在构建 LALR 解析器生成器时不会真正发生后面的 P 生成.原因是当我们计算闭包时,语法会自动有效地进行左因子分解.

我正在浏览此页面,直到我到达第一个/后续部分.我的问题是我不知道为什么我们要计算这些东西,所以我无法在脑海中抽象出来.

我几乎明白前瞻与移动输入无关,而是决定何时减少.

我一直在读《龙》这本书,但它与塔伦蒂诺的剧本一样线性.对于已经知道如何执行此操作的人来说,这似乎是一个很好的参考.

解决方案

在学习自底向上解析(例如 LALR)时,您需要做的第一件事是记住它与自顶向下解析完全不同.自上而下的解析从一个非终结符开始,即产生式的左侧 (LHS),并猜测要使用哪个右侧 (RHS).另一方面,自下而上的解析从识别 RHS 开始,然后找出要选择的 LHS.

更具体地说,自底向上的解析器将传入的令牌累积到队列中,直到右侧位于队列的右侧.然后它通过用相应的 LHS 替换它来减少 RHS,并检查是否合适的 RHS 位于修改后的累加输入的右侧边缘.它一直这样做,直到它决定在输入中的那个点不再发生减少,然后读取一个新的标记(或者,换句话说,获取下一个输入标记并移动它到队列的末尾.)

这一直持续到读取最后一个标记并执行所有可能的归约,此时如果剩下的是作为开始符号"的单个非终结符,则它接受解析.

解析器不必仅仅因为 RHS 出现在当前队列的末尾就减少它,但它不能减少不在队列末尾的 RHS.这意味着它必须在转移任何其他令牌之前决定是否减少.由于决定并不总是显而易见的,它可能会检查一个或多个尚未读取的标记(先行标记",因为它正在查看输入)以做出决定.但它只能查看下一个 k 标记的某些 k 值,通常为 1.

这是一个非常简单的例子;逗号分隔列表:

<代码>1.开始 ->列表2. 列表 ->元素3. 列表 ->列表 ',' 元素

假设输入是:

元素,元素,元素

一开始,输入队列是空的,因为没有 RHS 是空的,唯一的选择就是移位:

队列剩余输入动作-------------- --------------------------- -----元素,元素,元素移位

下一步,解析器决定减少使用产生式 2:

元素,元素,元素减少2

现在队列末尾有一个 List,所以解析器可以减少使用产生式 1,但它决定不基于它看到一个 的事实,kbd> 在传入的输入中.这种情况持续了一段时间:

List , ELEMENT , ELEMENT SHIFT列表,元素,元素移位列表,元素,元素减少 3列表,元素移位列表,元素移位列表,元素 - 减少 3

现在前瞻标记是输入结束"伪标记.这一次,它确实决定减少:

列表 -- REDUCE 1开始——接受

并且解析成功.

这还剩下几个问题.首先,我们如何使用 FIRST 和 FOLLOW 集?

作为一个简单的答案,如果不知道可能跟随该非终结符的非终结符的 FIRST 集,则无法计算非终结符的 FOLLOW 集.我们可以决定是否应该执行归约的一种方法是查看前瞻是否在归约的目标非终结符的 FOLLOW 集中;如果不是,则肯定不应该进行减少.该算法对于上面的简单语法就足够了,例如:Start ->;使用 , 的前瞻不可能列出,因为 , 不在 FOLLOW(Start) 中.唯一可以通过这种方式解决冲突的语法是 SLR 语法(其中 S 代表简单",它当然是).

对于大多数语法来说,这还不够,还需要进行更多的分析.一个符号可能在非终结符的 FOLLOW 集合中,但不在导致当前堆栈配置的上下文中.为了确定这一点,我们需要更多地了解我们如何获得当前配置;各种可能的分析导致 LALRIELR 和规范的 LR 解析,以及其他可能性.

I'm implementing the automatic construction of an LALR parse table for no reason at all. There are two flavors of this parser, LALR(0) and LALR(1), where the number signifies the amount of look-ahead.

I have gotten myself confused on what look-ahead means.

If my input stream is 'abc' and I have the following production, would I need 0 look-ahead, or 1?

    P :== a E

Same question, but I can't choose the correct P production in advance by only looking at the 'a' in the input.

    P :== a b E
      |   a b F

I have additional confusion in that I don't think the latter P-productions really happen in when building a LALR parser generator. The reason is that the grammar is effectively left-factored automatically as we compute the closures.

I was working through this page and was ok until I got to the first/follow section. My issue here is that I don't know why we are calculating these things, so I am having trouble abstracting this in my head.

I almost get the idea that the look-ahead is not related to shifting input, but instead in deciding when to reduce.

I've been reading the Dragon book, but it is about as linear as a Tarantino script. It seems like a great reference for people who already know how to do this.

解决方案

The first thing you need to do when learning about bottom-up parsing (such as LALR) is to remember that it is completely different from top-down parsing. Top-down parsing starts with a nonterminal, the left-hand-side (LHS) of a production, and guesses which right-hand-side (RHS) to use. Bottom-up parsing, on the other hand, starts by identifying the RHS and then figures out which LHS to select.

To be more specific, a bottom-up parser accumulates incoming tokens into a queue until a right-hand side is at the right-hand end of the queue. Then it reduces that RHS by replacing it with the corresponding LHS, and checks to see whether an appropriate RHS is at the right-hand edge of the modified accumulated input. It keeps on doing that until it decides that no more reductions will take place at that point in the input, and then reads a new token (or, in other words, takes the next input token and shifts it onto the end of the queue.)

This continues until the last token is read and all possible reductions are performed, at which point if what remains is the single non-terminal which is the "start symbol", it accepts the parse.

It is not obligatory for the parser to reduce a RHS just because it appears at the end of the current queue, but it cannot reduce a RHS which is not at the end of the queue. That means that it has to decide whether to reduce or not before it shifts any other token. Since the decision is not always obvious, it may examine one or more tokens which it has not yet read ("lookahead tokens", because it is looking ahead into the input) in order to decide. But it can only look at the next k tokens for some value of k, typically 1.

Here's a very simple example; a comma separated list:

1. Start -> List
2. List  -> ELEMENT
3. List  -> List ',' ELEMENT

Let's suppose the input is:

ELEMENT , ELEMENT , ELEMENT

At the beginning, the input queue is empty, and since no RHS is empty the only alternative is to shift:

queue                   remaining input                 action
----------------------  ---------------------------     -----
                        ELEMENT , ELEMENT , ELEMENT     SHIFT

At the next step, the parser decides to reduce using production 2:

ELEMENT                 , ELEMENT , ELEMENT             REDUCE 2

Now there is a List at the end of the queue, so the parser could reduce using production 1, but it decides not to based on the fact that it sees a , in the incoming input. This goes on for a while:

List                    , ELEMENT , ELEMENT             SHIFT
List ,                  ELEMENT , ELEMENT               SHIFT
List , ELEMENT          , ELEMENT                       REDUCE 3
List                    , ELEMENT                       SHIFT
List ,                  ELEMENT                         SHIFT
List , ELEMENT          --                              REDUCE 3

Now the lookahead token is the "end of input" pseudo-token. This time, it does decide to reduce:

List                    --                              REDUCE 1
Start                   --                              ACCEPT

and the parse is successful.

That still leaves a few questions. To start with, how do we use the FIRST and FOLLOW sets?

As a simple answer, the FOLLOW set of a non-terminal cannot be computed without knowing the FIRST sets for the non-terminals which might follow that non-terminal. And one way we can decide whether or not a reduction should be performed is to see whether the lookahead is in the FOLLOW set for the target non-terminal of the reduction; if not, the reduction certainly should not be performed. That algorithm is sufficient for the simple grammar above, for example: the reduction of Start -> List is not possible with a lookahead of ,, because , is not in FOLLOW(Start). Grammars whose only conflicts can be resolved in this way are SLR grammars (where S stands for "Simple", which it certainly is).

For most grammars, that is not sufficient, and more analysis has to be performed. It is possible that a symbol might be in the FOLLOW set of a non-terminal, but not in the context which lead to the current stack configuration. In order to determine that, we need to know more about how we got to the current configuration; the various possible analyses lead to LALR, IELR and canonical LR parsing, amongst other possibilities.

这篇关于LALR 解析器和前瞻的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆