前瞻集的精确定义是什么? [英] What is the precise definition of a lookahead set?

查看:139
本文介绍了前瞻集的精确定义是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写编译器和学习语法分析背后的理论。我发现,即使它是理解识别算法的一个关键概念,在网络上的信息相当差。看起来StackOverflow在一个独特的位置来解决这个问题。

解决方案

语法的前瞻性集合定义每个非终端的前瞻设置,这又依赖于每个制作的前瞻设置。确定前瞻集可以帮助我们确定语法是否为LL(1),如果是,那么我们需要什么信息为其构造递归下降解析器。



定义: :LOOKAHEAD(X - >α)和 LOOKAHEAD(X)

  LOOKAHEAD(X→α)= FIRST(α)U FOLLOW(X),如果NULLABLE(α)
LOOKAHEAD ),如果不是NULLABLE(α)
LOOKAHEAD(X)= LOOKAHEAD(X→α)U LOOKAHEAD(X→β)U LOOKAHEAD(X→γ)

其中 FIRST(α)是α可以开始的终端集合,是可以在语法中的任何位置之后的 X 的终端的集合,并且 NULLABLE(α)是α是否可以导出空序列端子(表示为ε)。以下定义取自Torben Mogensen的免费书,编译器设计基础 : : / p>

  NULLABLE(ε)= true 
NULLABLE(x)= false,如果x是终端
NULLABLE (αβ)= NULLABLE(α)和NULLABLE(β)
NULLABLE(P)= NULLABLE(α_1)或NULLABLE(α_2)或...或NULLABLE所有其产生式的右侧和右侧
是α_1,α_2,...,α_n。

定义: FIRST

  FIRST(ε)=Ø
FIRST(x)= {x},假设x是终端
FIRST(αβ)= FIRST(α)U FIRST(β),如果NULLABLE(α)
= FIRST(α),如果不为NULLABLE(α)
FIRST(P)= FIRST )U FIRST(α_2)U ... U FIRST(α_n),
如果P是非终端并且其所有乘积的右侧
是α_1,α_2,... ,α_n。

定义: FOLLOW


如果且仅当从开始的派生时,终端符号a在 FOLLOW符号S的语法,使得S⇒αXaβ,其中α和β是(可能为空的)语法符号序列。


strong>直观:



/ em>出现在语法中。可以(直接或通过任何级别的递归)跟踪的所有终端都在 FOLLOW(X)中。此外,如果在生产结束时出现 X (例如 A - > foo X ),到ε(例如 A - > foo XB B - >ε), 后面还可以跟随( FOLLOW(A)⊆FOLLOW(X))。 p>

查看Torben的书中确定 FOLLOW(X)的方法以及下面的演示。 p>

示例:

  E  - > ; n A 
A - > E B
A - > ε
B - > + A
B - > * A

首先, NULLABLE FIRST 并确定:

  NULLABLE(E)= NULLABLE(n A)= NULLABLE 
NULLABLE(A)= NULLABLE(EB)∨NULLABLE(ε)= true
NULLABLE(B)= NULLABLE(+ A)∨NULLABLE(* A)= false

FIRST(E)= FIRST(n)= {n}
FIRST(A)= FIRST(EB)U FIRST(ε)= FIRST(E)UØ= {n}(因为E不是NULLABLE)
FIRST(B)= FIRST(+ A)U FIRST(* A)= FIRST(+)U FIRST(*)= {+,*}

在确定 FOLLOW 之前,生产 E'添加E $ ,其中 $ 被视为非终端的文件结束。然后确定 :

  FOLLOW(E):让β= $, FIRST($)= {$}⊆FOLLOW(E)
让β= B,因此添加约束FIRST(B)= {+,*}⊆FOLLOW(E)
FOLLOW ):令β=ε,因此添加FIRST(ε)=Ø⊆FOLLOW(A)的约束。
因为NULLABLE(ε),添加约束FOLLOW(E)⊆FOLLOW(A)。
令β=ε,因此添加FIRST(ε)=Ø⊆FOLLOW(A)的约束。
因为NULLABLE(ε),添加约束FOLLOW(B)⊆FOLLOW(A)。
令β=ε,因此添加FIRST(ε)=Ø⊆FOLLOW(A)的约束。
因为NULLABLE(ε),添加约束FOLLOW(B)⊆FOLLOW(A)。
FOLLOW(B):令β=ε,因此添加FIRST(ε)=Ø⊆FOLLOW(B)的约束。
因为NULLABLE(ε),添加约束FOLLOW(A)⊆FOLLOW(B)。

解决这些约束(也可以通过定点迭代实现),

  {+,*,$}⊆FOLLOW(E)
FOLLOW(E)⊆FOLLOW(A)
FOLLOW )= FOLLOW(B)

FOLLOW(E)= FOLLOW(A)= FOLLOW(B)= {+,*,$}。

现在可以确定每个产品的 LOOKAHEAD :

  LOOKAHEAD(E→n A)= FIRST(n A)= {n}因为¬NULLABLE(n A)
LOOKAHEAD (E)
LOOKAHEAD(A→ε)= FIRST(E)= {n}因为¬NULLABLE(EB)
= FIRST FIRST(ε)U FOLLOW(A)因为NULLABLE(ε)
=ØU {+,*,$} = {+,*,$}
LOOKAHEAD(B-→A) FIRST(+ A),因为¬NULLABLE(+ A)
= FIRST(+)= {+}因为¬NULLABLE(+)
LOOKAHEAD(B-→A)= {*}相同的原因

最后,可以确定每个非终端的 LOOKAHEAD

  LOOKAHEAD(E)= LOOKAHEAD(E  - > n A)= {n} 
LOOKAHEAD = LOOKAHEAD(A→EB)U LOOKAHEAD(A→ε)= {n} U {+,*,$}
LOOKAHEAD(B)= LOOKAHEAD(B→+ A)U LOOKAHEAD (B - > * A)= {+,*}

该语法不是LL(1),因为其非终端具有重叠的前瞻集。 (也就是说,我们不能创建一个程序一次读取一个符号,并明确决定使用哪个生产)。


I'm toying around with writing compilers and learning about the theory behind syntax analysis. I've found that even though it's a key concept for understanding recognition algorithms, information about it on the net is fairly poor. It seems StackOverflow is in a unique position to fix this problem.

解决方案

The lookahead sets for a grammar is defined in terms of the lookahead sets for each of its non-terminals, which in turn rely on the lookahead set for each production. Determining lookahead sets can help us determine if a grammar is LL(1), and if it is, what information we need to construct a recursive-descent parser for it.

Definition: LOOKAHEAD(X -> α) and LOOKAHEAD(X):

LOOKAHEAD(X -> α) = FIRST(α) U FOLLOW(X), if NULLABLE(α)
LOOKAHEAD(X -> α) = FIRST(α), if not NULLABLE(α)
LOOKAHEAD(X) = LOOKAHEAD(X -> α) U LOOKAHEAD(X -> β) U LOOKAHEAD(X -> γ)

where FIRST(α) is the set of terminals that α can begin with, FOLLOW(X) is the set of terminals that can come after an X anywhere in the grammar, and NULLABLE(α) is whether α can derive an empty sequence of terminals (denoted ε). The following definitions are taken from Torben Mogensen's free book, Basics of Compiler Design. See below for an example.

Definition: NULLABLE(X):

NULLABLE(ε) = true
NULLABLE(x) = false, if x is a terminal
NULLABLE(αβ) = NULLABLE(α) and NULLABLE(β)
NULLABLE(P) = NULLABLE(α_1) or NULLABLE(α_2) or ... or NULLABLE(α_n),
               if P is a non-terminal and the right-hand-sides
               of all its productions are α_1, α_2, ..., α_n.

Definition: FIRST(X):

FIRST(ε) = Ø
FIRST(x) = {x}, assuming x is a terminal
FIRST(αβ) = FIRST(α) U FIRST(β), if NULLABLE(α)
          = FIRST(α), if not NULLABLE(α)
FIRST(P) = FIRST(α_1) U FIRST(α_2) U ... U FIRST(α_n),
               if P is a non-terminal and the right-hand-sides
               of all its productions are α_1, α_2, ..., α_n.

Definition: FOLLOW(X):

A terminal symbol a is in FOLLOW(X) if and only if there is a derivation from the start symbol S of the grammar such that S ⇒ αX aβ, where α and β are (possibly empty) sequences of grammar symbols.

Intuition: FOLLOW(X):

Look at where X occurs in the grammar. All terminals that could follow it (directly or by any level of recursion) is in FOLLOW(X). Additionally, if X occurs at the end of a production (e.g. A -> foo X), or is followed by something else that can reduce to ε (e.g. A -> foo X B and B -> ε), then whatever A can be followed by, X can also be followed by (i.e. FOLLOW(A) ⊆ FOLLOW(X)).

See the method for determining FOLLOW(X) in Torben's book and a demonstration of it just below.

An example:

E -> n A
A -> E B
A -> ε
B -> + A
B -> * A

First, NULLABLE and FIRST and are determined:

NULLABLE(E) = NULLABLE(n A) = NULLABLE(n) ∧ NULLABLE(A) = false
NULLABLE(A) = NULLABLE(E B) ∨ NULLABLE(ε) = true
NULLABLE(B) = NULLABLE(+ A) ∨ NULLABLE(* A) = false

FIRST(E) = FIRST(n A) = {n}
FIRST(A) = FIRST(E B) U FIRST(ε) = FIRST(E) U Ø = {n} (because E is not NULLABLE)
FIRST(B) = FIRST(+ A) U FIRST(* A) = FIRST(+) U FIRST(*) = {+, *}

Before FOLLOW is determined, the production E' -> E $ is added, where $ is considered the "end-of-file" non-terminal. Then FOLLOW is determined:

FOLLOW(E): Let β = $, so add the constraint that FIRST($) = {$} ⊆ FOLLOW(E)
           Let β = B, so add the constraint that FIRST(B) = {+, *} ⊆ FOLLOW(E)
FOLLOW(A): Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(E) ⊆ FOLLOW(A).
           Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(B) ⊆ FOLLOW(A).
           Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(B) ⊆ FOLLOW(A).
FOLLOW(B): Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(B).
           Because NULLABLE(ε), add the constraint that FOLLOW(A) ⊆ FOLLOW(B).

Resolving these constraints (could also be achieved by fixed-point iteration),

    {+, *, $} ⊆ FOLLOW(E)
    FOLLOW(E) ⊆ FOLLOW(A)
    FOLLOW(A) = FOLLOW(B)

    FOLLOW(E) = FOLLOW(A) = FOLLOW(B) = {+, *, $}.

Now LOOKAHEAD for each production can be determined:

LOOKAHEAD(E -> n A) = FIRST(n A) = {n}     because ¬NULLABLE(n A)
LOOKAHEAD(A -> E B) = FIRST(E B)           because ¬NULLABLE(E B)
                    = FIRST(E) = {n}       because ¬NULLABLE(E)
LOOKAHEAD(A -> ε)   = FIRST(ε) U FOLLOW(A) because NULLABLE(ε)
                    = Ø U {+, *, $} = {+, *, $}
LOOKAHEAD(B -> + A) = FIRST(+ A)           because ¬NULLABLE(+ A)
                    = FIRST(+) = {+}       because ¬NULLABLE(+)
LOOKAHEAD(B -> * A) = {*}                  for the same reason

Finally, LOOKAHEAD for each non-terminal can be determined:

LOOKAHEAD(E) = LOOKAHEAD(E -> n A) = {n}
LOOKAHEAD(A) = LOOKAHEAD(A -> E B) U LOOKAHEAD(A -> ε)   = {n} U {+, *, $}
LOOKAHEAD(B) = LOOKAHEAD(B -> + A) U LOOKAHEAD(B -> * A) = {+, *}

By this knowledge we can determine that this grammar is not LL(1) because its non-terminals have overlapping lookahead sets. (I.e., we cannot create a program that reads one symbol at a time and decides unambiguously which production to use.)

这篇关于前瞻集的精确定义是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆