LL(1)解析器中FIRST和FOLLOW集的目的? [英] Purpose of FIRST and FOLLOW sets in LL(1) parsers?

查看:504
本文介绍了LL(1)解析器中FIRST和FOLLOW集的目的?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有人可以向我解释LL(1)语法中应如何使用FIRST和FOLLOW吗?我知道它们用于语法表构建,但是我不知道如何.

解决方案

在LL(1)解析器中,解析器的工作原理是维护一个工作区,该工作区最初是种子到开始符号的,后面是字符串结束标记(通常表示为$).在每个步骤中,它都会执行以下操作之一:

  • 如果工作区的第一个符号是终端,则将其与下一个输入标记进行匹配(如果不匹配,则报告错误).
  • 如果工作区的第一个符号是非终结符,则它预测用什么生产来替换该非终结符.

预测步骤是显示FIRST和FOLLOW的位置.解析器需要能够完全基于当前的非终结符和输入的下一个令牌来猜测要使用的生产.问题是如何做到这一点.

让我们假设当前的非终结符为A,输入的下一个标记为t.如果您知道A的产生,那么您会选择应用哪一个?有一个简单的情况需要考虑:是否存在形式为A→的产品. t&omega ;,其中ω是任意字符串,那么您应该选择该产品,因为您要作为输入查找的t将与产品前面的t相匹配.

还有一些复杂的情况需要考虑.假设您具有形式A→的产品. B&omega ;,其中B是非终结子,ω是一些字符串.您想在什么情况下猜测这种生产?好吧,如果您知道下一个终端符号在,则除非您知道B可以扩展为以终端t开头的字符串,否则您就不会猜测它的产生(还有另一种情况,我们将在第二).这是FIRST集合出现的地方.生产,则某些非终结符X的集合FIRST(X)是所有可能出现在从X派生的某些字符串的开头的所有终结符的集合.欧米茄并且您看到非终结t,您可能会猜想在t∈时精确地使用该乘积.第一(B);也就是说,B可以派生一些以t开头的字符串.如果B没有衍生出任何以t开头的东西,那么就没有理由选择它,并且如果B确实衍生出了以t开头的东西,则您需要做出选择,以便最终将t与它进行匹配.

ε时,事情变得有些棘手.产品介绍.现在,假设您具有形式为A→的产品. BC&omega ;,其中B和C是非终结符,而ω是一个字符串.我们还假设输入的下一个标记是t.如果t∈ FIRST(B),那么我们将如上所述选择该生产.但是,如果t∉会发生什么呢?第一(B)?如果有ε在语法中,如果B可以推导ε,我们可能仍要选择该产生式.和t∈第一(C).为什么是这样?如果发生这种情况,则意味着我们可以通过生成BCω然后生成ε来匹配t.从B,离开Cω与之匹配的t.在这种情况下,我们可能不得不浏览"一个非终结符.幸运的是,这是由FIRST集处理的.如果非末端X可以产生&epsilon ;,则ε &在; FIRST(X).因此,我们可以使用FIRST集来查看是否需要通过ε来浏览"一个非终结符. &在; FIRST(X).

到目前为止,我们还没有谈论FOLLOW集.他们从哪儿来的?好吧,假设我们正在处理非终端A,我们看到了终端t,但是A的所有生产实际上都不能消耗t.那我们该怎么办?事实证明,还有一种方法可以吃掉那个t.请记住,LL(1)解析器通过维护其中包含字符串的工作区来工作.我们正在看的t可能根本不应该与当前的非末端A相匹配,相反,我们应该使A产生ε.然后让工作空间中稍后的非终结符与t匹配.这就是FOLLOW集合的来源.非终结符X的FOLLOW集合(表示为FOLLOW(X))是在某些派生中可以紧随X之后出现的所有终结符的集合.当选择一个选择哪个生产,我们添加在最终规则 - 如果终端符号t是FOLLOW集合A中,我们选择了一些生产,最终将产生与小量;.这样,A将消失",并且我们可以将t与A非终结符之后出现的某个字符进行匹配.

这不是LL(1)解析的完整介绍,但是我希望它可以帮助您了解为什么我们需要FIRST和FOLLOW集.有关更多信息,请阅读有关解析的书(我建议由Grune和Jacobs撰写的解析技术:实用指南)或上一本有关编译器的课程.作为一个完全无耻的插件,我在2012-2013年夏季和所有演讲幻灯片都可以在线获得.

希望这会有所帮助!

Can anyone explain to me how FIRST and FOLLOW should be used in LL(1) grammar? I understand that they are used for syntax table construction, but I don't understand how.

解决方案

In an LL(1) parser, the parser works by maintaining a workspace initially seeded to the start symbol followed by the end-of-string marker (usually denoted $). At each step, it does one of the following:

  • If the first symbol of the workspace is a terminal, it matches it against the next token of input (or reports an error if it doesn't match.)
  • If the first symbol of the workspace is a nonterminal, it predicts what production to replace that nonterminal with.

The predict step is where FIRST and FOLLOW show up. The parser needs to be able to guess, based purely on the current nonterminal and the next token of input, which production to use. The question is how to do this.

Let's suppose that the current nonterminal is A and the next token of input is t. If you know the productions of A, which one would you choose to apply? There's one simple case to consider: if there's a production of the form A → tω, where ω is some arbitrary string, then you should pick that production because the t you're looking at as input will match the t at the front of the production.

There are also some complex cases to consider. Suppose you have a production of the form A → Bω, where B is a nonterminal and ω is some string. Under what circumstances would you want to guess this production? Well, if you know that the next terminal symbol is a t, you wouldn't want to guess this production unless you knew that B can expand to a string that starts with the terminal t (there's another case that we'll talk about in a second). This is where FIRST sets come in. In grammars without ε productions, the set FIRST(X) for some nonterminal X is the set of all terminals that can potentially appear at the start of some string derived from X. If you have a production of the form A → Bω and you see the nonterminal t, you'd guess to use that production precisely when t ∈ FIRST(B); that is, B can derive some string that starts with t. If B doesn't derive anything starting with t, then there's no reason to choose it, and if B does derive something starting with t, you'd want to make this choice so that you could eventually match the t against it.

Things get a bit trickier when ε productions are introduced. Now, let's suppose that you have a production of the form A → BCω, where B and C are nonterminals and ω is a string. Let's also suppose the next token of input is t. If t ∈ FIRST(B), then we'd choose this production, as mentioned above. However, what happens if t ∉ FIRST(B)? If there are ε productions in the grammar, we might still want to choose this production if B can derive ε and t ∈ FIRST(C). Why is this? If this happens, it means that we might be able to match the t by producing BCω, then producing ε from B, leaving Cω against which to match the t. This is one context where we might have to "look through" a nonterminal. Fortunately, this is handled by FIRST sets. If a nonterminal X can produce ε, then ε ∈ FIRST(X). Therefore, we can use FIRST sets to check whether we need to "look through" a nonterminal by seeing whether ε ∈ FIRST(X).

So far we haven't talked about FOLLOW sets. Where do they come in? Well, suppose that we're processing the nonterminal A, we see the terminal t, but none of the productions for A can actually consume the t. What do we do then? It turns out there's still a way that we can eat up that t. Remember that LL(1) parsers work by maintaining a workspace with a string in it. It's possible that the t we're looking at is not supposed to be matched against the current nonterminal A at all, and instead we're supposed to have A produce ε and then let some later nonterminal in the workspace match against the t. This is where FOLLOW sets come in. The FOLLOW set of a nonterminal X, denoted FOLLOW(X), is the set of all terminal symbols that can appear immediately after X in some derivation. When choosing which production to choose for A, we add in a final rule - if the terminal symbol t is in the FOLLOW set of A, we choose some production that ultimately will produce ε. That way, the A will "disappear" and we can match the t against some character that appears after the A nonterminal.

This isn't a complete introduction to LL(1) parsing, but I hope it helps you see why we need FIRST and FOLLOW sets. For more information, pick up a book on parsing (I recommend Parsing Techniques: A Practical Guide by Grune and Jacobs) or take a course on compilers. As a totally shameless plug, I taught a compilers course in Summer 2012-2013 and all of the lecture slides are available online.

Hope this helps!

这篇关于LL(1)解析器中FIRST和FOLLOW集的目的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆