从正则表达式创建NFA的步骤 [英] Steps to creating an NFA from a regular expression

查看:325
本文介绍了从正则表达式创建NFA的步骤的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

从正则表达式创建NFA时遇到描述每个步骤"的问题.问题如下:

将以下正则表达式转换为不确定的有限状态自动机(NFA),清楚地描述了所使用算法的步骤: (b | a)* b(a | b)

我已经制作了一个简单的三态机,但是这很直觉. 这是我的讲师在过去的一次考试中提出的问题,他的讲师也对汤普森的算法作了以下解释:

也许我已经在某个地方掩盖了一种算法,但到目前为止,我只是用有根据的猜测来创建它们.

适用于一般方法的简短版本.
那里有一种算法叫做Thompson-McNaughton-Yamada构造算法,有时也简称为"Thompson构造".一个人构建中间的NFA,一路填充,同时注意运算符的优先级:首先是括号,然后是Kleene Star(例如a *),然后是串联(例如ab),然后是交替(例如a | b).

这是构建(b | a)* b(a | b)NFA的深入演练

构建顶层

  1. 句柄括号.注意:在实际的实现中,通过对括号的内容进行递归调用来处理括号是有意义的.为了清楚起见,我将推迟对parens内任何内容的评估.

  2. Kleene Stars:那里只有一个*,因此我们构建了一个占位符Kleene Star机器,称为P(稍后将包含b | a). 中间结果:

  3. 串联:将P附加到b,然后将b附加到一个名为Q的占位符机器(它将包含(a | b).中间结果:

  4. 括号内没有替代项,因此我们将其跳过.

现在我们坐在P * bQ机器上. (请注意,我们的占位符P和Q只是级联机器.)通过上述步骤的递归应用,我们将p边替换为b | a的NFA,并将Q边替换为a | b的NFA.


建筑物P

  1. 跳过.没有括号.

  2. 跳过.没有Kleene明星.

  3. 跳过.没有污染.

  4. 为b | a构建备用机器.中间结果:


集成P

接下来,我们返回到该P * bQ机器,然后撕掉P边缘.我们将P边缘的用作P机的起始状态,并将P边缘的目的地用作P机的目标状态.我们还使该状态被拒绝(取消其作为接受状态的属性).结果看起来像这样:


建筑物Q

  1. 跳过.没有括号.

  2. 跳过.没有Kleene明星.

  3. 跳过.没有污染.

  4. 为a | b构建备用机器.顺便说一下,交替是可交换的,因此a | b在逻辑上等同于b | a. (阅读:出于懒惰而跳过了这个较小的脚注图.)


积分Q

我们用上面的P做我们的事情,除了用构造的中间b | a机器代替Q边.结果是:

多田!嗯,我是说QED.


想了解更多吗?

以上所有图像均使用在线工具自动生成,以将正则表达式转换为非确定性有限自动机.您可以找到Thompson-McNaughton-Yamada的源代码在线构建算法.

Aho的编译器:原理,技术和工具,尽管其解释很少涉及实现细节.您也可以从学习汤普森构造在C 中的出色表现.拉斯·考克斯(Russ Cox)在关于正则表达式匹配的热门文章中对此进行了详细描述./p>

I'm having issues 'describing each step' when creating an NFA from a regular expression. The question is as follows:

Convert the following regular expression to a non-deterministic finite-state automaton (NFA), clearly describing the steps of the algorithm that you use: (b|a)*b(a|b)

I've made a simple 3-state machine but it's very much from intuition. This is a question from a past exam written by my lecturer, who also wrote the following explanation of Thompson's algorithm: http://www.cs.may.ie/staff/jpower/Courses/Previous/parsing/node5.html

Can anyone clear up how to 'describe each step clearly'? It just seems like a set of basic rules rather than an algorithm with steps to follow.

Maybe there's an algorithm I've glossed over somewhere but so far I've just created them with an educated guess.

解决方案

Short version for general approach.
There's an algo out there called the Thompson-McNaughton-Yamada Construction Algorithm or sometimes just "Thompson Construction." One builds intermediate NFAs, filling in the pieces along the way, while respecting operator precedence: first parentheses, then Kleene Star (e.g., a*), then concatenation (e.g., ab), followed by alternation (e.g., a|b).

Here's an in-depth walkthrough for building (b|a)*b(a|b)'s NFA

Building the top level

  1. Handle parentheses. Note: In actual implementation, it can make sense to handling parentheses via a recursive call on their contents. For the sake of clarity, I'll defer evaluation of anything inside of parens.

  2. Kleene Stars: only one * there, so we build a placeholder Kleene Star machine called P (which will later contain b|a). Intermediate result:

  3. Concatenation: Attach P to b, and attach b to a placeholder machine called Q (which will contain (a|b). Intermediate result:

  4. There's no alternation outside of parentheses, so we skip it.

Now we're sitting on a P*bQ machine. (Note that our placeholders P and Q are just concatenation machines.) We replace the P edge with the NFA for b|a, and replace the Q edge with the NFA for a|b via recursive application of the above steps.


Building P

  1. Skip. No parens.

  2. Skip. No Kleene stars.

  3. Skip. No contatenation.

  4. Build the alternation machine for b|a. Intermediate result:


Integrating P

Next, we go back to that P*bQ machine and we tear out the P edge. We have the source of the P edge serve as the starting state for the P machine, and the destination of the P edge serve as the destination state for the P machine. We also make that state reject (take away its property of being an accept state). The result looks like this:


Building Q

  1. Skip. No parens.

  2. Skip. No Kleene stars.

  3. Skip. No contatenation.

  4. Build the alternation machine for a|b. Incidentally, alternation is commutative, so a|b is logically equivalent to b|a. (Read: skipping this minor footnote diagram out of laziness.)


Integrating Q

We do what we did with P above, except replacing the Q edge with the intermedtae b|a machine we constructed. This is the result:

Tada! Er, I mean, QED.


Want to know more?

All the images above were generated using an online tool for automatically converting regular expressions to non-deterministic finite automata. You can find its source code for the Thompson-McNaughton-Yamada Construction algorithm online.

The algorithm is also addressed in Aho's Compilers: Principles, Techniques, and Tools, though its explanation is sparse on implementation details. You can also learn from an implementation of the Thompson Construction in C by the excellent Russ Cox, who described it some detail in a popular article about regular expression matching.

这篇关于从正则表达式创建NFA的步骤的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆