将正则表达式转换为有限状态机 [英] Converting regular expression to finite state machine

查看:66
本文介绍了将正则表达式转换为有限状态机的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

您是否有关于将任何正则表达式转换为有限状态机的算法的提示.例如,解析正则表达式并将状态适当地添加到 fsm 的算法?任何参考或更深入的想法?

would you have a hint at algorithm to convert any regular expression to a finite state machine. For instance, an algorithm parsing a regexp and adding states to the fsm appropriately? Any reference or deeper idea?

我是用 Python 写的

I am writting this with Python

感谢和问候

推荐答案

使用 Michael Sipser 的 计算理论导论.第 1 章给出了将正则表达式转换为确定性或非确定性有限状态自动机(DFA 或 NFA)的详细算法,以证明它们的等价性(DFA、NFA 和正则表达式可以完全匹配相同的类)字符串).

Use Michael Sipser's Introduction to the Theory of Computation. Chapter 1 gives detailed algorithms for converting a regular expression to a deterministic or non-deterministic finite-state automaton (DFA or NFA), in the context of proving their equivalence (a DFA, an NFA and a regular expression can match exactly the same classes of strings).

总体思路是:将正则表达式转换为 NFA,这可以非常简单地完成(* 是一个循环,| 和字符范围是分支点).然后,您将 NFA 转换为(更大的)DFA,这涉及为每个 set 替代 NFA 状态创建一个 DFA 状态.DFA 最多具有与 NFA 状态集的幂集一样多的状态(例如,具有 3 个状态的 NFA 可以转换为最多具有 2^3 = 8 个状态的 DFA),并且可以识别任何目标字符串而无需回溯.阅读本书了解详情.

The general idea is: You convert a regular expression to an NFA, which can be done quite straightforwardly (* is a loop, | and character ranges are branch points). You then convert the NFA into a (much larger) DFA, which involves creating one DFA state for each set of alternative NFA states. The DFA has at most as many states as the powerset of the set of NFA states (e.g., an NFA with 3 states can get transformed to a DFA with at most 2^3 = 8 states), and can recognize any target string without backtracking. Read the book for the details.

这篇关于将正则表达式转换为有限状态机的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆