正则表达式转换为状态机的简短示例? [英] Short example of regular expression converted to a state machine?

查看:46
本文介绍了正则表达式转换为状态机的简短示例?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 Stack Overflow 播客 #36 (https://blog.stackoverflow.com/2009/01/podcast-36/),表达了这个观点:一旦您了解设置状态机是多么容易,您就再也不会尝试不恰当地使用正则表达式.

In the Stack Overflow podcast #36 (https://blog.stackoverflow.com/2009/01/podcast-36/), this opinion was expressed: Once you understand how easy it is to set up a state machine, you’ll never try to use a regular expression inappropriately ever again.

我已经做了很多搜索.我找到了一些学术论文和其他复杂的例子,但我想找到一个简单的例子来帮助我理解这个过程.我使用了很多正则表达式,我想确保我永远不会不恰当地"使用一个正则表达式.再来一次.

I've done a bunch of searching. I've found some academic papers and other complicated examples, but I'd like to find a simple example that would help me understand this process. I use a lot of regular expressions, and I'd like to make sure I never use one "inappropriately" ever again.

推荐答案

当然可以,尽管您需要更复杂的示例才能真正理解 RE 的工作原理.考虑以下 RE:

Sure, although you'll need more complicated examples to truly understand how REs work. Consider the following RE:

^[A-Za-z][A-Za-z0-9_]*$

这是一个典型的标识符(必须以 alpha 开头,后面可以有任意数量的字母数字和 undescore 字符,包括无).以下伪代码显示了如何使用有限状态机完成此操作:

which is a typical identifier (must start with alpha and can have any number of alphanumeric and undescore characters following, including none). The following pseudo-code shows how this can be done with a finite state machine:

state = FIRSTCHAR
for char in all_chars_in(string):
    if state == FIRSTCHAR:
            if char is not in the set "A-Z" or "a-z":
                error "Invalid first character"
            state = SUBSEQUENTCHARS
            next char
    if state == SUBSEQUENTCHARS:
            if char is not in the set "A-Z" or "a-z" or "0-9" or "_":
                error "Invalid subsequent character"
            state = SUBSEQUENTCHARS
            next char

现在,正如我所说,这是一个非常简单的例子.它没有展示如何进行贪婪/非贪婪匹配、回溯、一行(而不是整行)内的匹配以及状态机的其他更深奥的特性,这些特性很容易被 RE 语法处理.

Now, as I said, this is a very simple example. It doesn't show how to do greedy/nongreedy matches, backtracking, matching within a line (instead of the whole line) and other more esoteric features of state machines that are easily handled by the RE syntax.

这就是 RE 如此强大的原因.执行单行 RE 可以执行的操作所需的实际有限状态机代码通常非常长且复杂.

That's why REs are so powerful. The actual finite state machine code required to do what a one-liner RE can do is usually very long and complex.

您能做的最好的事情是获取特定简单语言的一些 lex/yacc(或等效)代码的副本,然后查看它生成的代码.它不漂亮(它不一定是因为它不应该被人类阅读,他们应该查看 lex/yacc 代码)但它可能会让你更好地了解它们是如何工作的.

The best thing you could do is grab a copy of some lex/yacc (or equivalent) code for a specific simple language and see the code it generates. It's not pretty (it doesn't have to be since it's not supposed to be read by humans, they're supposed to be looking at the lex/yacc code) but it may give you a better idea as to how they work.

这篇关于正则表达式转换为状态机的简短示例?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆