DFAs vs Regexes when implementing a lexical analyzer?

Problem Description

(I'm just learning how to write a compiler, so please correct me if I make any incorrect claims)

Why would anyone still implement DFAs in code (goto statements, table-driven implementations) when they can simply use regular expressions? As far as I understand, lexical analyzers take in a string of characters and churn out a list of tokens which, in the language's grammar definition, are terminals, making it possible for them to be described by a regular expression. Wouldn't it be easier to just loop over a bunch of regexes, breaking out of the loop if it finds a match?
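
To make the question concrete, here is a minimal sketch (in Python, with invented token names and patterns) of the "loop over a bunch of regexes" approach being described: try each token pattern at the current position and take the first one that matches.

```python
import re

# Hypothetical token patterns for a toy language; the names and regexes
# are illustrative only, not taken from the original question.
TOKEN_PATTERNS = [
    ("NUMBER", re.compile(r"\d+")),
    ("IDENT",  re.compile(r"[A-Za-z_]\w*")),
    ("PLUS",   re.compile(r"\+")),
    ("SKIP",   re.compile(r"\s+")),   # whitespace: matched but not emitted
]

def tokenize(text):
    """Loop over the regexes at each position and take the first match."""
    pos = 0
    tokens = []
    while pos < len(text):
        for name, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            if m:
                if name != "SKIP":
                    tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"unexpected character at position {pos}")
    return tokens

print(tokenize("x + 42"))  # [('IDENT', 'x'), ('PLUS', '+'), ('NUMBER', '42')]
```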

Recommended Answer

You're absolutely right that it's easier to write regular expressions than DFAs. However, a good question to think about is:

How do these regex matchers work?

Most very fast implementations of regex matchers work by compiling down to some type of automaton (either an NFA or a minimum-state DFA) internally. If you wanted to build a scanner that worked by using regexes to describe which tokens to match and then looping through all of them, you could absolutely do so, but internally they'd probably compile to DFAs.
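
To see what "compiling down to an automaton" means in practice, here is a hand-written sketch of the kind of table-driven DFA a pattern like `\d+` could be turned into. The state names and transition function below are invented for illustration; a real regex engine or scanner generator derives an equivalent table automatically.

```python
# Hand-written illustration of a DFA for the pattern r"\d+".
# States and transitions are invented for this sketch; generators build
# them automatically from the regex.
START, IN_NUMBER, REJECT = 0, 1, 2
ACCEPTING = {IN_NUMBER}

def step(state, ch):
    """Transition function: the DFA's 'table'."""
    if state in (START, IN_NUMBER) and ch.isdigit():
        return IN_NUMBER
    return REJECT

def matches_integer(text):
    """Run the DFA over the string; accept iff it ends in an accepting state."""
    state = START
    for ch in text:
        state = step(state, ch)
        if state == REJECT:
            return False
    return state in ACCEPTING

print(matches_integer("12345"))  # True
print(matches_integer("12a45"))  # False
print(matches_integer(""))       # False: START is not an accepting state
```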

It's extremely rare to see anyone actually code up a DFA for doing scanning or parsing because it's just so complicated. This is why there are tools like lex or flex, which let you specify the regexes to match and then automatically compile down to DFAs behind the scenes. That way, you get the best of both worlds - you describe what to match using the nicer framework for regexes, but you get the speed and efficiency of DFAs behind the scenes.

One more important detail about building a giant DFA is that it is possible to build a single DFA that tries matching multiple different regular expressions in parallel. This increases efficiency, since it's possible to run the matching DFA over the string in a way that will concurrently search for all possible regex matches.
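
The "match several token regexes at once" interface can at least be sketched with a single combined pattern built from named groups, which is roughly what lex-style scanners expose to the user. One caveat: Python's `re` module is a backtracking engine rather than a minimum-state DFA, so this only illustrates the interface, not the internals, and the token names are hypothetical.

```python
import re

# One combined pattern: each token type is a named alternative.
MASTER_PATTERN = re.compile(r"""
      (?P<NUMBER>\d+)
    | (?P<IDENT>[A-Za-z_]\w*)
    | (?P<PLUS>\+)
    | (?P<SKIP>\s+)
""", re.VERBOSE)

def tokenize_combined(text):
    """One pass over the input; m.lastgroup names the alternative that matched."""
    # Note: finditer silently skips characters no alternative matches;
    # a real lexer would report them as errors.
    tokens = []
    for m in MASTER_PATTERN.finditer(text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize_combined("x + 42"))  # [('IDENT', 'x'), ('PLUS', '+'), ('NUMBER', '42')]
```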

Hope this helps!
