How a RegEx engine works

Question

In learning regular expressions, I started wondering how the underlying engine works. More specifically, I'd like to know how it evaluates, prioritizes and parses an expression. The RegEx engine is a black box to me, and I would really enjoy deciphering it.

So I'd like to ask whether there are some great resources I could read that discuss RegEx engine theory.

*Note: I am not interested in building an engine, just learning the inner workings of it.

Recommended answer

There are two main classes of regex engines.

  1. Those based on Finite State Automata. These are generally the fastest. They work by building a state machine and feeding it characters from the input string. It is difficult, if not impossible, to implement some of the more advanced features in engines like this.

Examples of FSA based engines:

  • POSIX/GNU ERE/BRE: used in most Unix utilities, such as grep, sed and awk.
  • RE2: a relatively new project that tries to give more power to the automata-based method.
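To make "feeding it characters" concrete, here is a minimal Python sketch of an automaton-based matcher. The NFA for `(a|b)*abb` is written out by hand, and the names `NFA`, `START`, `ACCEPTING` and `nfa_fullmatch` are hypothetical, for illustration only; a real engine such as RE2 builds its automaton from the pattern and is far more involved. Instead of guessing and backing up, the simulation keeps the set of every state the machine could currently be in, so each input character is examined exactly once.

```python
from typing import Dict, Set, Tuple

# Hypothetical hand-built NFA for (a|b)*abb.
# State 0 loops on "a" and "b"; on "a" it may also guess that the final
# "abb" has started (state 1). State 3 is the only accepting state.
NFA: Dict[Tuple[int, str], Set[int]] = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}


def nfa_fullmatch(text: str) -> bool:
    """Return True if the whole of text is accepted by the NFA."""
    states = {START}                      # every state we could be in now
    for ch in text:                       # each character is read exactly once
        # Follow every possible transition at once instead of guessing.
        states = {t for s in states for t in NFA.get((s, ch), set())}
        if not states:                    # no live states left: reject early
            return False
    return bool(states & ACCEPTING)


for s in ["abb", "aabb", "babb", "ab", "abba"]:
    print(s, nfa_fullmatch(s))
```

Because the state set can never grow beyond the number of NFA states, the work is proportional to the input length times the number of states, with no back-tracking at all. This set-of-states simulation is the approach the swtch.com articles describe.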
     

  2. Those based on back-tracking. These often compile the pattern into byte-code, resembling machine instructions. The engine then executes the code, jumping from instruction to instruction. When an instruction fails, it back-tracks to find another way to match the input.

Examples of back-tracking based engines:

  • Perl: the original. Most other engines of this type try to replicate the functionality of regexes in the Perl language.
  • PCRE: the most successful implementation; this library is the most widely used one. It has a rich set of features, some of which can no longer be considered "regular".
  • Python, Ruby, Java, .NET: other implementations I don't intend to describe further.
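The back-tracking style can be sketched the same way. Below is a tiny Python bytecode interpreter. The opcodes (`CHAR`, `SPLIT`, `JMP`, `SAVE`, `BACKREF`, `END`, `MATCH`), the `run` function and the two hand-compiled programs are hypothetical, loosely in the spirit of the VM described in the swtch.com articles; they do not reflect the actual instruction set of Perl, PCRE or Python's `re`. A `SPLIT` is a choice point: the engine tries the first target and only returns to the second if everything after the first fails. Matching is anchored at the start of the input, and recursion stands in for the explicit back-tracking stack a real engine would keep.

```python
from typing import Optional, Sequence, Tuple

# A program is a list of tuples such as ("CHAR", "a") or ("SPLIT", 2, 4).
Program = Sequence[Tuple]


def run(prog: Program, text: str, pc: int = 0, sp: int = 0,
        saves: Tuple[Optional[int], ...] = ()) -> Optional[int]:
    """Run prog on text from position sp. Return the end index of the
    first match found, or None if every path fails."""
    while True:
        op, *args = prog[pc]
        if op == "CHAR":                  # match one literal character
            if sp < len(text) and text[sp] == args[0]:
                pc, sp = pc + 1, sp + 1
            else:
                return None               # fail; the caller back-tracks
        elif op == "JMP":                 # unconditional jump
            pc = args[0]
        elif op == "SPLIT":               # choice point: try args[0] first
            end = run(prog, text, args[0], sp, saves)
            if end is not None:
                return end
            pc = args[1]                  # back-track: same sp, other branch
        elif op == "SAVE":                # record sp in capture slot args[0]
            slots = list(saves) + [None] * (args[0] + 1 - len(saves))
            slots[args[0]] = sp
            saves, pc = tuple(slots), pc + 1
        elif op == "BACKREF":             # re-match text captured by group args[0]
            # Assumes a well-formed program that has already set both slots.
            group = text[saves[2 * args[0]]:saves[2 * args[0] + 1]]
            if text.startswith(group, sp):
                pc, sp = pc + 1, sp + len(group)
            else:
                return None
        elif op == "END":                 # like `$`: succeed only at end of input
            if sp != len(text):
                return None
            pc += 1
        elif op == "MATCH":
            return sp
        else:
            raise ValueError(f"unknown opcode {op!r}")


# a|ab : the left alternative is tried first
ALT_PROG = [("SPLIT", 1, 3), ("CHAR", "a"), ("JMP", 5),
            ("CHAR", "a"), ("CHAR", "b"), ("MATCH",)]

# (a*)b\1$ : group 1 is stored in capture slots 2 and 3
ECHO_PROG = [("SAVE", 2), ("SPLIT", 2, 4), ("CHAR", "a"), ("JMP", 1),
             ("SAVE", 3), ("CHAR", "b"), ("BACKREF", 1), ("END",), ("MATCH",)]

print(run(ALT_PROG, "ab"))          # 1: "a" wins, not the longer "ab"
print(run(ECHO_PROG, "aabaa"))      # 5
print(run(ECHO_PROG, "aabaaa"))     # None
```

The first call shows how a back-tracking engine prioritizes: the left alternative of `a|ab` wins even though `ab` is longer, which matches what Python's `re.match(r"a|ab", "ab")` returns. `ECHO_PROG` uses a back-reference to accept strings of the form a^n b a^n, a language no finite automaton can recognize, which is why such features are hard to add to automaton-based engines.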

More information:

  • regular-expressions.info - Tutorial
  • regular-expressions.info - Flavor comparison
  • swtch.com - Implementing Regular Expressions: a good set of articles about efficient, automata-based regular expressions.

If you want me to expand on something, post a comment.
