Writing a parser for regular expressions

Question

Even after years of programming, I'm ashamed to say that I've never really fully grasped regular expressions. In general, when a problem calls for a regex, I can usually (after a bunch of referring to syntax) come up with an appropriate one, but it's a technique that I find myself using increasingly often.

So, to teach myself and understand regular expressions properly, I've decided to do what I always do when trying to learn something; i.e., try to write something ambitious that I'll probably abandon as soon as I feel I've learnt enough.

To this end, I want to write a regular expression parser in Python. In this case, "learn enough" means that I want to implement a parser that can understand Perl's extended regex syntax completely. However, it doesn't have to be the most efficient parser or even necessarily usable in the real world. It merely has to correctly match or fail to match a pattern in a string.

The question is, where do I start? I know almost nothing about how regexes are parsed and interpreted apart from the fact that it involves a finite state automaton in some way. Any suggestions for how to approach this rather daunting problem would be much appreciated.

Edit: I should clarify that while I'm going to implement the regex parser in Python, I'm not overly fussed about what programming language the examples or articles are written in. As long as it's not in Brainfuck, I will probably understand enough of it to make it worth my while.

Recommended answer

Writing an implementation of a regular expression engine is indeed quite a complex task.

But if you are interested in how to do it, even if you can't understand enough of the details to actually implement it, I would recommend that you at least look at this article:

> Regular Expression Matching Can Be Simple And Fast

It explains how many of the popular programming languages implement regular expressions in a way that can be very slow for some patterns, and describes a slightly different method that is faster. The article includes some details of how the proposed implementation works, including some source code in C. It may be a bit heavy reading if you are just starting to learn regular expressions, but I think it is well worth knowing about the difference between the two approaches.
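To give a feel for the faster method in Python, here is a minimal sketch under stated assumptions. It is not the article's C code. It parses a pattern with a small recursive-descent parser, builds a nondeterministic finite automaton (NFA), and then tracks every state the automaton could be in at once instead of backtracking. The names `State`, `Parser`, `closure` and `full_match` are made up for this illustration, and only literals, parentheses, `|`, `*`, `+` and `?` are supported, which is a long way from Perl's full syntax.

```python
class State:
    """One NFA state: at most one labelled edge plus any number of empty edges."""
    def __init__(self, char=None):
        self.char = char   # character consumed by the labelled edge, if any
        self.out = None    # target of the labelled edge
        self.eps = []      # targets reachable without consuming input


class Parser:
    """Recursive-descent parser that builds NFA fragments as (start, end) pairs."""
    def __init__(self, pattern):
        self.pattern = pattern
        self.pos = 0

    def peek(self):
        return self.pattern[self.pos] if self.pos < len(self.pattern) else None

    def parse(self):
        frag = self.alt()
        if self.peek() is not None:
            raise ValueError(f"unexpected {self.peek()!r} at {self.pos}")
        return frag

    def alt(self):                      # alt := concat ('|' concat)*
        start, end = self.concat()
        while self.peek() == "|":
            self.pos += 1
            s2, e2 = self.concat()
            new_start, new_end = State(), State()
            new_start.eps = [start, s2]
            end.eps.append(new_end)
            e2.eps.append(new_end)
            start, end = new_start, new_end
        return start, end

    def concat(self):                   # concat := repeat*
        start = end = State()           # fragment matching the empty string
        while self.peek() not in (None, "|", ")"):
            s2, e2 = self.repeat()
            end.eps.append(s2)
            end = e2
        return start, end

    def repeat(self):                   # repeat := atom ('*' | '+' | '?')*
        start, end = self.atom()
        while self.peek() in ("*", "+", "?"):
            op = self.pattern[self.pos]
            self.pos += 1
            new_start, new_end = State(), State()
            new_start.eps.append(start)
            end.eps.append(new_end)
            if op in "*?":
                new_start.eps.append(new_end)   # allow zero occurrences
            if op in "*+":
                end.eps.append(start)           # allow another repetition
            start, end = new_start, new_end
        return start, end

    def atom(self):                     # atom := '(' alt ')' | literal
        ch = self.peek()
        if ch == "(":
            self.pos += 1
            frag = self.alt()
            if self.peek() != ")":
                raise ValueError("missing ')'")
            self.pos += 1
            return frag
        if ch is None or ch in "*+?)":
            raise ValueError(f"unexpected {ch!r} at {self.pos}")
        self.pos += 1
        start, end = State(ch), State()
        start.out = end
        return start, end


def closure(states):
    """Return the given states plus everything reachable through empty edges."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in stack.pop().eps:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def full_match(pattern, text):
    """True if the whole of `text` matches `pattern`."""
    start, accept = Parser(pattern).parse()
    current = closure({start})
    for ch in text:
        # Advance every live state at once; no backtracking is needed.
        current = closure({s.out for s in current if s.char == ch})
    return accept in current


print(full_match("a(b|c)*d", "abcbd"))   # True
print(full_match("a(b|c)*d", "abxd"))    # False
```

Because the set of live states can never be larger than the automaton itself, the running time grows roughly with pattern size times input length. A pattern such as `"a?" * 25 + "a" * 25` matched against `"a" * 25`, which is the kind of case that tends to be slow in backtracking engines, finishes almost instantly here. Extending the sketch towards Perl-style features (character classes, anchors, capture groups, and so on) is where most of the real work would be.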
